How we built an asynchronous, RESTful API for a log-structured, temporal, versioned NoSQL document store — Sirix.io (open source and free)
Why retaining old versions/revisions of your data is now feasible
Life is subject to constant evolution, and so is our data, be it in research, business, or personal information management. As such, it is surprising that databases usually keep only the current state. With the advent of flash drives such as SSDs, however, which are much faster at random access than spinning disks but not very good at erasing or overwriting data, we are now able to develop clever versioning algorithms and storage systems that keep past states without impeding efficiency/performance. Search, insertion, and deletion operations should therefore run in logarithmic time (O(log n)) to compete with commonly used index structures.
The evolutionary, temporal NoSQL store Sirix.io
Sirix is a versioned, temporal storage system, which is log-structured at its very core.
We support N read-only transactions, each bound to a single revision (a transaction may be started on any past revision), concurrently with one write transaction per resource. Our system is thus based on snapshot isolation. The write transaction can revert the most recent revision to any past revision. Changes to this past revision can then be committed to create a new snapshot and therefore a new revision.
During a transaction commit, writes are batched and synced to disk in a post-order traversal of the internal index tree structure. Thus, we convert random write I/O into sequential writes. Furthermore, we are able to store hashes of the pages in parent pointers, just like ZFS, for future integrity checks (a Merkle tree). The Uber-Page, the root page of the resource, is written last. Our storage is therefore always consistent without the need for write-ahead logging (WAL). Instead of a persistent log plus a separate data store, we only have the log-structured index (and a lightweight buffer manager).
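The page hashing mentioned above can be pictured with a tiny Merkle-style sketch. This is an illustration of the principle only, not Sirix's actual on-disk format or hashing scheme:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class MerkleSketch {
    // Hex-encoded SHA-256 of a page's content (illustrative only).
    static String sha256Hex(String data) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(data.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : hash) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // A parent page stores its children's hashes, so its own hash covers
    // them; transitively, the root page's hash authenticates the whole tree.
    static String parentHash(String leftChildHash, String rightChildHash) {
        return sha256Hex(leftChildHash + rightChildHash);
    }

    public static void main(String[] args) {
        String root = parentHash(sha256Hex("page A"), sha256Hex("page B"));
        String tampered = parentHash(sha256Hex("page A (modified)"), sha256Hex("page B"));
        // Any modification of a child bubbles up into a different root hash:
        System.out.println(root.equals(tampered)); // false
    }
}
```

Because the root page is written last, a reader that trusts the root's hash can verify every page fragment it subsequently loads.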
Snapshots, that is, new revisions, are created during every commit. Apart from a numerical revision ID, the timestamp is serialized. A revision can afterwards be opened by specifying either the ID or the timestamp. Using a timestamp involves a binary search on an array of timestamps, stored persistently in a second file and loaded into memory on startup. The search ends when either the exact timestamp is found or the closest revision to the given point in time is determined. Data is never written back to the same place and thus never modified in place. Instead, Sirix uses copy-on-write (COW) semantics at the record level (it creates page fragments and usually doesn't copy whole pages). Every time a page has to be modified, the records that have changed, as well as some of the unchanged records, are written to a new location. Exactly which records are copied depends on the versioning algorithm used. Sirix is thus especially well suited for flash-based drives such as SSDs. Changes to a resource within a database occur within the aforementioned resource-bound single write transaction. Therefore, a resource manager first has to be opened on the specific resource to start resource-wide transactions (a kind of resource-wide session). Note that we have already started work on database-wide transactions 🙂
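The timestamp-based lookup can be sketched as a plain binary search over the sorted commit timestamps. In this sketch I assume "closest" means the most recent revision committed at or before the requested instant, which is a simplification of whatever tie-breaking Sirix actually applies:

```java
import java.time.Instant;
import java.util.List;

public class RevisionLookup {
    // Returns the 1-based revision number whose commit timestamp is an
    // exact match or the closest one at or before the given point in time;
    // returns 0 if the instant lies before the first commit.
    static int revisionFor(List<Instant> commitTimestamps, Instant pointInTime) {
        int low = 0, high = commitTimestamps.size() - 1, best = -1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            Instant ts = commitTimestamps.get(mid);
            if (ts.equals(pointInTime)) return mid + 1; // exact timestamp found
            if (ts.isBefore(pointInTime)) {
                best = mid;      // candidate: latest commit before the instant
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return best + 1;
    }

    public static void main(String[] args) {
        List<Instant> commits = List.of(
                Instant.parse("2019-04-13T16:24:27Z"),
                Instant.parse("2019-04-14T10:00:00Z"),
                Instant.parse("2019-04-15T08:30:00Z"));
        System.out.println(revisionFor(commits, Instant.parse("2019-04-14T12:00:00Z"))); // 2
        System.out.println(revisionFor(commits, Instant.parse("2019-04-13T16:24:27Z"))); // 1
    }
}
```

Since the timestamp array is loaded into memory on startup, the lookup is a pure O(log n) in-memory search.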
We recently wrote another article with much more background information on the principles behind Sirix.
Simple, transactional cursor based API
The following shows simple Java code to create a database and a resource within the database, and to import a JSON document. The document is shredded into our internal, binary representation.
Powerful XQuery API for JSON and XML
A powerful query API based on an XQuery processor (Brackit) is available for processing both natively stored XML and JSON, and it can be used in HTTP POST as well as GET requests (the latter for simple queries):
The query basically opens a database/resource in a specific revision based on a timestamp (2019-04-13T16:24:27Z) and searches for all statuses that have a created_at timestamp later than the 1st of February 2018 and that did not exist in the previous revision. => is a dereferencing operator used to dereference keys in JSON objects; array values can be accessed as shown with the function bit:array-values or by specifying an index, starting at zero: array[[0]], for instance, specifies the first value of the array. XQuery was originally designed for querying XML data, similar to SQL for relational data (however, much more powerful), but with minor adjustments it's also the perfect candidate for querying other tree-structured data such as JSON.
We also support a native XML storage with nearly identical APIs. In general, any kind of data can be stored in Sirix, as long as it can be fetched via the generated sequential, stable record identifier that Sirix assigns during insertion, and as long as a custom serializer/deserializer is plugged in.
Vert.x, Kotlin/Coroutines and Keycloak
Vert.x is closely modeled after Node.js, but for the JVM. Everything in Vert.x should be non-blocking, so a single thread, called an event loop, can handle many requests. Blocking calls have to be handled on a special thread pool. The default is two event loops per CPU (multi-reactor pattern).
We are using Kotlin because it's simple and concise. One of its most interesting features is coroutines. Conceptually, they are like very lightweight threads: while creating a thread is expensive, creating a coroutine is not. Coroutines allow writing concurrent code almost as if it were sequential. Whenever a coroutine is suspended due to a blocking call or a long-running task, the underlying thread is not blocked and can be reused. Under the hood, the Kotlin compiler adds another parameter to each suspending function: a continuation, which stores where to resume the function (normal resumption or resumption with an exception).
Keycloak is used as the authorization server via OAuth2 (Password Credentials Flow), as we decided not to implement authorization ourselves.
Things to consider when building the Server
First, we have to decide which OAuth2 flow best suits our needs. As we built a REST API that is usually not consumed by user agents/browsers, we decided to use the Password Credentials Flow. It is as simple as this: first get an access token, then send it with each request in the Authorization header.
In order to get the access token, a request first has to be made against a POST /login route, with the username/password credentials sent in the body as a JSON object.
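From the client's perspective, such a login request can be sketched with the JDK's built-in HTTP client. The credentials and the server URL here are placeholders; the access token is then read from the JSON response body:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class LoginRequest {
    // Builds the POST /login request described above. Nothing is sent yet;
    // HttpClient.send(...) would execute it against a running server.
    static HttpRequest loginRequest(String server, String username, String password) {
        String body = String.format(
                "{\"username\":\"%s\",\"password\":\"%s\"}", username, password);
        return HttpRequest.newBuilder()
                .uri(URI.create(server + "login"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = loginRequest("https://localhost:9443/", "admin", "admin");
        System.out.println(request.method() + " " + request.uri()); // POST https://localhost:9443/login
    }
}
```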
The implementation looks like this:
The coroutine handler is a simple extension function:
Coroutines are launched on the Vert.x event loop (the dispatcher).
In order to execute a longer-running handler we use
Vert.x uses a different thread pool for these kinds of tasks, so the task is executed on another thread. Note that the event loop is not blocked; only the coroutine is suspended.
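The division of labor can be illustrated with plain Java executors: one thread stands in for the event loop, a pool handles the blocking work. This is an analogy for the pattern, not Vert.x's actual implementation:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EventLoopSketch {
    // Runs the blocking part on the worker pool and only the lightweight
    // callback on the "event loop" thread, which therefore never blocks.
    static String handleRequest() {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        ExecutorService workerPool = Executors.newFixedThreadPool(4);
        try {
            return CompletableFuture
                    .supplyAsync(() -> {
                        // stands in for a blocking call, e.g. file or DB I/O
                        return "query result";
                    }, workerPool)
                    .thenApplyAsync(result -> "handled: " + result, eventLoop)
                    .join();
        } finally {
            eventLoop.shutdown();
            workerPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(handleRequest()); // handled: query result
    }
}
```

Kotlin coroutines give you the same effect without the explicit callback chaining: the suspended coroutine is resumed on the event loop once the worker finishes.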
Every endpoint of the API is protected by an authorization handler, which communicates with the Keycloak instance and checks authorization:
Thus, our routes look like this:
Basically, this is an endpoint for HTTP GET requests, which produces "application/json" content and retrieves a resource in a database. Optionally, a query parameter may be used to dive into the internal tree structure of the stored JSON or XML resource. The endpoint is protected by the aforementioned authorization handler, so only authorized users are able to view the resource.
API design by example
Now let's switch the focus back to our API and show how it's designed, with examples. We first need to set up our server and Keycloak (see https://sirix.io for how to do this).
Once both servers are up and running, we're able to write a simple HTTP client. We first have to obtain a token from the /login endpoint with a given "username/password" JSON object. Using an asynchronous HTTP client (from Vert.x) in Kotlin, it looks like this:
This access token must then be sent in the Authorization HTTP header with each subsequent request. Storing a first resource looks like this (a simple HTTP PUT request):
$server is defined as "protocol://host:port/" (for instance https://localhost:9443/). First, an empty database with the name "database" (along with some metadata) is created; second, the JSON fragment is stored with the name "resource1" in our internal, binary format. The PUT HTTP request is idempotent: another PUT request to the same URL endpoint will delete the former database and resource and recreate them. Note that we used the HTTP header "Content-Type" to specify which resource type to create, as we currently allow both XML and JSON databases, as well as the "Accept" header to specify which resource type we expect to retrieve, if any.
The HTTP response code should be 200 (everything went fine), in which case the HTTP body yields the stored resource.
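Using the JDK HTTP client again, the PUT request could be assembled roughly like this. The URL, headers, and token placeholder follow the description above, and the JSON body is a fragment of the running example document:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class StoreResource {
    // Builds the idempotent PUT request that (re)creates database/resource1.
    static HttpRequest putResource(String server, String accessToken, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(server + "database/resource1"))
                .header("Authorization", "Bearer " + accessToken)
                .header("Content-Type", "application/json") // resource type to create
                .header("Accept", "application/json")       // resource type to retrieve
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = putResource(
                "https://localhost:9443/", "<access-token>", "{\"foo\":[\"bar\",null,2.33]}");
        System.out.println(request.method() + " " + request.uri()); // PUT https://localhost:9443/database/resource1
    }
}
```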
Via a GET HTTP request to https://localhost:9443/database/resource1 we are also able to retrieve the stored resource again.
However, this is not really interesting so far. We can update the resource via a POST HTTP request. First, we need to get the internal, unique node ID of the node where we want to insert new JSON data:
We can use either a POST or a GET HTTP request with the query URL-encoded. The query parameter allows specifying simple queries directly in the URI:
However, as we can see the queries quickly become unreadable.
Remember, that we stored the following JSON-document in our database:
With the above query we, for instance, select "foo", then the third item in the array (2.33), and then retrieve its node key.
The resulting node key/node identifier in this example is 6. Assuming we retrieved the access token as before, we can simply do a POST-Request and use the information we gathered before about the node key:
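Such a POST request could be built like this. Note that the query-parameter names (nodeId, insert) and the insertion-position value asRightSibling are assumptions in this sketch; consult the Sirix REST-API documentation for the exact spelling. The inserted fragment is an arbitrary placeholder:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class InsertFragment {
    // Builds a POST request that inserts a JSON fragment as the right
    // sibling of the node with the given node key.
    static HttpRequest insertAsRightSibling(
            String server, String accessToken, long nodeKey, String jsonFragment) {
        return HttpRequest.newBuilder()
                .uri(URI.create(server + "database/resource1?nodeId=" + nodeKey
                        + "&insert=asRightSibling"))
                .header("Authorization", "Bearer " + accessToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonFragment))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = insertAsRightSibling(
                "https://localhost:9443/", "<access-token>", 6, "{\"tadaaa\":true}");
        System.out.println(request.uri());
        // https://localhost:9443/database/resource1?nodeId=6&insert=asRightSibling
    }
}
```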
The interesting part is the URL we are using as the endpoint. We simply say: select the node with the ID 6, then insert the given JSON data as the right sibling of that node. This yields the following serialized JSON document (note that we compress the JSON string as much as possible; the pretty-printing here is just for the sake of readability):
Every PUT as well as POST HTTP request implicitly commits the underlying transaction and thus creates a new revision. One idea for bundling several requests into a single transaction would be to expose the transaction as a resource: send its ID with every request and commit it through a request to a specific "commit" resource / GET endpoint.
Remember that the first version of the resource, before our recent commit, can easily be reconstructed. We are thus now able to send a GET request to retrieve the contents of the whole resource again, for instance by specifying a simple query that selects the root JSON object in all revisions (in our case only two revisions exist so far). Execute a GET https://localhost:9443/database/resource1?query=jn:all-times(.) request and get the following result:
Note that we used a time-traveling function in the query parameter. In general we support several additional time-traveling functions: jn:future, jn:past, jn:previous, jn:next, jn:first, jn:last and jn:all-times, as we have seen. We also support variants in which the current context item is included in the result set (notably past-or-self, future-or-self…).
Of course, the usual way would be to navigate to the nodes you are interested in, add predicates, and then navigate in time to watch how a node or a whole subtree has changed. This is an incredibly powerful feature and might be the subject of a future article. The only notable exception currently is null nodes, which cannot be queried on the time axis, as our query compiler Brackit currently emits empty sequences for null items. We might change this in future versions. Note that in future versions we might also implement updating primitives for the JSON part in the query compiler, making it possible to avoid first retrieving the node key and then sending another query. As a first step we will probably simply add XQuery functions.
The same can be achieved by specifying a range of revisions to serialize (start- and end-revision parameters) in the GET request:
or via timestamps:
However, if we first open a resource and then select individual nodes via a query, it is faster to use the time-traveling axes; otherwise the same query has to be executed for each opened revision (parsed, compiled, executed…).
Of course, we are also able to delete the resource, or any subtree thereof, with a simple DELETE request:
This deletes the node with the node key/ID 7; in our case, as it is a record key node, this removes the whole subtree (the object key "bar" in the running example). The change is, of course, committed as revision 3. If you omit the nodeId parameter in the URL, the whole resource gets deleted; if you also omit the resource name, the whole database, with potentially many resources, is deleted.
As such, all old revisions can still be queried for the whole subtree as it was at the time of the transaction commit, making Sirix an ideal candidate for audits and for correcting human or application errors.
Note that we also have sophisticated diffing capabilities, but we currently lack a proper serialization format for JSON diffs. For XML we currently emit XQuery Update statements. Furthermore, we want to re-enable a dated GUI based on (still ;-)) novel visualization approaches for comparing tree structures: https://www.youtube.com/watch?v=l9CXXBkl5vI&t=26s
Next, pre-compiled user-defined XQuery scripts should also be storable on the application server.