An Overview of Riak: an Open Source NoSQL Database

Riak is a document oriented open-source database system. It is shaped by the Amazon Dynamo document and provides a decentralized key-value store that supports standard Get, Put and Delete operations. Riak is a distributed, highly scalable and fault-tolerant store with map/reduce, HTTP, JSON and REST queries making it ideal for web apps. And, of course, Riak is NoSQL.

To understand why Riak is so powerful we need to know some theory. Let’s take a look at Amazon Dynamo. In Amazon Dynamo documentation we have 3 terms that describe distributed store behavior: N, R and W. N is the number of replicas of each value in store. R is the number of replicas needed for read operations. W is the number of replicas needed for write operations. Riak’s goal is to transmit N, R and W logic to an application. That’s why Riak can adapt application requirements.

The Riak ring is composed on identical nodes and data is automatically replicated to multiple nodes. So, any node can be added or removed easily without manual migration of data. Hence, since every node within the cluster is identical, there is no single point of failure or bottleneck. Riak clusters can grow into the hundreds or thousands of nodes, and naturally, there are sometimes machine failures. Riak detects machine failures and recovers when a machine is brought back into the cluster. The data (existing and new) is shared among the nodes automatically. If some nodes are unavailable, you can still read and write if the live node count is acceptable. If cluster parts that share the new version of a document become disconnected from each other, Riak returns both values and the application can choose what to do. In this scenario you can also configure Riak to use freshest variant.

Here are some tidbits about Riak:

Links

Links are metadata that establish one-way relationships between objects in Riak. They can be used to loosely model graph like relationships between objects.

The Link Header

The way to read and modify links via the HTTPAPI is the HTTP Link header. This header emulates the purpose of <link> tags in HTML, that is, establishing relationships to other HTTP resources. The format that Riak uses is like so:

Link:

Inside the angle-brackets (<,>) is a relative URL to another object in Riak. The “tag” portion in double-quotes is any string identifier that has a meaning relevant to your application. Objects can have multiple links by separating them with commas. For example, if an object was a participant in a doubly-linked list of objects, it might look like this:

Link: </riak/list/1>; riaktag=”previous”, </riak/list/3>; riaktag=”next”

Link-walking

Link-walking (traversal) is a special case of MapReduce querying and can be accessed through HTTPLinkWalking. Link-walks start at a single input object and follow links on that object to find other objects that match the submitted specifications. More than one traversal may be specified in a single request, with any number of the intermediate results returned. The final traversal in a link-walking request always returns results.

How Riak Spreads Processing

Riak’s MapReduce has an additional goal: increasing data-locality. When processing a large dataset, it’s often much more efficient to take the computation to the data than it is to bring the data to the computation. In practice, your MapReduce job code is likely less than 10 kilobytes, so it is more efficient to send the code to the gigs of data being processed than to stream gigabytes of data to your 10k of code.

It is Riak’s solution to the data-locality problem that determines how Riak spreads the processing across the cluster. In the same way that any Riak node can coordinate a read or write by sending requests directly to the other nodes responsible for maintaining that data, any Riak node can also coordinate a MapReduce query by sending a map-step evaluation request directly to the node responsible for maintaining the input data. Map-step results are sent back to the coordinating node, where reduce-step processing can produce a unified result.

Put more simply: Riak runs map-step functions right on the node holding the input data for those functions, and it runs reduce-step functions on the node coordinating the MapReduce query.

How a Link Phase Works in Riak

Link phases find links matching patterns specified in the query definition. The patterns specify which buckets and tags links must have. “Following a link” means adding it to the output list of this phase. The output of this phase is often most useful as an input to a map phase, or another reduce phase.

When you should use Riak

When you can use Riak, but probably shouldn’t

When you cannot use Riak

© 2008 SYS-CON Media