Key-Value Stores
This is the simplest type of database with only two columns, one for the key and one for the value.
They have a simple access mechanism:
find(k)
- Returns the local value for key $k$.write(k, v)
- Inserts value $v$ under key $k$.
Distributed Storage
Each key-value pair $(k,v)$ is stored at some node:
- Assign values $v$ for $k$ to integers between 0 and $2^n-1$ where $n$ is large enough to hold all nodes and duplicates:
-
This uses a hash function:
\[h:\text{keys}\rightarrow\{0,\ldots,2^n-1\}\]
-
- Distribute node to some of the integers (typically randomly). This creates a ring of nodes.
- If $(k,v)$ is assigned to integer $i$, it is stored at the node following $i$ on the ring.
Scalability via Horizontal Fragmentation
New nodes can be added to the ring easily:
- Add nodes to free range(s) and more key-value pairs appropriately.
- Automatic horizontal fragmentation.
Replication
Replication is used to ensure availability.
Replicas (copies of the key-value pairs) are stored on consecutive nodes on the ring in clockwise order.
Eventual Consistency
DynomoDB and Voldemort allow multiple versions of a data item to be present at the same time (versioning):
- If a newer version of a data item is not yet available at a node, the older version is used.
- This is fine for many applications, like a shopping cart.
If this isn’t good enough we can assign a vector clock to each version of an item $X$:
- A list (vector) of pairs
(node, timestamp)
:node
- The node that has written $X$.timestamp
- The local time on the node the iem $X$ was written.
- Use the clock to decide if version $V_1$ originated from version $V_2$:
-
$V_1$ originated from $V_2$ if, for all nodes in $V_2$s clock, the corresponding timestamp is $\leq \text{timestamp in }V_1$.
Check that all the numbers in the vector clock are bigger, or equal to, the previous clock.
-
If it is incomparable (multiple values change), return all possibilities.
This is if some numbers are bigger and smaller between versions.
Incomparability between versions are resolved at read-time.
-
Property Implementation
- Scalability:
- Adding new nodes to increase capacity is easy.
- Automatic horizontal fragmentation.
- Availability & Fault-Tolerance:
- Due to replication.
- Can retrieve value for a key, even if a few nodes storing values for that key fail.
- High Performance:
- Retrieving the value for a key is fast:
- Apply the hash function to determine the node, then ask the node.
- The same is true for writing.
- Retrieving the value for a key is fast: