Key-Value Stores

This is the simplest type of database with only two columns, one for the key and one for the value.

They have a simple access mechanism:

Distributed Storage

Each key-value pair $(k,v)$ is stored at some node:

Assign values $v$ for $k$ to integers between 0 and $2^n-1$ where $n$ is large enough to hold all nodes and duplicates:
- This uses a hash function:
  \[h:\text{keys}\rightarrow\{0,\ldots,2^n-1\}\]
Distribute node to some of the integers (typically randomly). This creates a ring of nodes.
If $(k,v)$ is assigned to integer $i$, it is stored at the node following $i$ on the ring.

New nodes can be added to the ring easily:

Replication is used to ensure availability.

Replicas (copies of the key-value pairs) are stored on consecutive nodes on the ring in clockwise order.

DynomoDB and Voldemort allow multiple versions of a data item to be present at the same time (versioning):

If a newer version of a data item is not yet available at a node, the older version is used.
- This is fine for many applications, like a shopping cart.

If this isn’t good enough we can assign a vector clock to each version of an item $X$:

A list (vector) of pairs (node, timestamp):
- node - The node that has written $X$.
- timestamp - The local time on the node the iem $X$ was written.
Use the clock to decide if version $V_1$ originated from version $V_2$:
- $V_1$ originated from $V_2$ if, for all nodes in $V_2$s clock, the corresponding timestamp is $\leq \text{timestamp in }V_1$.
  
  Check that all the numbers in the vector clock are bigger, or equal to, the previous clock.
- If it is incomparable (multiple values change), return all possibilities.
  
  This is if some numbers are bigger and smaller between versions.
  
  Incomparability between versions are resolved at read-time.

Scalability:
- Adding new nodes to increase capacity is easy.
- Automatic horizontal fragmentation.
Availability & Fault-Tolerance:
- Due to replication.
- Can retrieve value for a key, even if a few nodes storing values for that key fail.
High Performance:
- Retrieving the value for a key is fast:
  - Apply the hash function to determine the node, then ask the node.
- The same is true for writing.