Skip to content
UoL CS Notes

Key-Value Stores

COMP207 Lectures

This is the simplest type of database with only two columns, one for the key and one for the value.

They have a simple access mechanism:

  • find(k) - Returns the local value for key $k$.
  • write(k, v) - Inserts value $v$ under key $k$.

Distributed Storage

Each key-value pair $(k,v)$ is stored at some node:

  1. Assign values $v$ for $k$ to integers between 0 and $2^n-1$ where $n$ is large enough to hold all nodes and duplicates:
    • This uses a hash function:

      \[h:\text{keys}\rightarrow\{0,\ldots,2^n-1\}\]
  2. Distribute node to some of the integers (typically randomly). This creates a ring of nodes.
  3. If $(k,v)$ is assigned to integer $i$, it is stored at the node following $i$ on the ring.

Scalability via Horizontal Fragmentation

New nodes can be added to the ring easily:

  • Add nodes to free range(s) and more key-value pairs appropriately.
  • Automatic horizontal fragmentation.

Replication

Replication is used to ensure availability.

Replicas (copies of the key-value pairs) are stored on consecutive nodes on the ring in clockwise order.

Eventual Consistency

DynomoDB and Voldemort allow multiple versions of a data item to be present at the same time (versioning):

  • If a newer version of a data item is not yet available at a node, the older version is used.
    • This is fine for many applications, like a shopping cart.

If this isn’t good enough we can assign a vector clock to each version of an item $X$:

  1. A list (vector) of pairs (node, timestamp):
    • node - The node that has written $X$.
    • timestamp - The local time on the node the iem $X$ was written.
  2. Use the clock to decide if version $V_1$ originated from version $V_2$:
    • $V_1$ originated from $V_2$ if, for all nodes in $V_2$s clock, the corresponding timestamp is $\leq \text{timestamp in }V_1$.

      Check that all the numbers in the vector clock are bigger, or equal to, the previous clock.

    • If it is incomparable (multiple values change), return all possibilities.

      This is if some numbers are bigger and smaller between versions.

      Incomparability between versions are resolved at read-time.

Property Implementation

  • Scalability:
    • Adding new nodes to increase capacity is easy.
    • Automatic horizontal fragmentation.
  • Availability & Fault-Tolerance:
    • Due to replication.
    • Can retrieve value for a key, even if a few nodes storing values for that key fail.
  • High Performance:
    • Retrieving the value for a key is fast:
      • Apply the hash function to determine the node, then ask the node.
    • The same is true for writing.