Skip to content
UoL CS Notes

Semi-Structured Data

COMP207 Lectures

Semi-structured data lies between fully structured data (like relational databases) and unstructured data (like arbitrary data files).

  • Semi-structured data doesn’t have to fit to a schema.
  • Programs have to know how to read & interpret the data.
  • Semi-structured data is self describing and flexible.

Semi-Structured Data Model

Semi-structured data is typically a tree with the following properties:

  • Leaf Nodes - Have associated data.
  • Inner Nodes - Have edges going to other nodes:
    • Each edge has a label.
  • Root:
    • Has no incoming edges.
    • Each node is reachable from the root.
graph
root -->|lecturer| a[ ] & b[ ]
a -->|name| c[ ]
a -->|teaches| d[ ]
d -->|taughtBy| c[J. Fearnley]
d -->|code| e[COMP105]
d -->|title| f[Progr. Paradigms]
b --> ...

Applications

They are a popular form for storing and sharing data on the web.

Also are used for the storage of documents such as:

  • Word processor documents.
  • Spreadsheets.
  • Vector Graphics.

Implementations

  • XML (eXtensible Markup Language)
    • Tree structured data.
  • JSON (JavaScript Object Notation)
    • Similar to XML
  • Simple Key-Value Relationships:
    • Correspond to very simple XML/JSON documents.
  • Graphs
    • A general form of semi-structured data.