UoL CS Notes

Document Type Definitions (DTD)

COMP207 Lectures

There are two ways to define the format of XML files: DTDs and XML Schema. This lecture explains how to use DTDs.

DTDs contain information about the structure of an XML document:

  • Elements that may occur in the document.
  • Sub-elements that an element may have.
  • Attributes that an element may have.

Structure

A DTD is included at the beginning of the XML document, between the XML declaration:

<?xml ... standalone="no"?>

and the root element.

Its declarations are written inside a DOCTYPE declaration:

<!DOCTYPE lecturers [
...
]>

where lecturers is the name of the root element.

Elements

Element declarations in a DTD have the following syntax:

<!--There must be 1 or more lecturer below the root element-->
<!ELEMENT lecturers (lecturer+)>
<!--Name is required, phone and email are optional, zero or more occurrences of teaches-->
<!ELEMENT lecturer (name, phone?, email?, teaches*)>
<!ELEMENT teaches (code, title)>
<!--#PCDATA is the special symbol for text data-->
<!ELEMENT name (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT code (#PCDATA)>
<!ELEMENT title (#PCDATA)>

To read these declarations, look at the content model inside the parentheses (), which gives the rules for the elements that can occur in the XML document:

  • Names with no qualifying punctuation must occur exactly once.
  • Options for repetitions are:
    • * - Zero or more.
    • + - One or more.
    • ? - Either zero or one.
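
As a rough illustration of these rules, the sketch below embeds the DTD above in a complete document and follows it with content that should satisfy it. The lecturer names, the email address and the COMP999 module are made-up placeholders, not values from the lecture.

```xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE lecturers [
  <!ELEMENT lecturers (lecturer+)>
  <!ELEMENT lecturer (name, phone?, email?, teaches*)>
  <!ELEMENT teaches (code, title)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT code (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
]>
<lecturers>
  <!-- Only the required name: phone, email and teaches are all omitted -->
  <lecturer>
    <name>A. Example</name>
  </lecturer>
  <!-- name, no phone, one email, two teaches elements (placeholder data) -->
  <lecturer>
    <name>B. Example</name>
    <email>b.example@example.org</email>
    <teaches>
      <code>COMP105</code>
      <title>Programming Paradigms</title>
    </teaches>
    <teaches>
      <code>COMP999</code>
      <title>Placeholder Module</title>
    </teaches>
  </lecturer>
</lecturers>
```

Note that the commas in a content model also fix the order: within lecturer, name must come before phone, which must come before email, and so on.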

Defining Attributes

To define the attributes of the following element:

<module code="COMP105" title="Programming Paradigms" />

We would first declare the module element as empty (it has no sub-elements):

<!ELEMENT module EMPTY>

Then declare its attributes as follows:

<!ATTLIST module code CDATA #IMPLIED>
<!ATTLIST module title CDATA #IMPLIED>

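The default declaration at the end of each attribute definition can be one of the following:

  • #IMPLIED - Optional
  • #REQUIRED - Required
  • A quoted value such as "COMPXXX" - Default value, used when the attribute is omitted.
  • #FIXED followed by a quoted value - Constant; the attribute always has this value.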
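
The four options could be combined on the module element roughly as follows. This is only a sketch: the level and provider attributes and their values are hypothetical, added here purely to show a default and a fixed value.

```xml
<!ELEMENT module EMPTY>
<!-- code must always be given -->
<!ATTLIST module code CDATA #REQUIRED>
<!-- title may be left out -->
<!ATTLIST module title CDATA #IMPLIED>
<!-- level (hypothetical) is "1" unless the document says otherwise -->
<!ATTLIST module level CDATA "1">
<!-- provider (hypothetical) is always "UoL"; any other value is invalid -->
<!ATTLIST module provider CDATA #FIXED "UoL">
```

With these declarations, an element written as <module code="COMP105" /> would be treated as if it also carried level="1" and provider="UoL".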

Attribute Datatypes

You can use the following datatypes:

  • CDATA - Character data, containing any text.
  • ID - Used to identify individual elements in a document.
    • An ID value must be a valid XML name and must be unique within the document.
  • IDREF/IDREFS - Must correspond to the value of ID attributes for some element in the document.
    • You can’t define what type you are pointing to, only that it is some ID for some object.
  • You can also give an enumerated list of allowed values in the following form:

      (en1|en2|...)
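
A minimal sketch of an enumerated attribute, assuming a hypothetical semester attribute on the module element (neither the attribute nor its values come from the lecture):

```xml
<!ELEMENT module EMPTY>
<!ATTLIST module code CDATA #REQUIRED>
<!-- semester (hypothetical) must be one of the listed names;
     "first" is used when the attribute is omitted -->
<!ATTLIST module semester (first|second|both) "first">
```

Here <module code="COMP105" semester="second" /> would be accepted, while semester="third" would make the document invalid.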
    

Element Identity, IDs & IDREFs

  • ID allows a unique key to be associated with an element.
  • IDREF allows an element to refer to another element by the value of that element's ID attribute.
  • IDREFS allows an element to refer to multiple elements.

To loosely model the relationships:

  • Department has Lecturers
  • Department has a head who is a Lecturer

we would write:

<!ATTLIST lecturer staffNo ID #REQUIRED>
<!ATTLIST department staff IDREFS #IMPLIED>
<!ATTLIST department head IDREF #IMPLIED>
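
The sketch below shows how a document might use these three attributes. The university root element, the element content models and the staff numbers are assumptions made for the example. The staff numbers start with a letter because ID values must be valid XML names.

```xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE university [
  <!-- university is a hypothetical root element for this example -->
  <!ELEMENT university (department, lecturer+)>
  <!ELEMENT department EMPTY>
  <!ELEMENT lecturer (#PCDATA)>
  <!ATTLIST lecturer staffNo ID #REQUIRED>
  <!ATTLIST department staff IDREFS #IMPLIED>
  <!ATTLIST department head IDREF #IMPLIED>
]>
<university>
  <!-- staff lists several IDs separated by spaces; head names exactly one -->
  <department staff="s1 s2 s3" head="s2"/>
  <lecturer staffNo="s1">A. Example</lecturer>
  <lecturer staffNo="s2">B. Example</lecturer>
  <lecturer staffNo="s3">C. Example</lecturer>
</university>
```

A validating parser would reject head="s9" here, because no element in the document has that ID. It would not, however, check that the referenced element is a lecturer rather than some other element with an ID.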

Document Validity

There are two levels of document processing:

  • Well-Formed
    • All elements must be within one root element.
    • Elements must be nested in a tree structure without any overlap.

    This ensures that the XML document conforms to the structural and notational rules of XML.

  • Valid
    • Document is well-formed.
    • Document conforms to its DTD or XML schema.
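
To make the difference concrete, here is a sketch of a document that is well-formed but not valid. It uses a cut-down version of the lecturer DTD, and the name and phone number are placeholders.

```xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE lecturers [
  <!ELEMENT lecturers (lecturer+)>
  <!ELEMENT lecturer (name, phone?)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<lecturers>
  <!-- Well-formed: one root element, all tags properly nested.
       Not valid: the DTD requires name to come before phone. -->
  <lecturer>
    <phone>0000 000000</phone>
    <name>A. Example</name>
  </lecturer>
</lecturers>
```

Swapping the phone and name elements would make the document valid. Overlapping the tags instead, for example closing lecturer before name, would mean it is no longer even well-formed.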