Document Type Definitions (DTD)
There are two ways to define the format of XML files. This lecture explains how to use DTD.
DTDs contain information about the structure of an XML document:
- Elements that may occur in the document.
- Sub-elements that an element may have.
- Attributes that an element may have
Structure
The are included at the beginning of the XML document between:
<?xml ... standalone="no">
and the root element.
They are included between the following tags:
<!DOCTYPE lecturers [
...
]>
where lecturers
is the name of the root tag.
Elements
The elements of a DTD has the following syntax:
<!--There must be 1 or more lecturer below the root element-->
<!ELEMENT lecturers (lecturer+)>
<!--Name is required, phone and email are optional, zero or more occurrences of teaches-->
<!ELEMENT lecturer (name, phone?, email?, teaches*)>
<!ELEMENT teaches (code, title)>
<!--Includes special symbol for text data-->
<!ELEMENT name (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT code (#PCDATA)>
<!ELEMENT title (#PCDATA)>
To parse this we identify the rules for element that can occur in the XML document (inside ()
:
- Names with no qualifying punctuation must occur exactly once.
- Options for repetitions are:
*
- Zero or more.+
- One or more.?
- Either zero or one.
Defining Attributes
To define the following attribute:
<module code=”COMP105” title=“Programming Paradigms” />
We would first define the module (without the elements):
<!ELEMENT module EMPTY>
Then define the attributes like follows:
<!ATTLIST module code CDATA #IMPLIED>
<!ATTLIST module title CDATA #IMPLIED>
The value after the #
can be some of the following:
#IMPLIED
- Optional#REQUIRED
- Required- Some value
COMPXXX
defining a default string. #FIXED
and some value - Constant
Attribute Datatypes
You can use the following datatypes:
CDATA
- Character data, containing any text.ID
- Used to identify individual elements in a document.ID
is an element name.
IDREF
/IDREFS
- Must correspond to the value ofID
attributes for some element in the document.- You can’t define what type you are pointing to, only that it is some ID for some object.
-
You can also have a list of names in the following form:
(en1|en2|...)
Element Identity, ID
s & IDREF
s
ID
allows a unique key to be associated with an element.IDREF
allows an element to refer to another element with the designated key and attribute type.IDREFS
allows an element to refer to multiple elements.
To loosely mode the relationships:
Department
hasLecturers
Department
has ahead
who is aLecturer
we would write:
<!ATTLIST lecturer staffNo ID #REQUIRED>
<!ATTLIST department staff IDREFS #IMPLIED>
<!ATTLIST department head IDREF #IMPLIED>
Document Validity
There are two levels of document processing:
- Well-Formed
- All elements must be within one root element.
- Elements must be nested in a tree structure without any overlap.
This ensures that the XML document conforms to the structural and notational rules of XML.
- Valid
- Document is well-formed.
- Document conforms it it’s DTD or XML schema