
1 Background: XML and TEI

These guidelines assume familiarity with basic concepts of XML and the TEI system. This section serves as a brief review of some key points and refers readers in need of more extensive background information to external resources.

1.1 XML

XML (“eXtensible Markup Language”) is a system for (among other things) adding structured, machine-readable metadata to text-based documents. It is maintained as an international standard by the World Wide Web Consortium (W3C).

Metadata is data about data. For example, in our context, the sentence “The text used here is B. Jowett’s translation of The Dialogues of Plato, Vol. I, Random House, New York.” is a piece of data. The information that this sentence is an endnote to page 135 of Ricœur’s “The Function Of Fiction In Shaping Reality” and that it was added by the translator is metadata.

Describing XML as a “markup language” is a statement about its concrete syntax: the way in which it is written down. In general terms, an XML document starts from a human-readable plain-text document and then “marks up” the document’s structure and additional metadata using tags: special bits of syntax enclosed in angle brackets (< and >).

This is a simplification: comments, declarations, and processing instructions in XML also use angle brackets, but are not considered tags in this sense.

Consider the following example:

<bibl>Paul Ricoeur, "The Function Of Fiction In Shaping Reality", in Man and World 12:2 (<date subtype="thisIsOriginal" type="publication" when="1979">1979</date>), 123-141</bibl>

In this example, the citation itself (author, title, journal, year, and page range) is textual data, while everything enclosed in angle brackets is XML syntax. Considering only the XML syntax, we see an opening bibl tag, an opening date tag, a closing date tag, and a closing bibl tag. Notice in particular that every opening tag has a corresponding closing tag and that the most recently opened tag must always be closed before an outer tag can be closed.
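The nesting rule can be checked mechanically. As a minimal sketch, assuming Python 3 and its standard-library xml.etree.ElementTree module (a generic parser, not one of the Digital Ricœur tools), the example above parses cleanly, while a hypothetical variant that closes bibl while date is still open is rejected:

```python
import xml.etree.ElementTree as ET

# The citation from the text: date is opened and closed inside bibl.
WELL_NESTED = (
    '<bibl>Paul Ricoeur, "The Function Of Fiction In Shaping Reality", '
    'in Man and World 12:2 (<date subtype="thisIsOriginal" type="publication" '
    'when="1979">1979</date>), 123-141</bibl>'
)

# Hypothetical broken variant: </bibl> appears before </date>.
MIS_NESTED = '<bibl>Ricoeur (<date when="1979">1979</bibl>)</date>'


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(WELL_NESTED))  # True
print(is_well_formed(MIS_NESTED))   # False
```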

XML provides a shorthand for writing tags that are immediately closed: for example, writing <pb n="0" /> is equivalent to <pb n="0"></pb>.
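As a quick illustration, again using Python’s standard library rather than any project tool, both spellings parse to the same element:

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<pb n="0" />')
long_form = ET.fromstring('<pb n="0"></pb>')

# Both spellings yield an element named "pb" with the same attribute
# and no textual content or child elements.
for elem in (short_form, long_form):
    print(elem.tag, elem.attrib, elem.text, len(elem))  # pb {'n': '0'} None 0
```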

While the concrete syntax of an XML document looks like a sequence of characters, much of the power of XML derives from the fact that an XML document actually specifies a tree data structure of nested elements. An element is an abstract, logical entity which may contain textual data and/or other elements.

In the example above, the whole example is a bibl element, which contains both textual data (a human-readable citation) and a date element, which marks part of the citation as specifying a publication date.
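To make the tree view concrete, here is a minimal sketch (assuming Python 3’s xml.etree.ElementTree; the variable names are illustrative) that parses the citation and walks its structure. ElementTree stores text that follows a child element’s closing tag as that child’s tail:

```python
import xml.etree.ElementTree as ET

BIBL = (
    '<bibl>Paul Ricoeur, "The Function Of Fiction In Shaping Reality", '
    'in Man and World 12:2 (<date subtype="thisIsOriginal" type="publication" '
    'when="1979">1979</date>), 123-141</bibl>'
)

root = ET.fromstring(BIBL)
print(root.tag)                       # bibl
print([child.tag for child in root])  # ['date']

date = root.find("date")
print(date.text)  # 1979  (textual data inside the date element)
print(date.tail)  # ), 123-141  (text after </date>, still inside bibl)

# itertext() yields all textual data in document order, without the tags.
print("".join(root.itertext()))
```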

Readers will notice the close relationship between elements (the abstract, logical entities) and tags (the notations in XML’s concrete syntax that mark them). In practice, “element” and “tag” are often used synonymously.

In addition to its contents, an element may have attributes, which provide additional machine-readable metadata about the element. Each attribute that appears on an element has a name and a value. In our example, the date element has an attribute named when with the value "1979". This attribute encodes the date specified by the element in a standard, machine-readable format.
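Because attribute values are machine-readable, a program can retrieve them by name. A minimal sketch, assuming Python 3’s xml.etree.ElementTree (notBefore is used here only as an example of an attribute this element does not carry):

```python
import xml.etree.ElementTree as ET

date = ET.fromstring(
    '<date subtype="thisIsOriginal" type="publication" when="1979">1979</date>'
)

print(date.get("when"))       # 1979
print(date.get("type"))       # publication
print(date.get("notBefore"))  # None, because the attribute is absent
print(sorted(date.attrib))    # ['subtype', 'type', 'when']
```

Note that attribute values are always strings; any interpretation of "1979" as a year is up to the application.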

We also rely on a very minimal understanding of the XML concept of entities. In XML’s concrete syntax, the characters & and < have special meaning, and therefore may not appear literally in textual data. They must be replaced with the corresponding XML entities &amp; and &lt;, respectively. (For Digital Ricœur, this is done automatically by “TEI Lint” or, if preparing a document manually, by one of our command-line tools: see Getting Started for more details.) No attempt is made here to explain the other, more advanced uses of entities in XML.
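For readers curious about what such escaping involves, here is a generic sketch using Python’s standard library. It is not TEI Lint or any of the project’s command-line tools, and the sample string is made up. Note that xml.sax.saxutils.escape also turns > into &gt;, which XML permits but does not generally require:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Q&A: is x < y?"
escaped = escape(raw)
print(escaped)  # Q&amp;A: is x &lt; y?

# The escaped text can be embedded in an element, and parsing restores the original.
print(ET.fromstring(f"<p>{escaped}</p>").text)  # Q&A: is x < y?

# Embedding the raw text is not well-formed XML.
try:
    ET.fromstring(f"<p>{raw}</p>")
except ET.ParseError as err:
    print("rejected:", err)
```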

XML is specifically an “extensible” markup language because, beyond the common concrete syntax of tags and its interpretation as elements, attributes, and entities, it makes little attempt to specify the structure or meaning of an XML document. Those aspects are left to specific applications of XML, which can range from recipes to entries in library catalogues. They are typically codified in a Document Type Definition (DTD), which is a formal, machine-checkable specification for the structure of an XML document. Many projects in the humanities (including ours) use Document Type Definitions based on the TEI model, which is described below.

1.1.1 Further Reading

Many systematic introductions to XML for beginners are freely available online, such as the XML Tutorial on the W3Schools website. In fact, many of these tutorials cover far more detail about XML than is necessary to contribute to this project.

The W3C publishes a page called XML Essentials.

1.2 TEI

As discussed above, the XML standard itself does not specify what elements exist, the semantic meanings of particular elements, or how the hierarchy of elements and textual data should be structured in a document. The Text Encoding Initiative consortium (TEI) publishes a standard (also referred to as TEI), based on XML, that is suitable for many projects in the humanities. This standard is described at https://www.tei-c.org.

The TEI standard is what tells us, for example, that the p element means “this is a paragraph,” as well as specifying the structure for the catalog information in the teiHeader element.
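As an illustration of how TEI structure guides processing, the sketch below parses a minimal, made-up TEI document and extracts the title from the teiHeader and the paragraphs from the body. It assumes Python 3’s xml.etree.ElementTree and the standard TEI P5 namespace (http://www.tei-c.org/ns/1.0); documents conforming to DR-TEI.dtd may differ, for example in whether they declare that namespace:

```python
import xml.etree.ElementTree as ET

# Assumption: elements live in the standard TEI P5 namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample title</title></titleStmt>
      <publicationStmt><p>Unpublished sample.</p></publicationStmt>
      <sourceDesc><p>Born-digital sample.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(DOC)

title = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
print(title.text)  # A sample title

# Only paragraphs of the body text, not the p elements inside teiHeader.
body_paragraphs = root.findall("tei:text/tei:body/tei:p", TEI_NS)
print([p.text for p in body_paragraphs])  # ['First paragraph.', 'Second paragraph.']

# A descendant search finds every p, including the two in the header.
print(len(root.findall(".//tei:p", TEI_NS)))  # 4
```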

However, because the TEI standard aims to define elements to meet the needs of many diverse projects (from original poetry to facsimiles of manuscripts), projects must define smaller, more targeted Document Type Definitions that address their precise use cases. The TEI consortium provides a variety of tools to define such customizations with relative ease.

Digital Ricœur’s specific customization of the TEI standard is known as DR-TEI.dtd. It comes with documentation automatically generated by the TEI consortium’s tools, which is available at DR-TEI_doc.html. We also impose additional requirements on our TEI documents that are not easily specified using a custom DTD: these requirements are specified in this manual and are checked by the tools described under Tools.