TEI Lite

  • a subset of the main TEI DTD -- with extensions
  • small, simple
  • realistic for existing texts (OTA, Virginia)
  • realistic for document production (TEI technical documents)
  • A good introduction to TEI Lite is available at http://www.tei-c.org/Lite/DTD/

The Structure(s) of a TEI text

  • a TEI document contains a header followed by a text
  • the header contains:
    • file description containing bibliographic information about the machine-readable text itself, and its source
    • encoding description explaining how the electronic text was encoded
    • profile description containing further information about the text
    • revision description containing version control information about the text
  • the text may be unitary or composite
  • a text contains:
    • optional front matter
    • a body
    • optional back matter
  • in a composite text, the body is a group of texts (or nested groups)

TEI Structures Summarized

 
tei2      :: teiHeader text
text      :: front? (body|group) back?
group     :: (text|group)+
teiCorpus :: teiHeader tei2+
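
As a rough illustration of the grammar above, this Python sketch parses a minimal unitary document with the standard library and lists its parts in order. The sample content is invented, and TEI.2 is assumed to be the element that the summary abbreviates as tei2.

```python
import xml.etree.ElementTree as ET

# Minimal unitary TEI document; all content is invented for illustration.
SAMPLE = """
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample text</title></titleStmt>
      <publicationStmt><p>Unpublished demonstration file.</p></publicationStmt>
      <sourceDesc><p>No source: born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <front><div type="preface"><p>A short preface.</p></div></front>
    <body><p>The body of the text.</p></body>
    <back><div type="appendix"><p>An appendix.</p></div></back>
  </text>
</TEI.2>
"""

root = ET.fromstring(SAMPLE)
top_level = [child.tag for child in root]                # header, then text
text_parts = [child.tag for child in root.find("text")]  # front? body back?
print(top_level)
print(text_parts)
```

In a composite text the body would be replaced by a group holding further text elements.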

Text divisions

  • generic, hierarchic subdivisions
  • `vanilla' (<div>) or numbered (<div1>, <div2>, ...)
  • type attribute to categorize a division (chapter, section, etc.)
  • associated <head> and <trailer> elements
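
A minimal sketch of how numbered divisions nest, assuming <div1> and <div2> as the numbered elements and invented headings. The hypothetical outline() helper walks the hierarchy and collects each division's type and head.

```python
import xml.etree.ElementTree as ET

# Numbered divisions: <div2> nests inside <div1>. Content is invented.
BODY = """
<body>
  <div1 type="chapter" n="1">
    <head>Chapter One</head>
    <div2 type="section" n="1.1">
      <head>First section</head>
      <p>Some prose.</p>
    </div2>
    <div2 type="section" n="1.2">
      <head>Second section</head>
      <p>More prose.</p>
      <trailer>End of chapter one</trailer>
    </div2>
  </div1>
</body>
"""

def outline(element, depth=0):
    """Return (depth, type, heading) for every division below element."""
    rows = []
    for child in element:
        if child.tag.startswith("div"):
            rows.append((depth, child.get("type"), child.findtext("head")))
            rows.extend(outline(child, depth + 1))
    return rows

toc = outline(ET.fromstring(BODY))
for depth, kind, heading in toc:
    print("  " * depth + f"{kind}: {heading}")
```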

Global Attributes

  • id for unique identification
  • n for (non-unique) name or number
  • rend for rendition
  • lang for language and hence writing-system

These attributes apply to all elements in the TEI scheme.
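
The sketch below shows the four attributes on an invented fragment and checks that id values are unique while n values may repeat. The lang values EN and LAT are assumed identifiers, not values prescribed by the slides.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented fragment carrying id, n, rend and lang attributes.
FRAGMENT = """
<div1 id="SEC1" n="1" lang="EN">
  <head rend="bold">On Mottoes</head>
  <p id="P1" n="1">A well-known adage is
    <foreign lang="LAT" rend="italic">festina lente</foreign>.</p>
  <p id="P2" n="1">Numbering may restart, so n need not be unique.</p>
</div1>
"""

root = ET.fromstring(FRAGMENT)

# id must identify exactly one element.
ids = [el.get("id") for el in root.iter() if el.get("id") is not None]
duplicate_ids = [value for value, count in Counter(ids).items() if count > 1]

# n is only a label, so repeated values are acceptable.
n_values = [el.get("n") for el in root.iter("p")]

print(ids, duplicate_ids, n_values)
```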

Text components

What are divisions composed of?

  • prose is mostly paragraphs (<p>)
  • verse is mostly lines (<l>), sometimes in hierarchic groups (<lg>)
  • drama is mostly speeches (<sp>) containing <p> or <l> and interspersed with stage directions (<stage>)

These may be mixed, and may also appear directly within undivided texts.
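
A small sketch of mixed components in drama, using an invented scene. The <speaker> element is an addition not named on the slide. The code counts verse lines and prose paragraphs per speech.

```python
import xml.etree.ElementTree as ET

# Invented scene mixing verse (<l>) and prose (<p>) inside speeches.
DRAMA = """
<div1 type="scene" n="1">
  <stage>Enter two watchmen.</stage>
  <sp><speaker>First Watchman</speaker>
    <l>Who goes there in the dark?</l>
    <l>Speak, or stand still.</l>
  </sp>
  <sp><speaker>Second Watchman</speaker>
    <p>A friend, and cold.</p>
  </sp>
  <stage>Exeunt.</stage>
</div1>
"""

root = ET.fromstring(DRAMA)

# For each speech, count its verse lines and prose paragraphs.
summary = {
    sp.findtext("speaker"): {"l": len(sp.findall("l")), "p": len(sp.findall("p"))}
    for sp in root.iter("sp")
}
stage_directions = [stage.text for stage in root.iter("stage")]

print(summary)
print(stage_directions)
```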

Phrase level elements include...

  • typographically highlighted phrases (emphasis, technical terms, foreign-language matter, titles, quoted matter, linguistically distinct matter, etc.)
  • data-like phrases (names, numbers, dates, times, addresses)
  • notes and cross references
  • editorial intervention (corrections, regularizations, additions, omissions etc.)

Boundary Points

Texts are not always neatly hierarchic:

  • page, line, and column breaks (<pb>, <lb>, and <cb>)
  • these empty milestone elements require left-to-right processing and may not fit well into the hierarchical model of XML and XML software
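
To show why such boundary markers call for left-to-right processing, this sketch walks an invented paragraph in document order and starts a new bucket of text at each <pb>. The helper name text_by_page and the assumption that the text starts on page 1 are illustrative only.

```python
import xml.etree.ElementTree as ET

# One paragraph that runs across two page breaks; content is invented.
FRAGMENT = """
<p>The first page ends here<pb n="2"/> and the same paragraph
continues on page two.<pb n="3"/> It finishes on page three.</p>
"""

def text_by_page(element, first_page="1"):
    """Walk the tree in document order, starting a new bucket at each <pb>."""
    pages = {first_page: ""}
    current = first_page

    def walk(el):
        nonlocal current
        if el.tag == "pb":
            current = el.get("n")
            pages.setdefault(current, "")
        if el.text:
            pages[current] += el.text
        for child in el:
            walk(child)
            # Text after a child belongs to whatever page is current now.
            if child.tail:
                pages[current] += child.tail

    walk(element)
    # Normalize whitespace in each page's text.
    return {n: " ".join(chunk.split()) for n, chunk in pages.items()}

pages = text_by_page(ET.fromstring(FRAGMENT))
for number, content in pages.items():
    print(number, "|", content)
```

The paragraph is a single node in the tree, yet its text belongs to three pages, which is the mismatch the slide describes.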

Notes and Cross References

  • Notes of any kind: use <note>
  • in-line or out of line (use the place attribute to specify)
     
    <lg> 
      <l>The self-same moment I could pray</l>
      <l>And from my neck so free </l>
      <l>The albatross fell off, and sank</l> 
      <l>Like lead into the sea. <note type="auth" place="margin">The
        spell begins to break.</note></l>
    </lg>
    
  • cross references : <ptr> and <ref>
     
    See especially <ref target="SEC12"> section 12 on page
    34</ref>. See especially <ptr target="SEC12"/>.
    
  • target is most conveniently an identifier (id value)
     
    ... see especially <ptr target="SEC12"/>. ... 
    <div1 id="SEC12"><head>Concerning Identifiers... ... 	
    
  • Together, these provide simple hypertext capability.
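
A minimal sketch of resolving such cross references, assuming the targets are id values within the same document. The sample text and the unresolved target SEC99 are invented to show both outcomes.

```python
import xml.etree.ElementTree as ET

# Two divisions; the first points at the second by its id.
DOC = """
<body>
  <div1 id="SEC11"><head>Preliminaries</head>
    <p>See especially <ref target="SEC12">section 12</ref>,
       or <ptr target="SEC12"/>, but not <ptr target="SEC99"/>.</p>
  </div1>
  <div1 id="SEC12"><head>Concerning Identifiers</head>
    <p>Identifiers must be unique.</p>
  </div1>
</body>
"""

root = ET.fromstring(DOC)

# Index every element that carries an id.
by_id = {el.get("id"): el for el in root.iter() if el.get("id")}

resolved = {}   # target id -> heading of the element pointed at
dangling = []   # targets with no matching id
for link in root.iter():
    if link.tag in ("ptr", "ref"):
        target = link.get("target")
        if target in by_id:
            resolved[target] = by_id[target].findtext("head")
        else:
            dangling.append(target)

print(resolved)
print(dangling)
```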

Extended Pointers

  • cross references outside the document: <xptr> and <xref>
  • target may be an identifier (id value):
     
    ... see especially <xptr doc="doc2" from="ID (SEC12)"/>.
    
  • or target may involve other means of locating text in the document:
     
    ...  see especially <xptr doc="doc2" from="DESCENDANT (2 DIV1) 
    (4 P) CHILD (1 QUOTE LANG LAT)"/>.
    
  • Together, these extend the same simple hypertext capability to targets in other documents.
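
The sketch below walks the second location expression step by step over an invented target document. It reflects one reading of the pointer syntax (each step applies to the result of the previous one, counting from 1 in document order) and is not a full extended-pointer implementation. Element names are lowercased for XML.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for the document named by doc="doc2".
DOC2 = """
<body>
  <div1><p>One.</p></div1>
  <div1>
    <p>First.</p><p>Second.</p><p>Third.</p>
    <p>Fourth, with <quote lang="EN">a saying</quote> and
       <quote lang="LAT">festina lente</quote>.</p>
  </div1>
</body>
"""

def nth_descendant(element, n, tag):
    """n-th (1-based) descendant with the given tag, in document order."""
    matches = [el for el in element.iter(tag) if el is not element]
    return matches[n - 1]

def nth_child(element, n, tag, attr, value):
    """n-th (1-based) child with the given tag and attribute value."""
    matches = [el for el in element if el.tag == tag and el.get(attr) == value]
    return matches[n - 1]

root = ET.fromstring(DOC2)
div = nth_descendant(root, 2, "div1")               # DESCENDANT (2 DIV1)
para = nth_descendant(div, 4, "p")                  # (4 P)
quote = nth_child(para, 1, "quote", "lang", "LAT")  # CHILD (1 QUOTE LANG LAT)
print(quote.text)
```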

(Not in TEI(Lite), but...) XML Pointers

The XML pointer languages, including XPath, XPointer and XLink, have been strongly influenced by the TEI Extended Pointers.

Currently, XPath is a W3C Recommendation, developed together with XSL Transformations (XSLT). XPointer and XLink are in the last stages of the standardization process.

These linking techniques are much more general and powerful than the TEI Extended Pointers. Users of the XML version of TEI are encouraged to consider the XML Pointers instead of native TEI pointers.
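
For comparison, a path expression can locate a quotation by position and attribute in an invented document. This sketch uses Python's ElementTree, which implements only a subset of XPath, so it is an approximation of what a full XPath processor offers.

```python
import xml.etree.ElementTree as ET

# Invented sample document.
DOC2 = """
<body>
  <div1><p>One.</p></div1>
  <div1>
    <p>First.</p><p>Second.</p><p>Third.</p>
    <p>Fourth, with <quote lang="EN">a saying</quote> and
       <quote lang="LAT">festina lente</quote>.</p>
  </div1>
</body>
"""

root = ET.fromstring(DOC2)

# Second <div1> child, its fourth <p> child, then a <quote> with lang="LAT".
match = root.find("./div1[2]/p[4]/quote[@lang='LAT']")

# Any Latin quotation anywhere below the root.
latin_quotes = [q.text for q in root.findall(".//quote[@lang='LAT']")]

print(match.text)
print(latin_quotes)
```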

Lists

  • use <list> for lists of any kind (use type attribute to distinguish), with each entry in an <item>
  • use <label> for two-column lists or as alternative to n attribute
  • may be nested as necessary
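
A short sketch of a two-column list with a nested list, assuming the <list> and <item> elements and the illustrative type values gloss and simple. It pairs each <label> with the <item> that follows it.

```python
import xml.etree.ElementTree as ET

# Two-column (label/item) list with one nested list; content is illustrative.
GLOSS = """
<list type="gloss">
  <label>id</label><item>unique identification</item>
  <label>n</label><item>name or number</item>
  <label>lang</label><item>language, and hence
    <list type="simple">
      <item>writing system</item>
    </list>
  </item>
</list>
"""

root = ET.fromstring(GLOSS)

# Pair each label with its item, using only the item's leading text.
labels = [label.text for label in root.findall("label")]
items = [" ".join(item.text.split()) for item in root.findall("item")]
glossary = dict(zip(labels, items))

# Types of lists nested inside items of the outer list.
nested = [
    child.get("type")
    for item in root.findall("item")
    for child in item.findall("list")
]

print(glossary)
print(nested)
```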

Bibliographic References

Use simple <bibl> with subcomponents:

  • <respStmt> (for any kind of responsibility)
  • or <author>, <editor>, etc.
  • <title> with optional level attribute
  • <imprint> groups publication details
  • <biblScope> adds page references etc.

The full Guidelines have the more detailed structured elements <biblStruct> and <biblFull>.

Use <listBibl> for a list of references.
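
As a sketch of how these subcomponents can be used, the code below formats a citation from one <bibl> entry. The <pubPlace>, <publisher> and <date> elements inside <imprint> and the helper format_bibl are assumptions for illustration. The entry describes the printed TEI Guidelines (P3).

```python
import xml.etree.ElementTree as ET

# One reference inside a <listBibl>.
ENTRY = """
<listBibl>
  <bibl id="P3">
    <editor>C. M. Sperberg-McQueen</editor>
    <editor>Lou Burnard</editor>
    <title level="m">Guidelines for Electronic Text Encoding and Interchange</title>
    <imprint>
      <pubPlace>Chicago and Oxford</pubPlace>
      <publisher>Text Encoding Initiative</publisher>
      <date>1994</date>
    </imprint>
  </bibl>
</listBibl>
"""

def format_bibl(bibl):
    """Build a one-line citation from the subcomponents of a <bibl>."""
    editors = " and ".join(e.text for e in bibl.findall("editor"))
    title = bibl.findtext("title")
    place = bibl.findtext("imprint/pubPlace")
    publisher = bibl.findtext("imprint/publisher")
    year = bibl.findtext("imprint/date")
    return f"{editors} (eds.), {title}. {place}: {publisher}, {year}."

citations = [format_bibl(b) for b in ET.fromstring(ENTRY).findall("bibl")]
print(citations[0])
```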

Character Encoding Recommendations

  • XML documents for interchange should use ISO 10646 (= Unicode)
  • extend this where necessary, using the TEI Writing System Declaration

The WSD contains grapheme, entity name (private or public), AFII value, ISO 10646 value, prose description, bitmap.

The WSD mechanism is currently under revision and will likely change in the next version. A progress report on these developments will be given on March 18 here at Academia Sinica, at the 「漢字智慧編碼與應用研討會」 (Symposium on Intelligent Encoding of Chinese Characters and Its Applications).

The TEI Header

  • mandatory
  • independently interchangeable
  • support for librarians
  • support for corpus builders

The File Description contains...

  • ISBD (International Standard Bibliographic Description) areas:
    • titleStmt
    • editionStmt
    • extent
    • publicationStmt
  • sourceDesc
  • notesStmt
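
A minimal sketch of a file description and a check on its components. The sample content is invented, and the REQUIRED set (titleStmt, publicationStmt, sourceDesc) reflects an understanding of the Guidelines rather than anything stated on this slide.

```python
import xml.etree.ElementTree as ET

# Invented file description using some of the components listed above.
FILE_DESC = """
<fileDesc>
  <titleStmt><title>A sample text: an electronic edition</title></titleStmt>
  <extent>About 2 KB</extent>
  <publicationStmt><p>Distributed for demonstration only.</p></publicationStmt>
  <sourceDesc><p>Transcribed from a hypothetical printed source.</p></sourceDesc>
</fileDesc>
"""

# Components named on the slide.
LISTED = ["titleStmt", "editionStmt", "extent", "publicationStmt",
          "notesStmt", "sourceDesc"]
# Assumed to be mandatory; the rest are treated as optional here.
REQUIRED = {"titleStmt", "publicationStmt", "sourceDesc"}

root = ET.fromstring(FILE_DESC)
present = [name for name in LISTED if root.find(name) is not None]
missing_required = sorted(REQUIRED - set(present))

print(present)
print(missing_required)
```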

The Encoding Description contains...

  • projectDesc
  • samplingDecl
  • editorialDecl
  • tagsDecl
  • refsDecl
  • classDecl

The Profile Description contains...

miscellaneous additional information such as

  • <creation>
  • <langUsage>
  • <textClass>

plus, when the corpus tagset is chosen,

  • textDesc
  • particDesc
  • settingDesc
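
A rough sketch of a profile description without the corpus tagset. The <language>, <keywords>, <list> and <item> children, the scheme value, and all content are assumptions for illustration. The code builds a table of declared languages, which is what lang values on other elements would refer to under that assumption.

```python
import xml.etree.ElementTree as ET

# Invented profile description; child elements and values are assumptions.
PROFILE = """
<profileDesc>
  <creation>Composed in <date>1999</date> as a demonstration.</creation>
  <langUsage>
    <language id="EN">English</language>
    <language id="LAT">Latin</language>
  </langUsage>
  <textClass>
    <keywords scheme="LCSH">
      <list><item>Poetry</item></list>
    </keywords>
  </textClass>
</profileDesc>
"""

root = ET.fromstring(PROFILE)

# Declared languages, keyed by identifier.
languages = {lang.get("id"): lang.text for lang in root.iter("language")}

# Keywords classifying the text.
keywords = [item.text for item in root.findall("textClass/keywords/list/item")]

created = root.findtext("creation/date")

print(languages, keywords, created)
```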

Introduction to XML, Markup and the TEI Guidelines