TEI Lite
- a subset of the main TEI DTD -- with extensions
- small, simple
- realistic for existing texts (OTA, Virginia)
- realistic for document production (TEI technical
documents)
- A good introduction to TEI Lite is available at http://www.tei-c.org/Lite/DTD/
The Structure(s) of a TEI text
- a TEI document contains a header followed by a
text
- the header contains:
- file description containing bibliographic
information about the machine-readable text itself, and its
source
- encoding description explaining how the
electronic text was encoded
- profile description containing further
information about the text
- revision description containing version
control information about the text
- the text may be unitary or
composite
- a unitary text contains:
- optional front matter
- a body
- optional back matter
- in a composite text, the body is a group of
texts (or nested groups)
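As an illustration of the outline above, a minimal unitary document might look like the sketch below. It assumes the XML form of the DTD, where the root element is written <TEI.2>. All titles and prose are invented placeholders, and only three parts of the file description are shown.

```xml
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A sample text: an electronic transcription</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished demonstration file.</p>
      </publicationStmt>
      <sourceDesc>
        <p>No printed source; written for this example.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <front>
      <!-- optional front matter: title page, preface, etc. -->
      <div type="preface"><p>Placeholder preface.</p></div>
    </front>
    <body>
      <p>Placeholder body paragraph.</p>
    </body>
    <back>
      <!-- optional back matter: appendices, indexes, etc. -->
      <div type="appendix"><p>Placeholder appendix.</p></div>
    </back>
  </text>
</TEI.2>
```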
TEI Structures Summarized
tei2 :: teiHeader text
text :: front? (body|group) back?
group :: (text|group)+
teiCorpus :: teiHeader tei2+
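The group rule in the summary can be pictured with a composite text. This is only a sketch with invented placeholder content: the outer text has a group where a unitary text would have a body, and the group holds member texts plus one nested group.

```xml
<text>
  <front>
    <div type="preface"><p>Preface to the whole collection.</p></div>
  </front>
  <group>
    <text>
      <body><p>First member text.</p></body>
    </text>
    <text>
      <!-- a member text may carry its own front matter -->
      <front><div type="dedication"><p>Placeholder dedication.</p></div></front>
      <body><p>Second member text.</p></body>
    </text>
    <group>
      <text>
        <body><p>A text inside a nested group.</p></body>
      </text>
    </group>
  </group>
  <back>
    <div type="index"><p>Index to the whole collection.</p></div>
  </back>
</text>
```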
Text divisions
- generic, hierarchic subdivisions
- unnumbered (`vanilla' <div>) or numbered (<div1>, <div2>, ...)
- type attribute
- associated <head> and <trailer>
elements
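The two division styles might be compared as follows. This is a fragment, not a complete document. The type and n values and all headings are invented for illustration, and a real text would normally use one style throughout.

```xml
<!-- Unnumbered style: depth is implied by the nesting alone -->
<div type="chapter" n="1">
  <head>Chapter One</head>
  <div type="section" n="1.1">
    <head>First section</head>
    <p>Placeholder paragraph.</p>
  </div>
  <trailer>End of Chapter One</trailer>
</div>

<!-- Numbered style: the element name records the depth -->
<div1 type="chapter" n="1">
  <head>Chapter One</head>
  <div2 type="section" n="1.1">
    <head>First section</head>
    <p>Placeholder paragraph.</p>
  </div2>
  <trailer>End of Chapter One</trailer>
</div1>
```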
Global Attributes
- id for unique identification
- n for (non-unique) name or number
- rend for rendition
- lang for language and hence
writing-system
These attributes apply to all elements in the TEI
scheme.
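A short sketch of the four global attributes in use. All values are invented. In particular, the form of the lang value (FRA here) is an assumption: it depends on how languages are declared for the document.

```xml
<div1 id="CH2" n="2" type="chapter">
  <head>An invented chapter heading</head>
  <!-- id must be unique in the document; n need not be -->
  <p id="CH2P1" n="1" rend="indent">A paragraph with an identifier,
    a label and a rendition hint.</p>
  <p id="CH2P2" n="2">A phrase in another language:
    <foreign lang="FRA" rend="italic">mot juste</foreign>.</p>
</div1>
```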
Text components
What are divisions composed of?
- prose is mostly paragraphs
(<p>)
- verse is mostly lines
(<l>), sometimes in hierarchic groups
(<lg>)
- drama is mostly speeches
(<sp>) containing <p> or <l> and
interspersed with stage directions
(<stage>)
These may be mixed, and may also appear directly within
undivided texts.
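The three kinds of component could sit side by side in an undivided body, as in this sketch. All text is placeholder. The <speaker> element does not appear in the outline above and is added here on the assumption that speeches are to be attributed.

```xml
<body>
  <!-- prose -->
  <p>A placeholder prose paragraph.</p>

  <!-- verse: lines grouped into a stanza -->
  <lg type="stanza">
    <l>First placeholder line of verse,</l>
    <l>second placeholder line of verse.</l>
  </lg>

  <!-- drama: speeches interspersed with stage directions -->
  <stage>Enter two speakers.</stage>
  <sp>
    <speaker>First Speaker</speaker>
    <l>A speech in verse.</l>
  </sp>
  <sp>
    <speaker>Second Speaker</speaker>
    <p>A speech in prose.</p>
  </sp>
  <stage>Exeunt.</stage>
</body>
```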
Phrase level elements include...
- typographically highlighted phrases
(emphasis, technical terms, foreign language matter, titles,
quoted matter, linguistically distinct phrases, etc.)
- data-like phrases (names, numbers, dates, times,
addresses)
- notes and cross references
- editorial intervention (corrections,
regularizations, additions, omissions etc.)
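A few of these phrase-level elements are sketched below, one paragraph per group. The sentences, names, dates and attribute values are all invented, and the lang value follows an assumed convention.

```xml
<!-- highlighted phrases -->
<p>The editor insisted it was <emph>not</emph> a <term>palimpsest</term>,
  preferring <foreign lang="LAT">codex rescriptus</foreign>, and cited
  <title>An Invented Handbook</title>: <q>placeholder quotation</q>.</p>

<!-- data-like phrases -->
<p><name type="person">A. N. Other</name> paid
  <num value="12">twelve</num> shillings on
  <date value="1798-03-18">18 March 1798</date> at
  <time value="14:30">half past two</time>.</p>

<!-- editorial intervention -->
<p>It was a <corr sic="wierd">weird</corr> tale, told in
  <reg orig="olde">old</reg> words, with <add>an added phrase</add>,
  <del>a deleted phrase</del>, and a lost passage
  <gap reason="illegible"/>.</p>
```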
Boundary Points
Texts are not always neatly hierarchic:
- page, line, and column breaks (<pb>, <lb>, and
<cb>) mark points in the text rather than spans of it
- handling them requires left-to-right processing, which may not fit well into
the hierarchical model of XML and XML software
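A sketch of how the boundary elements are typically placed: as empty elements inside a single paragraph, so that the paragraph stays one element even though it crosses a page. The text and the n values are invented.

```xml
<p>This paragraph begins near the foot of one page
<lb/>and, after a line break, runs on until the page ends
<pb n="35"/>and then continues at the top of the next page,
<cb n="2"/>finally spilling over into the second column.</p>
```

Because the page is not an element with content, finding "everything on page 35" means reading forward from one <pb> to the next.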
Notes and Cross References
- Notes of any kind: use <note>
- in-line or out of line (use the place attribute to
specify)
<lg>
<l>The self-same moment I could pray</l>
<l>And from my neck so free</l>
<l>The albatross fell off, and sank</l>
<l>Like lead into the sea. <note type="auth" place="margin">The
spell begins to break.</note></l>
</lg>
- cross references: <ptr> and <ref>
See especially <ref target="SEC12">section 12 on page
34</ref>.
See especially <ptr target="SEC12"/>.
- target is most conveniently an identifier
(id value)
... see especially <ptr target="SEC12"/>. ...
<div1 id="SEC12"><head>Concerning Identifiers... ...
- Together, these provide simple hypertext
capability.
(Not in TEI Lite, but...) XML Pointers
The development of the XML pointer languages, XPath,
XPointer and XLink, has been strongly influenced by the TEI
Extended Pointers.
Currently, XPath is a W3C Recommendation, developed alongside XSL
Transformations (XSLT). XPointer and XLink are in the last stages of
the standardization process.
These linking techniques are much more general and powerful
than the TEI Extended Pointers. Users of the XML version of TEI
are encouraged to consider the XML Pointers instead of native
TEI pointers.
Lists
- <list> for lists of any kind (use the type attribute
to distinguish)
- use <label> for two-column lists or as an alternative
to the n attribute
- may be nested as necessary
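The list options above might be combined as in this sketch: an ordered list with a nested list inside one item, then a two-column list built from <label> and <item> pairs. The type values and item text are illustrative choices, not the only possible ones.

```xml
<list type="ordered">
  <head>Placeholder heading</head>
  <item n="1">First item</item>
  <item n="2">Second item, containing a nested list:
    <list type="bulleted">
      <item>A nested item</item>
      <item>Another nested item</item>
    </list>
  </item>
</list>

<!-- two-column list: each label is paired with the item after it -->
<list type="gloss">
  <label>pb</label> <item>page break</item>
  <label>lb</label> <item>line break</item>
  <label>cb</label> <item>column break</item>
</list>
```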
Bibliographic References
Use simple <bibl> with subcomponents:
- <respStmt> (for any kind of responsibility)
- or <author>, <editor>, etc.
- <title> with optional level
attribute
- <imprint> groups publication details
- <biblScope> adds page references etc.
The full Guidelines have the more detailed structured elements
<biblStruct> and <biblFull>.
Use <listBibl> for a list of references.
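A sketch of a reference list using the subcomponents named above. The work, author, publisher and all values are invented, and the id value is a hypothetical identifier that a <ptr> or <ref> elsewhere could target.

```xml
<listBibl>
  <head>Placeholder list of references</head>
  <bibl id="EX01">
    <author>A. N. Author</author>
    <title level="m">An Invented Monograph</title>
    <!-- imprint groups the publication details -->
    <imprint>
      <pubPlace>Taipei</pubPlace>
      <publisher>Example Press</publisher>
      <date>1999</date>
    </imprint>
    <biblScope type="pages">12-34</biblScope>
  </bibl>
  <bibl id="EX02">
    <!-- respStmt covers any kind of responsibility -->
    <respStmt>
      <resp>translated by</resp>
      <name>A. N. Other</name>
    </respStmt>
    <title level="a">An invented article</title>
    <title level="j">Invented Journal</title>
  </bibl>
</listBibl>
```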
Character Encoding Recommendations
- XML documents for interchange should use ISO 10646
(= Unicode)
- extend this where necessary, using the TEI Writing
System Declaration (WSD)
For each character, the WSD records the grapheme, entity name (private or public),
AFII value, ISO 10646 value, prose description, and bitmap.
The WSD mechanism is currently under revision and will likely
be changed in the next version. A progress report on these
developments will be given on March 18 here at Academia Sinica
at the 「漢字智慧編碼與應用研討會」 (Symposium on Intelligent Encoding of Chinese Characters and Its Applications).
The TEI Header
- mandatory
- independently interchangeable
- support for librarians
- support for corpus builders
The File Description contains...
- ISBD (International Standard Bibliographic
Description) areas:
- titleStmt
- editionStmt
- extent
- publicationStmt
- notesStmt
- sourceDesc
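A fuller file description might look like the sketch below, with the parts in the order the DTD expects (notesStmt before sourceDesc). Every name, date and size is an invented placeholder.

```xml
<fileDesc>
  <titleStmt>
    <title>A sample poem: an electronic transcription</title>
    <author>A. N. Author</author>
    <respStmt>
      <resp>encoded by</resp>
      <name>A. N. Encoder</name>
    </respStmt>
  </titleStmt>
  <editionStmt>
    <edition>First electronic edition</edition>
  </editionStmt>
  <extent>approximately 4 KB</extent>
  <publicationStmt>
    <publisher>Example Text Archive</publisher>
    <pubPlace>Taipei</pubPlace>
    <date>2000</date>
  </publicationStmt>
  <notesStmt>
    <note>Prepared for demonstration purposes only.</note>
  </notesStmt>
  <!-- sourceDesc describes the source the file was made from -->
  <sourceDesc>
    <bibl>
      <author>A. N. Author</author>
      <title>A Sample Poem</title>
      <date>1798</date>
    </bibl>
  </sourceDesc>
</fileDesc>
```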
The Encoding Description contains...
- projectDesc
- samplingDecl
- editorialDecl
- tagsDecl
- refsDecl
- classDecl
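The six components might be filled in as in this sketch. The prose, the count in occurs, and the taxonomy identifier EXAMPLE are invented. Simple projects often supply only one or two of these, mostly as plain paragraphs.

```xml
<encodingDesc>
  <projectDesc>
    <p>Encoded as a teaching example for an introductory tutorial.</p>
  </projectDesc>
  <samplingDecl>
    <p>The complete text is included; nothing was sampled.</p>
  </samplingDecl>
  <editorialDecl>
    <p>Obvious printing errors are corrected and marked with corr;
      original spelling is otherwise retained.</p>
  </editorialDecl>
  <tagsDecl>
    <!-- gi names the element; occurs counts its occurrences -->
    <tagUsage gi="l" occurs="4">Marks each verse line.</tagUsage>
  </tagsDecl>
  <refsDecl>
    <p>References are built from the n attributes of div1 and l.</p>
  </refsDecl>
  <classDecl>
    <taxonomy id="EXAMPLE">
      <bibl>An invented classification scheme</bibl>
    </taxonomy>
  </classDecl>
</encodingDesc>
```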
The Profile Description contains...
miscellaneous additional information such as
- <creation>
- <langUsage>
- <textClass>
plus, when the corpus tagset is chosen,
- textDesc
- particDesc
- settingDesc
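A sketch of a profile description using only the three elements that need no extra tagset. All values are invented. The scheme value is hypothetical; a real document would point it at a classification scheme it has declared. The language identifiers assume the same convention as any lang values used in the text.

```xml
<profileDesc>
  <creation>
    <date value="1798">1798</date>
  </creation>
  <langUsage>
    <language id="ENG">English</language>
    <language id="LAT">Latin, in occasional quoted phrases</language>
  </langUsage>
  <textClass>
    <keywords scheme="EXAMPLE">
      <list>
        <item>Narrative verse</item>
        <item>Ballads</item>
      </list>
    </keywords>
  </textClass>
</profileDesc>
```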