Schema for token files in the KanripoX project

2020-12-07


1 Overview

The token files described here serve as a shadow of other digital files, which describe the texts more thoroughly. This relieves the token files of the burden of describing the physical appearance, structure and transmission of the text; that information remains available at any time by following the links back to the other files. The purpose of the token files is to provide a minimal description, containing only the characters of the text in a form that allows easy comparison and alignment of multiple versions. Their function is similar to that of a concordance: they provide access to the whole text, but without much of what a reader would expect in order to make reading (or editing) convenient, or even feasible. On the other hand, enough information should be retained to reconstruct a very basic version of the text.
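As a rough illustration of the comparison mentioned above, the sketch below aligns two versions once each has been reduced to its sequence of token strings. It uses Python's standard `difflib.SequenceMatcher`; that choice is ours for the sketch, not something the schema prescribes, and the two token lists are hypothetical.

```python
import difflib

# Hypothetical token sequences for two versions of one passage
# (one string per t element, taken in tp order).
version_a = ["學", "而", "時", "習", "之", "不", "亦", "說", "乎"]
version_b = ["學", "而", "時", "習", "之", "不", "亦", "悅", "乎"]


def align(a, b):
    """Return (tag, a_tokens, b_tokens) tuples describing how b differs from a.

    tag is one of 'equal', 'replace', 'delete' or 'insert', as produced by
    difflib.SequenceMatcher.get_opcodes().
    """
    # autojunk=False: never treat frequent tokens as junk.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2]) for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


for tag, a_part, b_part in align(version_a, version_b):
    print(tag, "".join(a_part), "".join(b_part))
# equal 學而時習之不亦 學而時習之不亦
# replace 說 悅
# equal 乎 乎
```

Because alignment works on whole tokens rather than on marked-up text, variants show up directly as replaced, inserted or deleted tokens.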

The main elements under the root element tlist are tg and t:

The tg element holds the t elements, which carry the character content of the text, one token per t. The purpose of the tg element is to group related t elements. tg elements can nest and thus provide a rudimentary structure for the token files.
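A minimal sketch of such a file, read with Python's standard `xml.etree.ElementTree`. The edition identifier `ED1`, the `n` labels and the tokens are hypothetical; we also assume, for illustration only, that a t element's `n` repeats the `n` of its enclosing tg and that the elements are in no namespace (if the real files use one, the tag names in the code would need to be qualified). The nested `q` group shows how tg elements can nest.

```python
import xml.etree.ElementTree as ET

# Hypothetical token file; ED1, the n labels and the tokens are invented.
SAMPLE = """\
<tlist ed="ED1" fileseq="1">
  <tg role="h" n="h1">
    <t role="h" n="h1" tp="1" pos="1">學</t>
    <t role="h" n="h1" tp="2" pos="2">而</t>
  </tg>
  <tg role="p" n="p1">
    <t role="p" n="p1" tp="3" pos="1">子</t>
    <t role="p" n="p1" tp="4" pos="2" f="：">曰</t>
    <tg role="q" n="q1">
      <t role="q" n="q1" tp="5" pos="1" p="「">學</t>
      <t role="q" n="q1" tp="6" pos="2">而</t>
      <t role="q" n="q1" tp="7" pos="3" f="」">時</t>
    </tg>
  </tg>
</tlist>
"""

root = ET.fromstring(SAMPLE)

# iter() walks the whole subtree in document order, so it also finds
# the t and tg elements inside the nested quotation group.
tokens = [t.text for t in root.iter("t")]
groups = [(tg.get("role"), tg.get("n")) for tg in root.iter("tg")]

print(tokens)  # ['學', '而', '子', '曰', '學', '而', '時']
print(groups)  # [('h', 'h1'), ('p', 'p1'), ('q', 'q1')]
```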

2 The elements defined for the token files

Schema KRXToken: Elements

<lb>

<lb> This element marks the beginning of a new line or line-like section on the text-bearing surface.
Module derived-module-KRXToken
Attributes
ed Identifier of the edition to which this line belongs
Status Optional
Datatype string
n Number or other label used to refer to this line
Status Optional
Datatype string
xml:id
Status Recommended
Datatype ID
Contained by
KRXToken: tg
May contain Empty element
Content model
<content>
 <empty/>
</content>
Schema Declaration
element lb
{
   attribute ed { string }?,
   attribute n { string }?,
   attribute xml:id { ID }?,
   empty
}

<pb>

<pb> This element marks the beginning of a new page or page-like section on the text-bearing surface.
Module derived-module-KRXToken
Attributes
ed Identifier of the edition to which this page belongs
Status Optional
Datatype string
n Number or other label used to refer to this page
Status Optional
Datatype string
xml:id
Status Recommended
Datatype ID
Contained by
KRXToken: tg
May contain Empty element
Content model
<content>
 <empty/>
</content>
Schema Declaration
element pb
{
   attribute ed { string }?,
   attribute n { string }?,
   attribute xml:id { ID }?,
   empty
}
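Since pb and lb are empty milestone elements, the page and line of a token have to be inferred from the milestones that precede it in document order. The sketch below does this, assuming no namespace, hypothetical edition identifiers `ED1` and `ED2`, and that milestones without an `ed` attribute apply to every edition; the function name `locate_tokens` is ours.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with milestones from two editions: the ED2 page break
# should be ignored when locating tokens in ED1.
SAMPLE = """\
<tlist ed="ED1">
  <tg role="p" n="p1">
    <pb ed="ED1" n="1a"/>
    <lb ed="ED1" n="1"/>
    <t role="p" n="p1" tp="1">子</t>
    <t role="p" n="p1" tp="2">曰</t>
    <lb ed="ED1" n="2"/>
    <pb ed="ED2" n="5b"/>
    <t role="p" n="p1" tp="3">學</t>
    <pb ed="ED1" n="1b"/>
    <lb ed="ED1" n="1"/>
    <t role="p" n="p1" tp="4">而</t>
  </tg>
</tlist>
"""


def locate_tokens(root, ed=None):
    """Map each token's tp to the (page, line) labels of the closest preceding
    pb and lb milestones in document order.

    If ed is given, milestones belonging to another edition are skipped;
    milestones without an ed attribute are assumed to apply to all editions.
    """
    page = line = None
    located = {}
    for el in root.iter():  # document order, including nested tg elements
        if el.tag in ("pb", "lb") and ed is not None and el.get("ed") not in (None, ed):
            continue
        if el.tag == "pb":
            page = el.get("n")
        elif el.tag == "lb":
            line = el.get("n")
        elif el.tag == "t":
            located[int(el.get("tp"))] = (page, line)
    return located


print(locate_tokens(ET.fromstring(SAMPLE), ed="ED1"))
# {1: ('1a', '1'), 2: ('1a', '1'), 3: ('1a', '2'), 4: ('1b', '1')}
```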

<t>

<t> A token.
Module KRXToken
Attributes
role Token type
Status Required
Legal values are:
h
Token is part of a heading
p
Token is part of a paragraph
s
Token is part of a seg element
n
Token is part of a note or annotation of any kind
q
Token is part of a quotation
v
Token is part of a verse line
o
Token is part of a textual feature not in this list.
pos The sequential number of this token within this element (or token type).
Status Optional
Datatype nonNegativeInteger
tp The sequential number of this token within the whole text.
Status Required
Datatype nonNegativeInteger
f Punctuation or other non-token text items immediately following the token.
Status Optional
Datatype string
p Punctuation or other non-token text items immediately preceding the token.
Status Optional
Datatype string
n Label or identifier of the text element of which this token is part.
Status Required
Datatype string
Contained by
KRXToken: tg
May contain Character data only
Content model
<content>
 <textNode/>
</content>
Schema Declaration
element t
{
   attribute role { "h" | "p" | "s" | "n" | "q" | "v" | "o" },
   attribute pos { xsd:nonNegativeInteger }?,
   attribute tp { xsd:nonNegativeInteger },
   attribute f { string }?,
   attribute p { string }?,
   attribute n { string },
   text
}
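The p and f attributes are what allow a basic version of the text to be rebuilt from the tokens. A minimal sketch, assuming that concatenating p, the token content and f for every t in tp order yields that basic text; the sample data, the absence of a namespace and the function name `reconstruct` are our assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: p holds preceding, f following punctuation.
SAMPLE = """\
<tlist ed="ED1">
  <tg role="p" n="p1">
    <t role="p" n="p1" tp="1" pos="1">子</t>
    <t role="p" n="p1" tp="2" pos="2" f="：">曰</t>
    <tg role="q" n="q1">
      <t role="q" n="q1" tp="3" pos="1" p="「">學</t>
      <t role="q" n="q1" tp="4" pos="2" f="」">習</t>
    </tg>
  </tg>
</tlist>
"""


def reconstruct(root):
    """Concatenate p + token + f for every t, ordered by tp (a required attribute)."""
    tokens = sorted(root.iter("t"), key=lambda t: int(t.get("tp")))
    return "".join(t.get("p", "") + (t.text or "") + t.get("f", "") for t in tokens)


print(reconstruct(ET.fromstring(SAMPLE)))  # 子曰：「學習」
```

Sorting on tp rather than relying on document order keeps the result correct even if a processing step has reordered the elements.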

<tg>

<tg> A group of tokens.
Module KRXToken
Attributes
xml:id The identifier of this token group.
Status Optional
Datatype ID
n A label
Status Optional
Datatype string
role Token group type
Status Optional
Legal values are:
h
Token group is a heading
p
Token group is (part of) a paragraph
s
Token group is a seg element
n
Token group is (part of) a note or annotation of any kind
q
Token group is (part of) a quotation
v
Token group is (part of) a verse line
o
Token group is (part of) a textual feature not in this list.
Contained by
KRXToken: tg tlist
May contain
KRXToken: t tg
derived-module-KRXToken: lb pb
Content model
<content>
 <alternate maxOccurs="unbounded"
  minOccurs="0">

  <elementRef key="tg"
   maxOccurs="unbounded" minOccurs="0"/>

  <elementRef key="t" maxOccurs="unbounded"
   minOccurs="0"/>

  <elementRef key="pb"
   maxOccurs="unbounded" minOccurs="0"/>

  <elementRef key="lb"
   maxOccurs="unbounded" minOccurs="0"/>

 </alternate>
</content>
Schema Declaration
element tg
{
   attribute xml:id { xsd:ID }?,
   attribute n { string }?,
   attribute role { "h" | "p" | "s" | "n" | "q" | "v" | "o" }?,
   ( tx_tg* | tx_t* | tx_pb* | tx_lb* )*
}
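The content model of tg can also be checked with a few lines of code, for instance when no RELAX NG validator is at hand. This sketch (no namespace assumed; `check_tg` and the other names are ours) reports disallowed children, stray character data and role values outside the list above. It is a partial check only and does not replace validation against the schema.

```python
import xml.etree.ElementTree as ET

ROLES = {"h", "p", "s", "n", "q", "v", "o"}  # legal role values
TG_CHILDREN = {"tg", "t", "pb", "lb"}        # elements allowed inside tg


def check_tg(tg, path="tg"):
    """Return a list of problems in tg and its nested tg elements.

    Checks only the role value, the allowed child elements and stray
    character data; attributes of t, pb and lb are not examined.
    """
    problems = []
    role = tg.get("role")
    if role is not None and role not in ROLES:
        problems.append(f"{path}: invalid role {role!r}")
    if (tg.text or "").strip():
        problems.append(f"{path}: character data directly inside tg")
    for i, child in enumerate(tg):
        child_path = f"{path}/{child.tag}[{i}]"
        if child.tag not in TG_CHILDREN:
            problems.append(f"{child_path}: element not allowed in tg")
        elif child.tag == "tg":
            problems.extend(check_tg(child, child_path))
        # Text after a child (its tail) also sits directly inside this tg.
        if (child.tail or "").strip():
            problems.append(f"{child_path}: character data directly inside tg")
    return problems


good = ET.fromstring('<tg role="p"><lb/><tg role="q"><t role="q" n="q1" tp="1">學</t></tg></tg>')
bad = ET.fromstring('<tg role="x"><t role="p" n="p1" tp="1">子</t>stray<note/></tg>')
print(check_tg(good))  # []
print(check_tg(bad))
```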

<tlist>

<tlist> Root element of a token file; it contains one or more tg elements.
Module KRXToken
Attributes
xml:id
Status Recommended
Datatype ID
ed Reference to the edition defined in the manifest.
Status Required
Datatype string
n A label
Status Optional
Datatype string
fileseq If the tokens are spread over several files, the sequential number of this file.
Status Optional
Datatype nonNegativeInteger
Contained by
May contain
KRXToken: tg
Content model
<content>
 <elementRef key="tg" maxOccurs="unbounded"/>
</content>
Schema Declaration
element tlist
{
   attribute xml:id { ID }?,
   attribute ed { string },
   attribute n { string }?,
   attribute fileseq { xsd:nonNegativeInteger }?,
   tx_tg+
}
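When a text is split across several files, fileseq gives their order. The sketch below, with hypothetical file contents and no namespace, checks the tlist requirements stated above (a required ed attribute and one or more tg children, nothing else) and sorts the parts by fileseq. Treating a missing fileseq as 0 and rejecting parts that reference different editions are our assumptions, as is the function name `load_parts`.

```python
import xml.etree.ElementTree as ET

# Hypothetical: one text split over two files, given here out of order.
FILES = [
    '<tlist ed="ED1" fileseq="2"><tg role="p" n="p2">'
    '<t role="p" n="p2" tp="3">學</t></tg></tlist>',
    '<tlist ed="ED1" fileseq="1"><tg role="p" n="p1">'
    '<t role="p" n="p1" tp="1">子</t><t role="p" n="p1" tp="2">曰</t></tg></tlist>',
]


def load_parts(sources):
    """Parse tlist documents, check root-level requirements, sort by fileseq."""
    roots = []
    for src in sources:
        root = ET.fromstring(src)
        if root.tag != "tlist":
            raise ValueError(f"root element is {root.tag!r}, expected 'tlist'")
        if root.get("ed") is None:
            raise ValueError("tlist lacks the required ed attribute")
        children = list(root)
        if not children or any(child.tag != "tg" for child in children):
            raise ValueError("tlist must contain one or more tg elements and nothing else")
        roots.append(root)
    # Assumption: all parts of one text must reference the same edition.
    if len({root.get("ed") for root in roots}) > 1:
        raise ValueError("parts reference different editions")
    # Assumption: a missing fileseq means a single-file text and sorts as 0.
    return sorted(roots, key=lambda root: int(root.get("fileseq", "0")))


parts = load_parts(FILES)
print([t.text for root in parts for t in root.iter("t")])  # ['子', '曰', '學']
```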