Schema for Manifest files in the KanripoX project

2020-12-07

1 Overview

The Manifest.xml described here contains information about a set of editions that are grouped here together, usually for the purpose of further description and processing.

There are two main elements under the root element manifest (see note 1):

The editions element holds information about the editions that are collected here. It contains edition elements, which give the details for each edition. These details include the type, which can be either "documentary" or "interpretative". Documentary editions strive to reproduce an existing print edition, while interpretative editions reflect the views of the editor and do not follow one single edition.

Another detail recorded for each edition is the id, a unique label (or identifier) used to refer to this specific edition within the manifest and the processing systems.

The edition element may have the following children: description and divisions.

Both of these elements are optional. description contains a description of the edition, which could be the title, but also other information deemed relevant. divisions allows reference to divisions within the edition. This element is repeatable; when it occurs more than once, the edition is considered to be made up of the sequence of these divisions.

The divisions element can also occur as a child element of manifest, optionally following the editions element. If used here, there will be only one divisions element, which holds all subdivisions as possibly nested div elements. The purpose of this element is to provide an entry point to the editions that is tied neither to one specific edition nor to a hyperlink or similar technical mechanism. The label on the div elements provides a human-readable label that can be used to point to that specific division, much the same as "Chapter 2" will (usually) refer to the same section of a work, no matter which edition is used. To link this nested structure of chapters, sections and so forth to the editions, each div can have one or more edRef elements, which point to the text span that covers this specific section in one of the editions.
<div label="第一章">
 <edRef end="61" key="KR5c0057_tls"
  start="0"/>

 <edRef end="58" key="CH1a0918a_chant"
  start="0"/>

 <edRef end="402" key="CH1a0918b_chant"
  start="2"/>

</div>
In this example, the start and end attributes give the numbers of the first and last tokens that are part of this section of a text, thus identifying the text span independently of the format of the text. Other possibilities for addressing a text span are available if the edition is in TEI/XML.
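The token addressing described above can be sketched in Python. This is only an illustration, not part of the KanripoX tooling: the helper names spans_by_edition and tokens_for are invented here, and the sketch assumes that tokens are numbered from 0 and that end is inclusive, as the wording "first and last token" suggests.

```python
import xml.etree.ElementTree as ET

# The div from the example above, written as well-formed XML.
DIV_XML = """
<div label="第一章">
  <edRef key="KR5c0057_tls" start="0" end="61"/>
  <edRef key="CH1a0918a_chant" start="0" end="58"/>
  <edRef key="CH1a0918b_chant" start="2" end="402"/>
</div>
"""


def spans_by_edition(div):
    """Map each edition key to its (start, end) token numbers."""
    return {
        ref.get("key"): (int(ref.get("start")), int(ref.get("end")))
        for ref in div.findall("edRef")
    }


def tokens_for(tokens, span):
    """Return the tokens covered by a span (assumption: end is inclusive)."""
    start, end = span
    return tokens[start:end + 1]


div = ET.fromstring(DIV_XML)
spans = spans_by_edition(div)
```

Given a token list for an edition, tokens_for(tokens, spans["CH1a0918b_chant"]) would return tokens 2 through 402 of that list.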

2 The elements defined for the Manifest

Schema KRXManifest: Elements

<creation>

<creation> Information about the creation
Module KRXManifest
Contained by
KRXManifest: description edition editionGroup
May contain
KRXManifest: date resp title
Content model
<content>
 <alternate maxOccurs="unbounded"
  minOccurs="0">

  <elementRef key="title"/>
  <elementRef key="date"/>
  <elementRef key="resp"/>
 </alternate>
</content>
Schema Declaration
element creation { ( krx_title | krx_date | krx_resp )* }

<date>

<date> Date of the work
Module KRXManifest
Attributes
notbefore
Earliest possible date
Status Optional
Datatype string
notafter
Latest possible date
Status Optional
Datatype string
cert
Degree of certainty of this assertion
Status Optional
Legal values are:
high
High degree of certainty
middle
Middle degree of certainty
low
Low degree of certainty
Contained by
KRXManifest: creation
May contain
Character data only
Content model
<content>
 <textNode/>
</content>
Schema Declaration
element date
{
   attribute notbefore { string }?,
   attribute notafter { string }?,
   attribute cert { "high" | "middle" | "low" }?,
   text
}

<description>

<description> Description of the edition or item this element is attached to.
Module KRXManifest
Contained by
KRXManifest: div edition manifest
May contain
KRXManifest: creation note title
character data
Content model
<content>
 <alternate maxOccurs="unbounded"
  minOccurs="0">

  <textNode/>
  <elementRef key="note" minOccurs="0"/>
  <elementRef key="title" minOccurs="0"/>
  <elementRef key="creation" minOccurs="0"/>
 </alternate>
</content>
Schema Declaration
element description { ( text | krx_note? | krx_title? | krx_creation? )* }

<div>

<div> One specific subdivision on any level.
Module KRXManifest
Attributes
label
A label to identify the subdivision. It can be any string, but should be unique in the manifest. This can be used to access this textual division.
Status Optional
Datatype token
edition
A reference to the edition, as defined elsewhere in this manifest.
Status Optional
Datatype IDREF
sequence
Sequential number of this division, given in such a way that ordering by this number will produce the text in the same sequence as the base edition.
Status Optional
Datatype nonNegativeInteger
start
The sequential number of the first token of this division in the token list.
Status Optional
Datatype nonNegativeInteger
end
The sequential number of the last token of this division in the token list.
Status Optional
Datatype nonNegativeInteger
divid
If the source file of this edition has an identifier for this subdivision (usually an xml:id), it can be recorded here.
Status Optional
Datatype token
Contained by
KRXManifest: div divisions
May contain
KRXManifest: description div edRef label
Content model
<content>
 <sequence maxOccurs="1" minOccurs="1">
  <elementRef key="label"
   maxOccurs="unbounded" minOccurs="0"/>

  <elementRef key="description"
   minOccurs="0"/>

  <elementRef key="edRef"
   maxOccurs="unbounded" minOccurs="0"/>

  <elementRef key="div"
   maxOccurs="unbounded" minOccurs="0"/>

 </sequence>
</content>
Schema Declaration
element div
{
   attribute label { token }?,
   attribute edition { xsd:IDREF }?,
   attribute sequence { xsd:nonNegativeInteger }?,
   attribute start { xsd:nonNegativeInteger }?,
   attribute end { xsd:nonNegativeInteger }?,
   attribute divid { token }?,
   ( krx_label*, krx_description?, krx_edRef*, krx_div* )
}
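Because div elements nest and carry an optional sequence attribute, an outline of a work can be produced by a recursive walk. The following is a minimal sketch under stated assumptions: the sample labels and sequence numbers are invented, the function name walk is hypothetical, and a missing sequence attribute is treated as 0, which the schema does not prescribe.

```python
import xml.etree.ElementTree as ET

# Invented sample data; the labels and sequence numbers are illustrative only.
DIVISIONS_XML = """
<divisions>
  <div label="Part 1" sequence="1">
    <div label="Chapter 2" sequence="3"/>
    <div label="Chapter 1" sequence="2"/>
  </div>
  <div label="Part 2" sequence="4"/>
</divisions>
"""


def walk(parent, depth=0):
    """Yield (depth, label) for nested div elements, siblings ordered by sequence."""
    children = sorted(
        parent.findall("div"),
        key=lambda d: int(d.get("sequence", "0")),  # assumption: default to 0
    )
    for child in children:
        yield depth, child.get("label")
        yield from walk(child, depth + 1)


outline = list(walk(ET.fromstring(DIVISIONS_XML)))
```

Sorting siblings by sequence restores the order of the base edition even when the div elements are stored in a different order in the file.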

<divisions>

<divisions> The internal subdivisions of the work under consideration.
Module KRXManifest
Attributes
edition
If necessary, the edition for which these textual divisions are valid can be given here.
Status Optional
Datatype token
Contained by
KRXManifest: edition manifest
May contain
KRXManifest: div
Content model
<content>
 <elementRef key="div"
  maxOccurs="unbounded" minOccurs="1"/>

</content>
Schema Declaration
element divisions { attribute edition { token }?, krx_div+ }

<edition>

<edition> One edition of the work. If there are multiple divisions, this indicates that the sequence of these divisions make up the work.
Module KRXManifest
Attributes
xml:id
The identifier of the work. This will be used to refer to this manifest from the display of this text.
Status Optional
Datatype ID
id
The identifier of the edition. This is required and has to be unique within this manifest. It will be used by the processing tools to refer to this edition.
Status Required
Datatype ID
format
The parsing tool is selected based on the format given here. Two formats are defined at the moment; additional formats can be added, but require a plugin to parse them.
Status Required
Legal values are:
xml/TEI
TEI file encoded in XML
txt/mandoku
Mandoku format
location
This gives either the relative path to the local folder containing the edition or a resolvable remote reference to the edition, for example on GitHub.
Status Required
Datatype string
Note

TODO: format for remote reference.

TODO: Format for identifying portion of text in file.

base
The edition marked as 'base' is the reference edition for sequential reordering.
Status Optional
Legal values are:
true
This edition is the reference edition
false
Not the reference edition (default)
type
The edition has to be declared as either ‘documentary’ or ‘interpretative’.
Status Required
Legal values are:
documentary
An edition that documents an existing print source as faithfully as possible, without editorial changes.
interpretative
An edition that might be based on a print source, but possibly makes editorial changes.
role
One of the editions has to be declared as the base edition; the others are reference editions.
Status Recommended
Legal values are:
base
This edition is the base edition.
reference
All editions except the base edition are considered reference editions. [Default]
language
The language of the document, identified with an identifier according to RFC 1766.
Status Optional
Datatype language
sigle
A short identifier used to identify this edition.
Status Optional
Datatype string
Contained by
KRXManifest: editionGroup editions
May contain
KRXManifest: creation description divisions tokenmap
Content model
<content>
 <sequence maxOccurs="1" minOccurs="1">
  <elementRef key="creation" maxOccurs="1"
   minOccurs="0"/>

  <elementRef key="description"/>
  <elementRef key="tokenmap" maxOccurs="1"
   minOccurs="0"/>

  <elementRef key="divisions"
   maxOccurs="unbounded" minOccurs="0"/>

 </sequence>
</content>
Schema Declaration
element edition
{
   attribute xml:id { xsd:ID }?,
   attribute id { xsd:ID },
   attribute format { "xml/TEI" | "txt/mandoku" },
   attribute location { string },
   attribute base { "true" | "false" }?,
   attribute type { "documentary" | "interpretative" },
   attribute role { "base" | "reference" }?,
   attribute language { xsd:language }?,
   attribute sigle { string }?,
   ( krx_creation?, krx_description, krx_tokenmap?, krx_divisions* )
}
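The required attributes and legal values declared above can be checked with a few lines of Python. This is a rough sketch rather than a substitute for validating against the schema itself: the function name edition_problems and the sample editions are invented for illustration, and only the constraints visible in the declaration above are tested.

```python
import xml.etree.ElementTree as ET

# Taken from the schema declaration of <edition> above.
REQUIRED = ("id", "format", "location", "type")
LEGAL = {
    "format": {"xml/TEI", "txt/mandoku"},
    "type": {"documentary", "interpretative"},
    "base": {"true", "false"},
    "role": {"base", "reference"},
}


def edition_problems(edition):
    """Return a list of human-readable problems found on one edition element."""
    problems = []
    for name in REQUIRED:
        if edition.get(name) is None:
            problems.append(f"missing required attribute: {name}")
    for name, allowed in LEGAL.items():
        value = edition.get(name)
        if value is not None and value not in allowed:
            problems.append(f"illegal value for {name}: {value}")
    if edition.find("description") is None:
        problems.append("missing required child: description")
    return problems


# Invented sample editions.
good = ET.fromstring(
    '<edition id="ed1" format="txt/mandoku" location="editions/ed1" '
    'type="documentary" role="base"><description>Sample</description></edition>'
)
bad = ET.fromstring('<edition id="ed2" format="pdf" type="documentary"/>')
```

Here edition_problems(good) returns an empty list, while edition_problems(bad) reports the missing location, the unknown format and the missing description.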

<editionGroup>

<editionGroup> A group of the editions representing the work under consideration.
Module KRXManifest
Attributes
type
The treatment of the editions within this group is based on the value of this attribute.
Status Required
Legal values are:
root
The root text of this work.
root+annotation
The root text, interspersed with commentary.
annotation
Commentary to the root text, without repeating the text.
translation
Translations of the text and/or commentary.
other
Texts that are grouped with this text for some reason other than being textually related.
sigle
A short identifier used to identify this group of editions.
Status Optional
Datatype string
Contained by
KRXManifest: editions
May contain
KRXManifest: creation edition
Content model
<content>
 <sequence maxOccurs="1" minOccurs="1">
  <elementRef key="creation" maxOccurs="1"
   minOccurs="0"/>

  <elementRef key="edition"
   maxOccurs="unbounded" minOccurs="1"/>

 </sequence>
</content>
Schema Declaration
element editionGroup
{
   attribute type
   {
      "root" | "root+annotation" | "annotation" | "translation" | "other"
   },
   attribute sigle { string }?,
   ( krx_creation?, krx_edition+ )
}
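Since the type attribute of editionGroup determines how the editions in a group are treated, a processing tool will probably want to collect editions by group type. A possible sketch follows; the function name edition_ids_by_group_type and all sample identifiers are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample data; ids, locations and descriptions are placeholders.
EDITIONS_XML = """
<editions>
  <editionGroup type="root">
    <edition id="edA" format="txt/mandoku" location="edA" type="documentary">
      <description>Edition A</description>
    </edition>
    <edition id="edB" format="xml/TEI" location="edB" type="interpretative">
      <description>Edition B</description>
    </edition>
  </editionGroup>
  <editionGroup type="translation">
    <edition id="edC" format="xml/TEI" location="edC" type="interpretative">
      <description>Edition C</description>
    </edition>
  </editionGroup>
</editions>
"""


def edition_ids_by_group_type(editions):
    """Collect edition ids under the type of the editionGroup that holds them."""
    grouped = {}
    for group in editions.findall("editionGroup"):
        ids = [ed.get("id") for ed in group.findall("edition")]
        grouped.setdefault(group.get("type"), []).extend(ids)
    return grouped


grouped = edition_ids_by_group_type(ET.fromstring(EDITIONS_XML))
```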

<editions>

<editions> The editions representing the work under consideration. Work is taken in a very broad sense here.
Module KRXManifest
Contained by
KRXManifest: manifest
May contain
KRXManifest: edition editionGroup
Content model
<content>
 <alternate maxOccurs="1" minOccurs="1">
  <elementRef key="editionGroup"
   maxOccurs="unbounded" minOccurs="1"/>

  <elementRef key="edition"
   maxOccurs="unbounded" minOccurs="1"/>

 </alternate>
</content>
Schema Declaration
element editions { krx_editionGroup+ | krx_edition+ }
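The declaration above is a choice: editions holds either one or more editionGroup elements or one or more edition elements, but not a mixture. A small Python check of that rule might look as follows; the function name editions_shape and its return values are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def editions_shape(editions):
    """Classify the children of <editions> according to the content model."""
    tags = {child.tag for child in editions}
    if tags == {"edition"}:
        return "editions"
    if tags == {"editionGroup"}:
        return "groups"
    return "invalid"  # empty, mixed, or containing other elements


flat = ET.fromstring("<editions><edition/><edition/></editions>")
grouped_form = ET.fromstring("<editions><editionGroup/></editions>")
mixed = ET.fromstring("<editions><editionGroup/><edition/></editions>")
empty = ET.fromstring("<editions/>")
```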

<edRef>

<edRef> Reference to this subdivision in one specific edition, identified by the key.
Module KRXManifest
Attributes
start
The sequential number of the first token of this division in the token list.
Status Optional
Datatype nonNegativeInteger
end
The sequential number of the last token of this division in the token list.
Status Optional
Datatype nonNegativeInteger
key
A reference to the edition, as defined elsewhere in this manifest.
Status Optional
Datatype IDREF
timestamp
The timestamp in ISO format, e.g. 2020-10-09T14:23:52+09:00.
Status Optional
Datatype dateTime
label
A label to identify the subdivision as used in this edition. It can be any string, but should be unique in the manifest. This can be used to access this textual division.
Status Optional
Datatype token
Contained by
KRXManifest: div
May contain
Empty element
Content model
<content>
 <empty/>
</content>
Schema Declaration
element edRef
{
   attribute start { xsd:nonNegativeInteger }?,
   attribute end { xsd:nonNegativeInteger }?,
   attribute key { xsd:IDREF }?,
   attribute timestamp { xsd:dateTime }?,
   attribute label { token }?,
   empty
}
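Since key is an IDREF pointing at an edition defined elsewhere in the manifest, it is worth checking that every key resolves. The sketch below assumes that the target is the id attribute of an edition element; the function name dangling_keys and the sample manifest are invented.

```python
import xml.etree.ElementTree as ET

# Invented sample manifest; "edB" is deliberately not defined as an edition.
MANIFEST_XML = """
<manifest>
  <description>Sample work</description>
  <editions>
    <edition id="edA" format="txt/mandoku" location="edA" type="documentary">
      <description>Edition A</description>
    </edition>
  </editions>
  <divisions>
    <div label="Chapter 1">
      <edRef key="edA" start="0" end="10"/>
      <edRef key="edB" start="0" end="12"/>
    </div>
  </divisions>
</manifest>
"""


def dangling_keys(manifest):
    """Return the sorted edRef keys that match no edition id in the manifest."""
    known = {ed.get("id") for ed in manifest.iter("edition")}
    missing = set()
    for ref in manifest.iter("edRef"):
        key = ref.get("key")  # key is optional, so it may be absent
        if key is not None and key not in known:
            missing.add(key)
    return sorted(missing)


unresolved = dangling_keys(ET.fromstring(MANIFEST_XML))
```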

<label>

<label> Additional label
Module KRXManifest
Attributes
language
The language of the label, identified with an identifier according to RFC 1766.
Status Optional
Datatype language
Contained by
KRXManifest: div
May contain
Character data only
Content model
<content>
 <textNode/>
</content>
Schema Declaration
element label { attribute language { xsd:language }?, text }

<manifest>

<manifest> The root of the manifest. One manifest describes one work.
Module KRXManifest
Attributes
xml:id
The identifier of the work. This will be used to refer to this manifest from the display of this text.
Status Optional
Datatype ID
Contained by
KRXManifest: manifests
May contain
KRXManifest: description divisions editions title
Note

Currently, only one work can be described per manifest file. Use cases that need multiple works still have to be thought through; one option is to use several manifest elements in one file.

Content model
<content>
 <sequence maxOccurs="1" minOccurs="1">
  <elementRef key="title" minOccurs="0"/>
  <elementRef key="description"/>
  <elementRef key="editions"/>
  <elementRef key="divisions" minOccurs="0"/>
 </sequence>
</content>
Schema Declaration
element manifest
{
   attribute xml:id { xsd:ID }?,
   ( krx_title?, krx_description, krx_editions, krx_divisions? )
}
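The content model of manifest is an ordered sequence: an optional title, then description, then editions, then an optional divisions. As a quick illustration, the order of the child elements can be tested with a regular expression over their names. This checks the order only and is no replacement for a RELAX NG validator; the names MANIFEST_ORDER and manifest_children_ok are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# One comma-terminated name per child, in the order the schema requires.
MANIFEST_ORDER = re.compile(r"(title,)?description,editions,(divisions,)?")


def manifest_children_ok(manifest):
    """True if the child elements follow title?, description, editions, divisions?."""
    sequence = "".join(child.tag + "," for child in manifest)
    return MANIFEST_ORDER.fullmatch(sequence) is not None


full = ET.fromstring(
    "<manifest><title>T</title><description>D</description>"
    "<editions/><divisions/></manifest>"
)
minimal = ET.fromstring("<manifest><description>D</description><editions/></manifest>")
wrong_order = ET.fromstring("<manifest><editions/><description>D</description></manifest>")
```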

<manifests>

<manifests> Root for manifests that contain multiple manifest elements.
Module KRXManifest
Contained by
May contain
KRXManifest: manifest
Content model
<content>
 <elementRef key="manifest"
  maxOccurs="unbounded"/>

</content>
Schema Declaration
element manifests { krx_manifest+ }

<map>

<map> Map of one textual feature to a specific token type
Module derived-module-KRXManifest
Attributes
src
Element or simple matching expression (for XML texts) or regular expression (for plain text) that identifies the textual feature
Status Optional
Datatype string
tok
Token type
Status Optional
Legal values are:
h
Token is part of a heading
p
Token is part of a paragraph
n
Token is part of a note or annotation of any kind
q
Token is part of a quotation
v
Token is part of a verse line
Contained by
KRXManifest: tokenmap
May contain
Empty element
Content model
<content>
 <empty/>
</content>
Schema Declaration
element map
{
   attribute src { string }?,
   attribute tok { "h" | "p" | "n" | "q" | "v" }?,
   empty
}
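For plain-text editions, src holds a regular expression and tok the token type it maps to. How the processing tools apply these rules is not specified here, so the following Python sketch rests on assumptions: the patterns are invented, rules are tried in document order against a whole line, the first match wins, and unmatched lines default to "p".

```python
import re
import xml.etree.ElementTree as ET

# Invented patterns: leading asterisks mark a heading, a fully parenthesized
# line marks a note. Real manifests may use quite different expressions.
TOKENMAP_XML = r"""
<tokenmap>
  <map src="^\*+\s" tok="h"/>
  <map src="^\(.*\)$" tok="n"/>
</tokenmap>
"""


def load_rules(tokenmap):
    """Compile each map element into a (pattern, token type) pair."""
    return [(re.compile(m.get("src")), m.get("tok")) for m in tokenmap.findall("map")]


def classify(line, rules, default="p"):
    """Return the token type of the first matching rule (assumption: first match wins)."""
    for pattern, tok in rules:
        if pattern.search(line):
            return tok
    return default


rules = load_rules(ET.fromstring(TOKENMAP_XML))
```

With these rules, classify("** Heading", rules) yields "h", a parenthesized line yields "n", and any other line falls back to "p".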

<note>

<note> An additional note
Module KRXManifest
Contained by
KRXManifest: description
May contain
Character data only
Content model
<content>
 <textNode/>
</content>
Schema Declaration
element note { text }

<resp>

<resp> Person responsible for some aspect of the work
Module KRXManifest
Attributes
role
Status Optional
Datatype string
Sample values include:
author
Author
compiler
Compiler
translator
Translator
key
A key identifying this person in some reference system.
Status Optional
Datatype string
Contained by
KRXManifest: creation
May contain
Character data only
Content model
<content>
 <textNode/>
</content>
Schema Declaration
element resp { attribute role { string }?, attribute key { string }?, text }

<title>

<title> Title of the work.
Module KRXManifest
Contained by
KRXManifest: creation description manifest
May contain
Character data only
Content model
<content>
 <textNode/>
</content>
Schema Declaration
element title { text }

<tokenmap>

<tokenmap> Mappings from textual features to token types
Module KRXManifest
Contained by
KRXManifest: edition
May contain
derived-module-KRXManifest: map
Content model
<content>
 <elementRef key="map"
  maxOccurs="unbounded" minOccurs="1"/>

</content>
Schema Declaration
element tokenmap { krx_map+ }
Notes
1 There are in fact two possible root elements; the other is manifests, which groups several manifest elements.