Schema for Manifest files in the KanripoX project

2020-10-12

1 Overview

The Manifest.xml file described here contains information about a set of editions that are grouped together, usually for the purpose of further description and processing.

There are two main elements under the root element manifest (see note 1):

The editions element holds information about the editions that are collected here. It contains edition elements, which give the details for each edition. These details include the type, which can be either "documentary" or "interpretative". Documentary editions strive to reproduce an existing print edition, while interpretative editions reflect the views of the editor and do not follow one single edition.

Another detail recorded for each edition is the id, a unique label (or identifier) used to refer to this specific edition within the manifest and by the processing systems.

The edition element may have the following children:

description
divisions

description contains a description of the edition, which could be the title, but also other information deemed relevant. divisions allows reference to divisions within the edition. This element is repeatable; when it occurs more than once, the edition is considered to be made up of the sequence of these divisions. According to the schema declaration in section 2, description is required and divisions is optional.
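
As a rough illustration of the editions structure described above, the following Python sketch parses a small hand-written manifest and lists the id and type of each edition. The locations, the description texts and the absence of an XML namespace are assumptions made for this example; they are not taken from an actual KanripoX manifest.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest fragment: the ids reuse keys from this document's
# example, while locations and descriptions are invented for illustration.
SAMPLE = """
<manifest>
  <description>Sample work</description>
  <editions>
    <edition id="KR5c0057_tls" format="xml/TEI" location="editions/tls"
             type="interpretative">
      <description>Interpretative edition</description>
    </edition>
    <edition id="CH1a0918a_chant" format="txt/mandoku" location="editions/chant"
             type="documentary">
      <description>Documentary edition</description>
    </edition>
  </editions>
</manifest>
"""

def list_editions(xml_text):
    """Return (id, type) pairs for every edition in the manifest."""
    root = ET.fromstring(xml_text)
    return [(ed.get("id"), ed.get("type"))
            for ed in root.findall("./editions/edition")]

print(list_editions(SAMPLE))
```

The id values collected this way are what other parts of the manifest use to point back at an edition.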

The divisions element can also occur as a child element of manifest, optionally following the editions element. If used here, there will be only one divisions element, which holds all subdivisions as possibly nested div elements. The purpose of this element is to provide an entry point to the editions that is tied neither to one specific edition nor to a hyperlink or similar technical mechanism. The label attribute on the div elements provides a human-readable label that can be used to point to that specific division, much the same as "Chapter 2" will (usually) refer to the same section of a work, no matter which edition is used. To link this nested structure of chapters, sections and so forth to the editions, each div can have one or more edRef elements, each pointing to the text span in one of the editions that covers this specific section.
<div label="第一章">
 <edRef end="61" key="KR5c0057_tls"
  start="0"/>
 <edRef end="58" key="CH1a0918a_chant"
  start="0"/>
 <edRef end="402" key="CH1a0918b_chant"
  start="2"/>
</div>
In this example, the start and end attributes give the numbers of the first and last tokens that are part of this section of the text, thus identifying the text span independently of the file format of the text. Other possibilities for addressing a text span are available if the edition is in TEI/XML.
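
To make the token-based addressing concrete, here is a minimal Python sketch that looks up the edRef for one edition and cuts the matching span out of a token list. It assumes that tokens are numbered from 0 and that end is inclusive, as the wording "first and last token" suggests; the token list itself is a stand-in, since the tokenization is not specified here.

```python
import xml.etree.ElementTree as ET

DIV = """
<div label="第一章">
  <edRef key="KR5c0057_tls" start="0" end="61"/>
  <edRef key="CH1a0918a_chant" start="0" end="58"/>
  <edRef key="CH1a0918b_chant" start="2" end="402"/>
</div>
"""

def span_for(div, key, tokens):
    """Return the tokens of edition `key` covered by this div, or None.

    Assumes 0-based token numbers and an inclusive `end`.
    """
    for ref in div.findall("edRef"):
        if ref.get("key") == key:
            start, end = int(ref.get("start")), int(ref.get("end"))
            return tokens[start:end + 1]
    return None

div = ET.fromstring(DIV)
# Stand-in token list; a real one would come from tokenizing the edition.
tokens = [f"t{i}" for i in range(500)]
span = span_for(div, "CH1a0918b_chant", tokens)
print(len(span), span[0], span[-1])  # 401 t2 t402
```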

2 The elements defined for the Manifest

Schema KRXManifest: Elements

<description>

<description> Description of the edition or item this element is attached to.
Module KRXManifest
Contained by
KRXManifest: div edition manifest
May contain
KRXManifest: note
character data
Content model
<content>
 <alternate maxOccurs="unbounded"
  minOccurs="0">
  <textNode/>
  <elementRef key="note" minOccurs="0"/>
 </alternate>
</content>
Schema Declaration
element description { ( text | krx_note? )* }

<div>

<div> One specific subdivision on any level.
Module KRXManifest
Attributes
label
A label to identify the subdivision. It can be any string, but should be unique in the manifest. This can be used to access this textual division.
Status Optional
Datatype token
edition
A reference to the edition, as defined elsewhere in this manifest.
Status Optional
Datatype IDREF
start
The sequential number of the first token of this division in the token list.
Status Optional
Datatype nonNegativeInteger
end
The sequential number of the last token of this division in the token list.
Status Optional
Datatype nonNegativeInteger
divid
If the source file of this edition has an identifier for this subdivision (usually an xml:id), it can be recorded here.
Status Optional
Datatype token
Contained by
KRXManifest: div divisions
May contain
KRXManifest: description div edRef
Content model
<content>
 <sequence maxOccurs="1" minOccurs="1">
  <elementRef key="edRef" minOccurs="0"/>
  <elementRef key="description"
   minOccurs="0"/>
  <elementRef key="div"
   maxOccurs="unbounded" minOccurs="0"/>
 </sequence>
</content>
Schema Declaration
element div
{
   attribute label { token }?,
   attribute edition { xsd:IDREF }?,
   attribute start { xsd:nonNegativeInteger }?,
   attribute end { xsd:nonNegativeInteger }?,
   attribute divid { token }?,
   ( krx_edRef?, krx_description?, krx_div* )
}
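
Because div may contain further div elements, processing code typically walks the structure recursively. The sketch below is one possible way to do this in Python; the sample labels are invented and the elements are assumed to carry no XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical nesting of chapters and sections.
DIVISIONS = """
<divisions>
  <div label="Chapter 1">
    <div label="Section 1.1"/>
    <div label="Section 1.2"/>
  </div>
  <div label="Chapter 2"/>
</divisions>
"""

def outline(parent, depth=0):
    """Yield (depth, label) for every div below `parent`, in document order."""
    for div in parent.findall("div"):
        yield depth, div.get("label")
        yield from outline(div, depth + 1)

root = ET.fromstring(DIVISIONS)
for depth, label in outline(root):
    # label is optional in the schema, so fall back to a placeholder.
    print("  " * depth + (label or "(no label)"))
```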

<divisions>

<divisions> The internal subdivisions of the work under consideration.
Module KRXManifest
Attributes
edition
If necessary, the edition for which these textual divisions are valid can be given here.
Status Optional
Datatype token
Contained by
KRXManifest: edition manifest
May contain
KRXManifest: div
Content model
<content>
 <elementRef key="div"
  maxOccurs="unbounded" minOccurs="1"/>
</content>
Schema Declaration
element divisions { attribute edition { token }?, krx_div+ }

<edition>

<edition> One edition of the work. If there are multiple divisions, this indicates that the sequence of these divisions makes up the work.
Module KRXManifest
Attributes
id
The identifier of the edition. This is required and has to be unique within this manifest. It will be used by the processing tools to refer to this edition.
Status Required
Datatype ID
format
The parsing tool is selected based on the format given here. Two formats are defined at the moment; additional formats can be added, but require a plugin to parse them.
Status Required
Legal values are:
xml/TEI
TEI file encoded in XML
txt/mandoku
Mandoku format
location
This gives either the relative path to the local folder containing the edition or a resolvable remote reference to the edition, for example on GitHub.
Status Required
Datatype string
Note

TODO: format for remote reference.

TODO: Format for identifying portion of text in file.

type
The edition has to be declared as either ‘documentary’ or ‘interpretative’.
Status Required
Legal values are:
documentary
An edition that documents an existing print source as faithfully as possible, without editorial changes.
interpretative
An edition that might be based on a print source, but possibly makes editorial changes.
language
The language of the document, identified with an identifier according to RFC 1766.
Status Optional
Datatype language
Contained by
KRXManifest: editions
May contain
KRXManifest: description divisions
Content model
<content>
 <sequence maxOccurs="1" minOccurs="1">
  <elementRef key="description"/>
  <elementRef key="divisions"
   maxOccurs="unbounded" minOccurs="0"/>
 </sequence>
</content>
Schema Declaration
element edition
{
   attribute id { xsd:ID },
   attribute format { "xml/TEI" | "txt/mandoku" },
   attribute location { string },
   attribute type { "documentary" | "interpretative" },
   attribute language { xsd:language }?,
   ( krx_description, krx_divisions* )
}
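
The constraints in this declaration can be approximated in plain Python when no RELAX NG validator is at hand. The following sketch checks the required attributes, the two enumerated value lists and the required description child; it is an illustration only and does not replace validation against the schema. The function name check_edition and the sample values are invented for this example.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("id", "format", "location", "type")
LEGAL = {
    "format": {"xml/TEI", "txt/mandoku"},
    "type": {"documentary", "interpretative"},
}

def check_edition(edition):
    """Return a list of problems found on one edition element."""
    problems = []
    for name in REQUIRED:
        if edition.get(name) is None:
            problems.append(f"missing attribute: {name}")
    for name, allowed in LEGAL.items():
        value = edition.get(name)
        if value is not None and value not in allowed:
            problems.append(f"illegal {name}: {value}")
    if edition.find("description") is None:
        problems.append("missing child: description")
    return problems

good = ET.fromstring(
    '<edition id="ed1" format="xml/TEI" location="editions/ed1" '
    'type="documentary"><description>Example</description></edition>'
)
bad = ET.fromstring('<edition id="ed2" format="pdf" type="documentary"/>')
print(check_edition(good))  # []
print(check_edition(bad))
```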

<editions>

<editions> The editions representing the work under consideration. Work is taken in a very broad sense here.
Module KRXManifest
Contained by
KRXManifest: manifest
May contain
KRXManifest: edition
Content model
<content>
 <elementRef key="edition"
  maxOccurs="unbounded" minOccurs="1"/>
</content>
Schema Declaration
element editions { krx_edition+ }

<edRef>

<edRef> Reference to this subdivision in one specific edition, identified by the key.
Module KRXManifest
Attributes
start
The sequential number of the first token of this division in the token list.
Status Optional
Datatype nonNegativeInteger
end
The sequential number of the last token of this division in the token list.
Status Optional
Datatype nonNegativeInteger
key
A reference to the edition, as defined elsewhere in this manifest.
Status Optional
Datatype IDREF
timestamp
The timestamp in ISO format, e.g. 2020-10-09T14:23:52+09:00.
Status Optional
Datatype dateTime
Contained by
KRXManifest: div
May contain
Empty element
Content model
<content>
 <empty/>
</content>
Schema Declaration
element edRef
{
   attribute start { xsd:nonNegativeInteger }?,
   attribute end { xsd:nonNegativeInteger }?,
   attribute key { xsd:IDREF }?,
   attribute timestamp { xsd:dateTime }?,
   empty
}
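
Since key is declared as an IDREF, each value should match the id of an edition in the same manifest. Parsers that do not validate will not enforce this, so a processing tool may want to check it itself. The sketch below shows one way to do so with the Python standard library; the sample manifest and the function name dangling_keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest: only one of the two referenced editions is declared.
SAMPLE = """
<manifest>
  <description>Sample work</description>
  <editions>
    <edition id="KR5c0057_tls" format="xml/TEI" location="editions/tls"
             type="interpretative">
      <description>Sample edition</description>
    </edition>
  </editions>
  <divisions>
    <div label="Chapter 1">
      <edRef key="KR5c0057_tls" start="0" end="61"/>
    </div>
    <div label="Chapter 2">
      <edRef key="CH1a0918a_chant" start="0" end="58"/>
    </div>
  </divisions>
</manifest>
"""

def dangling_keys(manifest):
    """Return edRef keys that match no edition id, sorted alphabetically."""
    ids = {ed.get("id") for ed in manifest.iter("edition")}
    keys = {ref.get("key") for ref in manifest.iter("edRef")}
    keys.discard(None)  # key is optional, so some edRef may lack it
    return sorted(keys - ids)

print(dangling_keys(ET.fromstring(SAMPLE)))  # ['CH1a0918a_chant']
```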

<manifest>

<manifest> The root of the manifest. One manifest describes one work.
Module KRXManifest
Contained by
KRXManifest: manifests
May contain
KRXManifest: description divisions editions title
Note

Currently, only one work can be described per manifest file. We need to think about what to do with use cases that need multiple works. Use several manifest elements in one file?

Content model
<content>
 <sequence maxOccurs="1" minOccurs="1">
  <elementRef key="title" minOccurs="0"/>
  <elementRef key="description"/>
  <elementRef key="editions"/>
  <elementRef key="divisions" minOccurs="0"/>
 </sequence>
</content>
Schema Declaration
element manifest { krx_title?, krx_description, krx_editions, krx_divisions? }
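
The declaration fixes the order of the children: an optional title, then description, then editions, then an optional divisions. As a small illustration, the Python sketch below checks just this ordering by matching the sequence of child element names against a regular expression. It looks only at the direct children of manifest, not at their content, and assumes elements without an XML namespace.

```python
import re
import xml.etree.ElementTree as ET

# Mirrors: krx_title?, krx_description, krx_editions, krx_divisions?
ORDER = re.compile(r"(title )?description editions( divisions)?")

def children_in_order(manifest):
    """True if the child elements of manifest follow the declared sequence."""
    tags = " ".join(child.tag for child in manifest)
    return ORDER.fullmatch(tags) is not None

ok = ET.fromstring("<manifest><title/><description/><editions/></manifest>")
swapped = ET.fromstring("<manifest><editions/><description/></manifest>")
print(children_in_order(ok), children_in_order(swapped))  # True False
```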

<manifests>

<manifests> Root for manifests that contain multiple manifest elements.
Module KRXManifest
Contained by
May contain
KRXManifest: manifest
Content model
<content>
 <elementRef key="manifest"
  maxOccurs="unbounded"/>

</content>
Schema Declaration
element manifests { krx_manifest+ }
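
With two possible root elements, a reader of manifest files has to accept both. One way to hide the difference is to normalize either root into a list of manifest elements, as in this Python sketch. The function name manifests_in is invented here, and the elements are again assumed to be in no namespace.

```python
import xml.etree.ElementTree as ET

def manifests_in(xml_text):
    """Return all manifest elements, whichever root element the file uses."""
    root = ET.fromstring(xml_text)
    if root.tag == "manifest":
        return [root]
    if root.tag == "manifests":
        return root.findall("manifest")
    raise ValueError(f"unexpected root element: {root.tag}")

single = "<manifest><description>A</description><editions/></manifest>"
grouped = "<manifests>" + single + single + "</manifests>"
print(len(manifests_in(single)), len(manifests_in(grouped)))  # 1 2
```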

<note>

<note> An additional note
Module KRXManifest
Contained by
KRXManifest: description
May contain
Character data only
Content model
<content>
 <textNode/>
</content>
Schema Declaration
element note { text }

<title>

<title> Title of the work.
Module KRXManifest
Contained by
KRXManifest: manifest
May contain
Character data only
Content model
<content>
 <textNode/>
</content>
Schema Declaration
element title { text }
Notes
1. There are in fact two possible root elements; the other is manifests, which groups several manifest elements.