Towards Knowledgebases


From texts to Textbases to Knowledgebases

  • TEI provides all the necessary elements for marking the content of texts for later reuse
  • Large textbases can be built using these features
  • To build these textbases into Knowledgebases, a further layer of information is needed
  • This layer abstracts and condenses information items from the textbases and makes them accessible
  • One technology for building such information spaces is called topic maps
  • Other technologies exist, but topic maps are the most powerful and adaptable, so I will concentrate on them

Topic maps are ...

  • some kind of metadata
  • a way to make information about information exchangeable
  • ‘the global positioning system (GPS) of information’ (Charles Goldfarb)
  • defined in ISO 13250:2000
  • a member of the SGML family of standards

Topic maps have been ...

  • in development for almost ten years
  • accepted as an international standard in early 2000
  • perceived as a tool to navigate huge collections of information
  • designed to ‘enable multiple, concurrent views of sets of information objects’

Development of topic maps

  • started out with digitization of indices, glossaries and authority files
  • consequently generalized and abstracted the underlying concepts
  • to enable set operations like merging and splitting of topic maps, ‘published subject indicators’ (PSI) have been introduced
  • among other things topic maps can also be used to assemble virtual documents
  • implementation of tools operating on topic maps is still in a very early stage
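
The merging mentioned above can be illustrated with a minimal sketch. It assumes a much-reduced model in which a topic map is a dictionary from a subject indicator URI to a topic, and a topic carries only names and occurrence addresses. The `Topic` class, the `merge_topic_maps` function and all URIs and addresses are hypothetical and are not taken from ISO 13250; real merging has to consider more than a shared indicator.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical, much-reduced topic: names plus occurrence addresses."""
    names: set = field(default_factory=set)
    occurrences: set = field(default_factory=set)


def merge_topic_maps(map_a, map_b):
    """Merge two topic maps keyed by subject indicator URI.

    Topics filed under the same indicator are assumed to describe the
    same subject, so their names and occurrences are combined.
    The input maps are left unchanged.
    """
    merged = {}
    for topic_map in (map_a, map_b):
        for indicator, topic in topic_map.items():
            target = merged.setdefault(indicator, Topic())
            target.names |= topic.names
            target.occurrences |= topic.occurrences
    return merged


# Hypothetical indicator URIs and document addresses, for illustration only.
GOETHE = "http://example.org/psi/goethe"
SCHILLER = "http://example.org/psi/schiller"

map_a = {GOETHE: Topic({"Goethe"}, {"letters.xml#p12"})}
map_b = {
    GOETHE: Topic({"Johann Wolfgang von Goethe"}, {"faust.xml#title"}),
    SCHILLER: Topic({"Schiller"}, {"letters.xml#p13"}),
}

merged = merge_topic_maps(map_a, map_b)
print(sorted(merged[GOETHE].names))
# prints ['Goethe', 'Johann Wolfgang von Goethe']
```

The point of the shared indicator is that two independently built maps need no prior agreement on topic names: both entries for Goethe end up in one topic because they point to the same published address.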

Topic maps and metadata

There are important differences between topic maps and metadata in the usual sense of the term:

  • Metadata is information about a specific document
  • Topic maps are information about the information in a document
  • Metadata is encoded in a variety of formats, e.g. US-MARC, Dublin Core, ...
  • Topic maps are encoded as SGML (WebSGML) documents

Topic maps and SGML/XML

  • In the ISO standard, topic maps are expressed as SGML and use HyTime (ISO 10744:1997) constructs to address into documents
  • After the acceptance of ISO 13250, an informal working group of topic map vendors started working on an XML version of topic maps
  • The XML version, called XTM (XML Topic Maps), was published on Dec. 5, 2000 (see http://www.topicmaps.org)
  • This version does not simply recast ISO 13250, but introduces some changes in the underlying topic map model and syntax and uses XPointer to address into documents
  • XTM adds ‘published subject indicators’ (PSI) to enable public sharing of topic maps
  • In this presentation, I will talk about topic maps as defined in ISO 13250
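
To give an impression of the XML syntax, here is a small sketch that reads one topic from an XTM fragment using Python's standard library. The fragment follows my reading of the XTM 1.0 element names (`topic`, `subjectIdentity`, `subjectIndicatorRef`, `baseName`, `occurrence`, `resourceRef`) and should be checked against the published specification. The topic, the indicator URI, the document address and the `read_topics` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical topic map with a single topic, for illustration only.
XTM_SAMPLE = f"""\
<topicMap xmlns="{XTM_NS}" xmlns:xlink="{XLINK_NS}">
  <topic id="goethe">
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://example.org/psi/goethe"/>
    </subjectIdentity>
    <baseName>
      <baseNameString>Johann Wolfgang von Goethe</baseNameString>
    </baseName>
    <occurrence>
      <resourceRef xlink:href="letters.xml#xpointer(id('p12'))"/>
    </occurrence>
  </topic>
</topicMap>
"""


def read_topics(xtm_text):
    """Return one dict per topic with its id, indicators, names and occurrences."""
    ns = {"xtm": XTM_NS}
    href = f"{{{XLINK_NS}}}href"  # qualified name of the xlink:href attribute
    topics = []
    for topic in ET.fromstring(xtm_text).findall("xtm:topic", ns):
        topics.append({
            "id": topic.get("id"),
            "indicators": [
                ref.get(href)
                for ref in topic.findall(
                    "xtm:subjectIdentity/xtm:subjectIndicatorRef", ns)
            ],
            "names": [
                name.text
                for name in topic.findall("xtm:baseName/xtm:baseNameString", ns)
            ],
            "occurrences": [
                ref.get(href)
                for ref in topic.findall("xtm:occurrence/xtm:resourceRef", ns)
            ],
        })
    return topics


for entry in read_topics(XTM_SAMPLE):
    print(entry["id"], entry["names"], entry["occurrences"])
```

The occurrence uses an XPointer expression to address a paragraph inside another document, which is the role HyTime addressing plays in the ISO syntax.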

Topic maps and TEI

  • Topic maps complement the TEI markup in an ideal way by establishing a general and exchangeable way to describe views of a TEI encoded document.
  • TEI markup can be used to describe the features of a text, including its structure and other noteworthy elements like names, dates and the like.
  • Topic maps can be used to link the features marked up using TEI to resources inside and outside the text.
  • It is thus possible to construct a very flexible and powerful information architecture around the TEI encoded texts.
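
How TEI markup can feed a topic map may be sketched as follows. The example assumes that names in the text are tagged with a `name` element whose `key` attribute identifies the person, and that paragraphs carry an `id` attribute. The sample text, the file name `letters.xml` and the `collect_occurrences` helper are hypothetical; a real project would follow its own encoding conventions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical TEI-style fragment, for illustration only.
TEI_SAMPLE = """\
<text>
  <body>
    <p id="p12">Letter from <name type="person" key="goethe">Goethe</name>
      to <name type="person" key="schiller">Schiller</name>.</p>
    <p id="p13">Reply from <name type="person" key="schiller">Schiller</name>.</p>
  </body>
</text>
"""


def collect_occurrences(tei_text, document="letters.xml"):
    """Map each name key to the addresses of the paragraphs that mention it."""
    occurrences = defaultdict(list)
    root = ET.fromstring(tei_text)
    for para in root.iter("p"):
        for name in para.iter("name"):
            key = name.get("key")
            if key is None:
                continue  # untagged names cannot be assigned to a topic
            address = f"{document}#{para.get('id')}"
            if address not in occurrences[key]:
                occurrences[key].append(address)
    return dict(occurrences)


print(collect_occurrences(TEI_SAMPLE))
# prints {'goethe': ['letters.xml#p12'], 'schiller': ['letters.xml#p12', 'letters.xml#p13']}
```

Each key would become a topic and each address an occurrence of that topic, so the topic map serves as an index into the encoded text without changing the text itself.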

