
IRIZ Format Conversion Tool (FMTCNV)


Summary

This is a tool by Christian Wittern for the Macintosh and DOS environments. It was developed in the framework of the Zen KnowledgeBase project at the International Research Institute for Zen Buddhism at Hanazono University.
Using the FMTCNV tool, you can convert a RAW text file (described below) into one optimized for fast line-based searches (APP format) or into one designed for database or proofreading purposes (TAB format). Conversion from any of these three formats into any other is supported.

Required software

To use this utility you need Perl (MacPerl on the Macintosh), which is found on the ZenBase CD 1. Before using FMTCNV, make sure that Perl is installed properly on your system (installation for DOS or for Mac).

The program does not touch your original file but creates a new one to which you must give a name.

Usage of the FMTCNV program

  1. In the DOS environment:
    Invoke the program at the DOS prompt by typing
    FMTCNV <filename>

  2. On a Macintosh:
    Simply drag the file you would like to convert onto the FMTCNV pyramid icon.

In both cases the program will ask you about the target format and the name you want the resulting converted file to have.

Supported formats

Electronic texts can take different shapes depending on their purpose. This tool converts between three basic formats. All header and comment lines must begin with the character #. The three supported formats are the following:

  1. The RAW format

    This format gives a page (and, if necessary, segment) number and then the text as it appears in the original edition, with each line closed by a return. The number always precedes the Chinese text and must be written in single-byte ASCII digits. For example (XXXXXXXXX represents a line of JIS or Big-5 Chinese characters):

    486a
    XXXXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXXXXXXXXXX
    .
    .
    .
    XXXXXXXXXXXXXXXXXX
    486b
    XXXXXXXXXXXXXXXXXXXX
    etc.
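
    The RAW layout described above can be read with a few lines of code. The following is a minimal Python sketch, not part of FMTCNV (which runs under Perl). The name parse_raw, the marker pattern, the skipping of blank lines, and the rule that line counting restarts at 1 after each page or segment marker are assumptions made for illustration.

```python
import re

# Assumption: a page marker is ASCII digits optionally followed by one
# lowercase segment letter (e.g. "486a"); the description shows only this shape.
PAGE_MARKER = re.compile(r"^[0-9]+[a-z]?$")


def parse_raw(lines):
    """Yield (page, line_number, text) for each text line of a RAW file."""
    page = None
    line_number = 0
    for raw_line in lines:
        line = raw_line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines (assumed) and header/comment lines
        if PAGE_MARKER.match(line):
            page = line
            line_number = 0  # assumed: numbering restarts at each marker
            continue
        line_number += 1
        yield page, line_number, line


sample = ["# header", "486a", "XXXX", "XXXXXX", "486b", "XXX"]
records = list(parse_raw(sample))
```

    Each record pairs a text line with the marker that precedes it, which is the information needed to build the PageLine references used by the other two formats.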

  2. The APP format

    This format was developed by Urs App and is designed for speedy line-based search. It replaces the original line breaks with breaks that follow a punctuation mark, so that Chinese compounds are not split across two lines. In order to record the exact location of the original line break, this format adds two numbers (Num1 and Num2).

    A line in the APP format has four parts, separated by commas:
    PageLine, Num1, XXXXXXXXXXXXXXXXX, Num2

    Example: 486b26,3,XXXXXXXXXXXXXXXXXXX,0

    Here, 486b26 is line 26 in segment b of page 486; 3 tells you that three characters have been pasted over from the preceding line; and 0 indicates that no character was cut off and moved to the next line (i.e., that the line break is the same as in the original printed text).
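
    The four fields just described can be separated as follows. This is a hedged Python sketch rather than the FMTCNV code itself; the names AppLine and parse_app_line are hypothetical, and the tolerance for spaces after the commas is an assumption based on the two ways the line is written above.

```python
from typing import NamedTuple


class AppLine(NamedTuple):
    page_line: str  # e.g. "486b26"
    num1: int       # characters pasted over from the preceding line
    text: str       # the Chinese text of the line
    num2: int       # characters cut off and moved to the next line


def parse_app_line(line: str) -> AppLine:
    """Split one APP line into its four comma-separated parts."""
    line = line.rstrip("\r\n")
    # Take two fields from the left and one from the right, so that a comma
    # inside the text (should one ever occur) does not break the parsing.
    page_line, num1, rest = line.split(",", 2)
    text, num2 = rest.rsplit(",", 1)
    # Strip only ASCII spaces, so full-width spaces in the text survive.
    return AppLine(page_line.strip(" "), int(num1), text.strip(" "), int(num2))


entry = parse_app_line("486b26,3,XXXX,0")
```

    Here entry.num1 is 3 and entry.num2 is 0, matching the reading of the example given above.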

    If used with the fast text search utility fgrep.com, which is included on the ZenBase CD 1, full-text search speeds of up to 2 MB per second are possible. This is the best and safest format for finding expressions that are more than one character in length.
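
    To illustrate what a line-based search over an APP file amounts to, here is a small Python sketch. It is not a replacement for fgrep.com and makes no claim about its speed; the function name search_app is hypothetical, and restricting the match to the text field is a choice made for this example.

```python
def search_app(lines, needle):
    """Return the PageLine references of APP lines whose text contains needle."""
    hits = []
    for raw_line in lines:
        line = raw_line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines (assumed) and header/comment lines
        page_line, _num1, rest = line.split(",", 2)
        text = rest.rsplit(",", 1)[0]  # drop the trailing Num2 field
        if needle in text:
            hits.append(page_line)
    return hits


app_lines = ["# sample", "486b26,3,ABCDE,0", "486b27,0,FGH,2"]
found = search_app(app_lines, "CD")
```

    Because every APP line ends at a punctuation mark, a multi-character expression sits within a single line, so a plain substring test per line is enough to find it.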

  3. The TAB format

    Each line of a text in TAB format consists of the page and line number, a <tab> character, and then the Chinese text.

    PageLine <tab> XXXXXXXXXXXXXXXXX

    Example: 486b26<tab>XXXXXXXXXXXXXXXXXXX

    We use this format mostly for proofreading purposes, but it could also be useful for database applications.
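
    As a rough illustration of the database use mentioned above, a TAB file can be loaded into a lookup table keyed by PageLine. The Python sketch below is an assumption-laden example, not part of FMTCNV; the name load_tab is hypothetical, and blank lines are skipped by assumption.

```python
def load_tab(lines):
    """Build a dictionary mapping each PageLine reference to its text."""
    index = {}
    for raw_line in lines:
        line = raw_line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines (assumed) and header/comment lines
        # Split at the first tab only; anything after it is the text.
        page_line, text = line.split("\t", 1)
        index[page_line] = text
    return index


tab_lines = ["# sample", "486b26\tXXXX\n", "486b27\tYY\n"]
index = load_tab(tab_lines)
```

    A reference such as 486b26 can then be looked up directly with index["486b26"], which is also convenient when checking a line against the printed edition during proofreading.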


Author: Christian Wittern
Last updated: 95/04/27