IRIZ Character Conversion Tools

by Christian Wittern


Overview

Kanji conversion between JIS and Big5 is a rather complex process, and a simple one-to-one conversion can achieve it only poorly. Instead, one needs to establish the aim of the conversion and the needs of the user, and perform the conversion accordingly.
To solve this problem, we came up with the following strategy:
  1. First, prepare your text file for the conversion process.
  2. Second, you should decide which level of precision or strictness you need in the resulting file (see below). This depends on the purpose of your conversion.
  3. Third, you run the conversion program on a Macintosh or DOS machine.
  4. Fourth, you look at the resulting file in an editor and get familiar with the replacement codes that we use in our conversion.
  5. Finally, if you converted from Big5 to JIS and would like post-war JIS characters, you can use our Old2New tool.
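Step 5 replaces the older character forms found in text converted from Big5 with the post-war forms used in modern JIS text. The Python sketch below only illustrates that idea and is not the Old2New tool itself: the function name `old2new` and the table `OLD_TO_NEW` are hypothetical, and a real table would contain many more pairs.

```python
# Hypothetical excerpt of an old-form -> post-war-form table.
# Only a few well-known pairs are listed for illustration.
OLD_TO_NEW = {
    "國": "国",
    "學": "学",
    "體": "体",
    "會": "会",
}

# str.maketrans accepts a dict whose keys are single characters.
_TABLE = str.maketrans(OLD_TO_NEW)

def old2new(text: str) -> str:
    """Replace every old character form listed in OLD_TO_NEW."""
    return text.translate(_TABLE)

print(old2new("國學會"))  # 国学会
```

Characters not in the table pass through unchanged, so the function can be run safely over a whole file.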

Conversion between the most frequently encountered encodings of Chinese characters is no easy task. What is attempted here is only the conversion between the Shift-JIS encoding (common in Japan) and the Big5 encoding (originating in Taiwan).
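For comparison, a modern system can route this conversion through Unicode. The sketch below uses Python's standard `shift_jis` and `big5` codecs, which is a swapped-in technique and not the mapping-table approach of these tools. Characters without a Big5 equivalent are replaced by a marker holding their original Shift-JIS code; the `&SJ-...;` marker format and the names `sjis_to_big5` and `sjis_marker` are assumptions made for illustration, not the IRIZ replacement conventions.

```python
import codecs

def _sjis_marker(exc: UnicodeError):
    """Error handler: replace unmappable characters with '&SJ-XXXX;'.

    XXXX is the character's original Shift-JIS code in hex
    (a hypothetical marker format used only in this sketch).
    """
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    marker = "".join(
        "&SJ-" + ch.encode("shift_jis").hex().upper() + ";" for ch in bad
    )
    # Resume encoding right after the unmappable run.
    return marker, exc.end

codecs.register_error("sjis_marker", _sjis_marker)

def sjis_to_big5(data: bytes) -> bytes:
    """Decode Shift-JIS bytes to Unicode, then encode them as Big5."""
    text = data.decode("shift_jis")
    return text.encode("big5", errors="sjis_marker")

# Half-width katakana has no Big5 code, so it becomes a marker.
print(sjis_to_big5("中文ｱ".encode("shift_jis")).decode("big5"))
```

Note that a codec-based conversion is still a fixed one-to-one mapping; it does nothing about the variant-character problems discussed below.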

The main problems in this conversion process, and my attempts to solve them, are outlined here; a proper understanding of these issues is necessary for using the conversion tools successfully.

Here are some of the problems:

It should be clear from this that converting between these codes is far from straightforward. To be practical, the conversion must often perform some kind of remapping. Because the acceptable degree of remapping depends on the text and on the purpose of the user, we do not attempt a single, fixed translation; instead, we use several different mapping tables.
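The idea of several mapping tables can be sketched as follows. This is a minimal Python illustration under stated assumptions: the grade names `strict` and `variant`, the tiny tables, and the `&U+XXXX;` marker are hypothetical and do not reproduce the actual IRIZ tables or grades. The tables are keyed by character rather than by raw code for readability. A looser table extends a stricter one with variant substitutions, so the same input yields fewer markers at the looser grade.

```python
# Hypothetical mapping tables, keyed by character for readability.
STRICT = {
    "中": "中",
    "文": "文",
}
# A looser grade also accepts variant forms as substitutes,
# e.g. the Japanese form 国 for the traditional form 國.
VARIANT = {**STRICT, "国": "國"}

TABLES = {"strict": STRICT, "variant": VARIANT}

def convert(text: str, grade: str) -> str:
    """Map each character with the table chosen for `grade`.

    Characters missing from that table become a marker such as
    '&U+56FD;' (a hypothetical replacement code).
    """
    table = TABLES[grade]
    return "".join(table.get(ch, f"&U+{ord(ch):04X};") for ch in text)

print(convert("中国", "strict"))   # 中&U+56FD;
print(convert("中国", "variant"))  # 中國
```

Building each looser table on top of the stricter one keeps the grades consistent: anything converted at a strict grade is converted the same way at a looser one.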

Grades of Precision

We distinguish three grades of precision for the conversion:

Replacement Conventions

The following conventions apply for the replacement characters inserted in the text:

When you encounter the above codes in a converted text, you can use a code table (or the original file, where available) to determine which character could not be converted. You can then use your editor's global replace to change it to the character you want in that position.
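Locating the remaining replacement codes and fixing them can also be scripted. The Python sketch below assumes markers of the hypothetical form `&...;`, so the pattern would need adjusting to the actual conventions. The function `list_markers` counts each distinct marker so that each one needs to be looked up only once, and `apply_fixes` performs the equivalent of an editor's global replace; both names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical marker syntax: '&', then non-space, non-';' chars, then ';'.
MARKER = re.compile(r"&[^\s;&]+;")

def list_markers(text: str) -> Counter:
    """Count each distinct replacement code left in the text."""
    return Counter(MARKER.findall(text))

def apply_fixes(text: str, fixes: dict) -> str:
    """Globally replace each marker with the character chosen for it."""
    for marker, replacement in fixes.items():
        text = text.replace(marker, replacement)
    return text

sample = "中&U+56FD;人&U+56FD;"
print(list_markers(sample))                     # Counter({'&U+56FD;': 2})
print(apply_fixes(sample, {"&U+56FD;": "國"}))  # 中國人國
```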

Author: Christian Wittern
Last updated: 95/04/27