Preparing Files for Character Conversion


Creating Text-Only Files

If you used a word processor to create your files, be sure to save them as text-only files. Most word processors offer this option under the "Save As" menu entry. All formatting you applied to the text will be lost. If you want to preserve the formatting, consider converting it to generalized markup such as SGML or LaTeX.

Standardizing the Character Usage

If you have a Japanese text file that you want to convert to Big5, we recommend running it through NEW2OLD, from our suite of normalizing tools, before converting. Because Big5 contains only traditional character forms, only those forms convert easily.
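
The idea behind this kind of normalization can be sketched in a few lines of Python. This is not the NEW2OLD program or its mapping table: the function name new_to_old and the three-entry table are illustrative assumptions, chosen only to show how newer Japanese forms might be replaced by the traditional forms that Big5 contains.

```python
# Illustrative sketch only; NOT the actual NEW2OLD tool or its table.
# Maps a few modern Japanese forms to their traditional counterparts.
NEW_TO_OLD = str.maketrans({
    "国": "國",
    "学": "學",
    "体": "體",
})

def new_to_old(text):
    """Replace each character found in the table; leave all others as they are."""
    return text.translate(NEW_TO_OLD)
```

A real table would cover many more characters, and some replacements depend on context, so a one-to-one lookup like this is only a starting point.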

Performing the Conversion

The invocation of the program is platform-specific; please refer to "Converting texts on the Mac" and "Converting texts on DOS platforms". Whichever platform you use, you will get a new text file containing the result of the conversion; your original file remains untouched.
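
For readers without access to the programs described here, the same behaviour (a new output file, an untouched original) can be approximated with Python's built-in codecs. This is a stand-in technique, not the conversion program itself: the function name convert_file, the default encodings, and the use of numeric character references (rather than CEF codes) for characters missing from the target codeset are all assumptions of this sketch.

```python
from pathlib import Path

def convert_file(src, dst, src_encoding="shift_jis", dst_encoding="big5"):
    """Write a converted copy of src to dst; src itself is never modified."""
    src, dst = Path(src), Path(dst)
    if src.resolve() == dst.resolve():
        raise ValueError("refusing to overwrite the source file")
    text = src.read_bytes().decode(src_encoding)
    # Characters that do not exist in the target codeset are written as
    # numeric references (&#NNNN;), not as CEF codes.
    dst.write_bytes(text.encode(dst_encoding, errors="xmlcharrefreplace"))
    return dst
```

A call such as convert_file("input.txt", "output.txt") would then leave input.txt as it was and place the Big5 result in output.txt; both file names are placeholders.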

Normalizing to Characters in the Codeset

The number of character references (to the extended CNS code, by means of CEF codes) inserted in your document depends on the type of conversion used. If you find that too many such references appear in your document, you can use another normalizing tool, normaliz. This program works on Big5-coded texts and tries to find Big5 characters that can be used in place of those referred to by the CEF tags.
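
To judge whether a converted document contains too many references, it can help to count them first. The sketch below does that in Python. The exact syntax of the CEF tags is not given here, so the pattern HYPOTHETICAL_REF_PATTERN is a placeholder assumption and must be adjusted to match the tags that actually appear in your files; the function name count_references is likewise invented for this example.

```python
import re
from collections import Counter

# ASSUMPTION: placeholder tag syntax, not taken from the tool's documentation.
# Replace it with a pattern matching the references in your own files.
HYPOTHETICAL_REF_PATTERN = r"&C\d+-[0-9A-Fa-f]{4};"

def count_references(text, pattern=HYPOTHETICAL_REF_PATTERN):
    """Return a Counter mapping each distinct reference to its number of occurrences."""
    return Counter(re.findall(pattern, text))
```

Sorting the result with most_common() shows which references occur most often, and those are the ones where a substitute Big5 character would remove the most tags.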

Author: Christian Wittern