Toward a Syntactic Analysis of Classical Chinese Texts

Koichi Yasuoka
Institute for Research in Humanities, Kyoto University

  The most difficult point in the syntactic analysis of classical Chinese
texts is that they have no spaces or punctuation between words or
between sentences. They consist of continuous strings of Chinese
characters from the start to the end of the text. Unlike the analysis
of modern Chinese texts, which are punctuated and can be split into
phrases at the punctuation marks, the analysis of classical Chinese
texts has to begin by finding the ends of sentences.
  In this paper we present several key points toward the analysis of
classical Chinese texts.
  First, we automatically separate rhymed passages from prose. Classical
Chinese texts very often contain rhymed passages, and these are joined
to the surrounding prose without any punctuation such as quotation
marks, so the text looks continuous. However, rhymed passages have
typical meters every eight or twelve characters. Using this property,
our method can easily find rhymed passages in classical Chinese texts.
Rhymed passages are often written in rather irregular syntax, so they
need different processing from the prose.
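  As a rough illustration of the periodicity idea above, the following
Python sketch scans a text for runs in which the last character of each
eight- or twelve-character unit falls in the same rhyme class. It is
only a sketch under stated assumptions, not the method of this paper:
the function names, the `rhyme_class` lookup table, and the
`min_repeats` threshold are all hypothetical, and reading "meter" as
"a rhyming character recurs at a fixed period" is an assumption.

```python
def count_rhyming_units(text, start, period, rhyme_class):
    """Count consecutive `period`-long units from `start` whose final
    characters share one rhyme class (hypothetical lookup table)."""
    first = start + period - 1
    if first >= len(text):
        return 0
    label = rhyme_class.get(text[first])
    if label is None:
        return 0
    repeats = 1
    pos = first + period
    while pos < len(text) and rhyme_class.get(text[pos]) == label:
        repeats += 1
        pos += period
    return repeats


def find_rhymed_spans(text, rhyme_class, periods=(8, 12), min_repeats=3):
    """Greedy left-to-right scan returning (start, end, period) spans.

    `min_repeats` is an assumed threshold: a span is reported only when
    at least that many consecutive units rhyme.
    """
    spans = []
    start = 0
    while start < len(text):
        best = None  # (number of units, period) of the longest candidate
        for period in periods:
            n = count_rhyming_units(text, start, period, rhyme_class)
            if n >= min_repeats and (best is None or n * period > best[0] * best[1]):
                best = (n, period)
        if best is not None:
            n, period = best
            end = start + n * period
            spans.append((start, end, period))
            start = end  # continue scanning after the detected span
        else:
            start += 1
    return spans


# Toy data only: Latin letters stand in for characters, and the
# rhyme classes are invented for the example.
toy_classes = {"A": 1, "B": 1, "C": 1}
toy_text = "pqr" + "aaaaaaaA" + "bbbbbbbB" + "cccccccC" + "zz"
print(find_rhymed_spans(toy_text, toy_classes))
```

  Everything outside the reported spans would then be treated as prose.
A real system would need an actual rhyme dictionary for the period of
the text, which this sketch does not supply.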
  Second, we identify several delimiting characters to split the prose
into sentences. For example, in classical Chinese texts, the character
"也" is used at the end of a sentence, and other uses of "也" are
extremely rare. The same holds for "矣" and "焉". Conversely, the
character "嗚" is very often used at the start of a sentence. We use
these characters to delimit classical Chinese texts.
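  The delimiting rule described above can be sketched in a few lines of
Python. This is a minimal illustration, not the implementation used in
this work: the names `SENTENCE_FINAL`, `SENTENCE_INITIAL`, and
`split_sentences` are hypothetical, only the four characters mentioned
in the text are included, and the rare non-final uses of "也" are
ignored.

```python
# Characters taken from the text; a fuller list is assumed to exist
# but is not given here.
SENTENCE_FINAL = set("也矣焉")   # a sentence ends after these
SENTENCE_INITIAL = set("嗚")     # a sentence starts before these


def split_sentences(text):
    """Split an unpunctuated string at the delimiting characters."""
    pieces = []
    current = ""
    for ch in text:
        # Close the running piece before a sentence-initial character.
        if ch in SENTENCE_INITIAL and current:
            pieces.append(current)
            current = ""
        current += ch
        # Close the running piece after a sentence-final character.
        if ch in SENTENCE_FINAL:
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    return pieces


# Placeholder characters (甲乙丙丁戊己) stand in for real text.
print(split_sentences("甲乙也丙丁矣嗚戊己"))
# → ['甲乙也', '丙丁矣', '嗚戊己']
```

  Pieces that contain no delimiting character are left whole, so in
practice this step only gives a first approximation of the sentence
boundaries.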
  Finally, we apply our original morphological analysis to the prose
of classical Chinese texts. We are now developing a corpus for this
morphological analysis. The author's colleagues, Tomohiko Morioka and
Naoki Yamazaki, plan to present the morphological analysis and the
syntactic frame retrieval in separate papers.