Keywords: syntactic frame retrieval, 'kun' reading, syntactic parallelism, couplets
Developing an automatic syntactic analysis system for Classical Chinese texts requires a corpus for training and evaluation. This paper proposes ideas for building such a corpus.
The corpus must contain metadata describing the syntactic structure of the original texts. This paper presents practical ideas for extracting information on syntactic structure from Classical Chinese texts, drawing on existing knowledge of the Classical Chinese language and its texts.
A large number of Japanese 'kun' readings, each assigned to a kanji, can be obtained from the index of a Chinese character-Japanese dictionary compiled in Japan. In Classical Chinese text, a single character almost always corresponds to a single morpheme. The 'kun' reading of a morpheme carries information on its lexical meaning as well as detailed syntactic information, such as its part of speech and the conversions between parts of speech that it allows. This paper offers an idea for using these data as a dictionary for automatic syntactic analysis.
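As a rough illustration of how 'kun' readings could serve as a part-of-speech dictionary, the sketch below guesses a coarse category from the final kana of each reading. This is a minimal sketch under stated assumptions, not the procedure proposed in this paper: the names `KUN_INDEX`, `guess_pos`, and `build_lexicon` are hypothetical, the four entries are hand-picked examples rather than data from any particular dictionary, and the ending-based rule is a simple heuristic that misclassifies nouns whose readings happen to end in a u-row kana or in い.

```python
# Hypothetical sample of a dictionary index: character -> list of 'kun' readings.
KUN_INDEX = {
    "山": ["やま"],
    "見": ["みる"],
    "高": ["たかい"],
    "白": ["しろ", "しろい"],
}

# Dictionary-form verbs in modern Japanese end in a u-row kana.
VERB_ENDINGS = set("うくぐすつぬぶむる")
# Modern i-adjectives end in い.
ADJECTIVE_ENDINGS = set("い")


def guess_pos(kun_reading):
    """Guess a coarse part of speech from the last kana of a 'kun' reading (heuristic)."""
    last = kun_reading[-1]
    if last in ADJECTIVE_ENDINGS:
        return "adjective"
    if last in VERB_ENDINGS:
        return "verb"
    return "noun"


def build_lexicon(index):
    """Map each character to the sorted list of categories suggested by its readings."""
    return {
        char: sorted({guess_pos(reading) for reading in readings})
        for char, readings in index.items()
    }


lexicon = build_lexicon(KUN_INDEX)
for char, categories in lexicon.items():
    print(char, categories)
```

A character that receives more than one category, such as 白 in this sample, is a candidate for part-of-speech conversion, which is the kind of information a syntactic analyzer could use when choosing among readings.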
Some types of Classical Chinese poetry make frequent use of couplets. These couplets are known for their sophisticated syntactic parallelism. This parallelism reveals a good deal of information: how individual morphemes combine into compound words, where verb-object structures occur, how predicates are arranged hierarchically, and so on. This paper outlines an idea for using such information to support syntactic analysis.
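The use of parallelism can be sketched as follows: the two lines of a couplet are aligned character by character, and a category known at one position is copied to the matching position in the other line. The example uses the couplet 國破山河在 / 城春草木深 from Du Fu's "Spring View". This is a minimal sketch under stated assumptions: the function names `align_couplet` and `propagate_tags` are hypothetical, the tags are hand-assigned for illustration, and "verb" is used loosely to mean "predicate".

```python
def align_couplet(first_line, second_line):
    """Pair the characters of two couplet lines position by position."""
    if len(first_line) != len(second_line):
        raise ValueError("couplet lines must have the same length")
    return list(zip(first_line, second_line))


def propagate_tags(tags_first, tags_second):
    """Fill unknown tags (None) in one line from the parallel position in the other.

    Positions where both lines already have a tag are left unchanged.
    """
    filled_first, filled_second = [], []
    for tag_a, tag_b in zip(tags_first, tags_second):
        filled_first.append(tag_a if tag_a is not None else tag_b)
        filled_second.append(tag_b if tag_b is not None else tag_a)
    return filled_first, filled_second


first_line = "國破山河在"
second_line = "城春草木深"

# Illustrative, hand-assigned tags; None marks a position left undecided.
tags_first = ["noun", "verb", "noun", "noun", "verb"]
tags_second = ["noun", None, "noun", "noun", None]

pairs = align_couplet(first_line, second_line)
filled_first, filled_second = propagate_tags(tags_first, tags_second)

for (char_a, char_b), tag_a, tag_b in zip(pairs, filled_first, filled_second):
    print(char_a, tag_a, "|", char_b, tag_b)
```

In this sample, 春 is usually a noun, but its position parallel to 破 suggests that it functions as a predicate here. Likewise, the aligned pairs 山河 and 草木 occupy the same two positions, which hints that each pair forms a compound.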
Finally, this paper examines the feasibility of retrieving syntactic frames from a corpus annotated with the metadata described above.