Automatic Tagging of Interchangeable Characters in Pre-Qin Literature

Minxuan Feng, Liu Liu, Ning Xi
DOI: https://doi.org/10.1007/978-3-642-36337-5_35
2013-01-01
Abstract: Interchangeable Characters (ICs) are an important issue in the semantic analysis of Pre-Qin literature. This paper employs three knowledge resources for IC tagging: 1) an IC frequency table built from 25 Pre-Qin texts and Commentary on the Thirteen Confucian Classics, based on Chinese Dictionary; 2) a book-specific IC database based on philological and exegetic studies; 3) the Academia Sinica IC bank, based on the tagged corpus of ancient Chinese. Experiments on tagging ICs in Mo-tse, The Book of Filial Piety and The Songs of Chu show that the second resource, though small in scale, is very reliable; that the third can be a useful supplement to it; and that the first alone can also provide useful information for the task. The research makes it clear that constructing an IC knowledge base is of great significance for improving the performance of automatic IC tagging.
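The abstract does not spell out how the three resources are combined, so the following is only a minimal sketch of one plausible arrangement: a cascaded lookup that consults the book-specific database first, then the corpus-derived bank, then the frequency table. The priority order, the `min_count` threshold, the function name `tag_ics`, and all table contents (including the counts) are assumptions made for illustration, not the authors' method or data. The sketch also tags every occurrence of a listed character, whereas real IC tagging depends on context.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge bases (illustrative entries, not the paper's data).
# Each maps a written character to the character it is assumed to stand for.
BOOK_SPECIFIC: Dict[str, str] = {"蚤": "早"}   # stands in for resource 2
CORPUS_BANK: Dict[str, str] = {"說": "悅"}     # stands in for resource 3
# Stands in for resource 1: candidate targets with made-up counts.
FREQUENCY_TABLE: Dict[str, Dict[str, int]] = {
    "蚤": {"早": 12},
    "女": {"汝": 30},
}

Tag = Tuple[int, str, str, str]  # (position, written char, target char, source)


def tag_ics(text: str, min_count: int = 5) -> List[Tag]:
    """Tag candidate ICs by cascading through the three toy resources."""
    tags: List[Tag] = []
    for i, ch in enumerate(text):
        if ch in BOOK_SPECIFIC:
            # Assumed highest priority: small but reliable resource.
            tags.append((i, ch, BOOK_SPECIFIC[ch], "book"))
        elif ch in CORPUS_BANK:
            # Assumed second priority: supplementary resource.
            tags.append((i, ch, CORPUS_BANK[ch], "bank"))
        elif ch in FREQUENCY_TABLE:
            # Fallback: pick the most frequent target if it is common enough.
            target, count = max(FREQUENCY_TABLE[ch].items(), key=lambda kv: kv[1])
            if count >= min_count:
                tags.append((i, ch, target, "freq"))
    return tags


if __name__ == "__main__":
    for tag in tag_ics("不亦說乎"):
        print(tag)
```

Raising `min_count` makes the frequency-table fallback more conservative, which is one simple way to trade recall for precision when only the largest, noisiest resource covers a character.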