Method Combining Rule-Based and Corpus-Based Approaches for Oracle-Bone Inscription Information Processing

Huiying Cai, Minghu Jiang, Beixing Deng, Lin Wang
DOI: https://doi.org/10.1007/978-3-540-37275-2_92
2006-01-01
Abstract: Word segmentation and part-of-speech (POS) tagging are the basis of computer processing of oracle-bone inscriptions. It is hard to build a large oracle-bone inscription corpus tagged with grammatical information, which is an obstacle to using statistical methods. In this paper, we propose to solve both problems with methods that combine corpus-based and rule-based approaches. The segmenter and tagger achieve accuracies of 98.33% and 96.75%, respectively. Our experimental results show that the combined method is quite practical for processing oracle-bone inscriptions, especially when the corpus is too sparse. Finally, we briefly discuss how to use the tagged results to perform syntactic analysis with a rule-based method.