CRF-based Approach to Sentence Segmentation and Punctuation for Ancient Chinese Prose

ZHANG Kaixu, XIA Yunqing, YU Hang
DOI: https://doi.org/10.3321/j.issn:1000-0054.2009.10.039
2009-01-01
Abstract: Although punctuation is important in modern Chinese, punctuation marks were not used in ancient Chinese, which makes ancient Chinese literature very hard for modern Chinese readers to read. This article presents a conditional random field (CRF) based approach to automating the punctuation of ancient Chinese prose, using mutual information and the t-test difference as features. Tests on Lunyu and Shiji show that the approach outperforms the state-of-the-art method by 0.124 in F1 score for sentence segmentation. The approach yields promising results for sentence punctuation on both Lunyu and Shiji. The cascaded CRF approach handles ancient Chinese prose punctuation more effectively than a single CRF.
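
The abstract names mutual information between characters as one of the CRF features but does not give its formula. As a minimal sketch, assuming the common pointwise definition MI(x, y) = log2(p(xy) / (p(x) p(y))) estimated from raw character counts, the score for each adjacent character pair could be computed as follows. The function name `adjacent_mutual_information` and the estimation details are illustrative assumptions, not the authors' implementation, and the t-test difference feature is left out because the abstract does not define it.

```python
import math
from collections import Counter


def adjacent_mutual_information(text):
    """Pointwise mutual information for every adjacent character pair.

    Assumption: probabilities are plain relative frequencies of
    characters and character bigrams within `text` itself.
    """
    unigrams = Counter(text)
    bigrams = Counter(zip(text, text[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = []
    for x, y in zip(text, text[1:]):
        p_xy = bigrams[(x, y)] / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        # A low score suggests x and y co-occur weakly, which may hint
        # at a sentence boundary between them.
        scores.append(math.log2(p_xy / (p_x * p_y)))
    return scores
```

In practice the counts would come from a large unpunctuated corpus rather than from the sentence being scored, and each score would presumably be discretized before being passed to a CRF toolkit as a feature of the gap between two characters.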