A Sentence Segmentation Method for Ancient Chinese Texts Based on NNLM.

Boli Wang, Xiaodong Shi, Zhixing Tan, Yidong Chen, Weili Wang
DOI: https://doi.org/10.1007/978-3-319-49508-8_36
2016-01-01
Abstract: Most ancient Chinese texts have no punctuation or sentence segmentation. Recent research on automatic ancient Chinese sentence segmentation has usually relied on sequence labelling models and small data sets. In this paper, we propose a sentence segmentation method for ancient Chinese texts based on neural network language models (NNLMs). Experiments on large-scale corpora indicate that our method is effective and achieves results comparable to the traditional CRF model. Applying a sentence length penalty, using larger Simplified Chinese corpora, or dividing corpora by historical period can further improve the performance of our model.
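The abstract does not give the decoding procedure, so the following is only a minimal sketch of the general idea of language-model-based segmentation with a length penalty, not the authors' implementation. It assumes a smoothed character bigram model as a stand-in for the NNLM, a dynamic program over candidate sentence boundaries, and a hypothetical penalty of the form `lam * |length - target_len|`. All names (`BOUNDARY`, `train_bigram`, `sentence_log_prob`, `segment`, `max_len`, `target_len`, `lam`) are illustrative assumptions.

```python
import math
from collections import Counter

BOUNDARY = "<b>"  # hypothetical sentence-boundary token


def train_bigram(sentences):
    """Count character bigrams, with a boundary token around each sentence."""
    bigrams, context = Counter(), Counter()
    vocab = {BOUNDARY}
    for sent in sentences:
        tokens = [BOUNDARY, *sent, BOUNDARY]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[prev, cur] += 1
            context[prev] += 1
    return bigrams, context, len(vocab)


def sentence_log_prob(model, sent, alpha=0.1):
    """Add-alpha smoothed log-probability of one sentence (stand-in for an NNLM score)."""
    bigrams, context, vocab_size = model
    tokens = [BOUNDARY, *sent, BOUNDARY]
    return sum(
        math.log((bigrams[prev, cur] + alpha) / (context[prev] + alpha * vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )


def segment(text, model, max_len=20, target_len=4, lam=0.0):
    """Pick the segmentation maximizing total LM score minus an assumed length penalty."""
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best score for text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last sentence in text[:i]
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = (
                best[start]
                + sentence_log_prob(model, text[start:end])
                - lam * abs((end - start) - target_len)  # hypothetical penalty form
            )
            if score > best[end]:
                best[end], back[end] = score, start
    # Follow back-pointers from the end of the text to recover the sentences.
    sentences, end = [], n
    while end > 0:
        sentences.append(text[back[end]:end])
        end = back[end]
    return sentences[::-1]


model = train_bigram(["學而時習之", "不亦說乎"])
print(segment("學而時習之不亦說乎", model))
```

With this toy training set, the unpunctuated string is split back into the two training sentences. Swapping `sentence_log_prob` for a neural language model score would leave the search unchanged, which is why the scorer is kept separate from the dynamic program here.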