Sentence Boundary Detection of Uyghur Based on Rules and Statistics

艾山·吾买尔,吐尔根·依步拉音
DOI: https://doi.org/10.3778/j.issn.1002-8331.2010.14.047
2010-01-01
Computer Engineering and Applications Journal
Abstract: Sentence boundary detection is an important initial task for many natural language processing applications, such as part-of-speech tagging and parsing. This paper proposes an automatic sentence boundary detection method for Uyghur based on rules and statistics. First, a paragraph detection algorithm classifies paragraphs as ambiguous or unambiguous. Second, a rule-based sentence boundary detector processes the unambiguous paragraphs. Finally, a maximum entropy (ME) based sentence boundary detection model identifies the sentences in the ambiguous paragraphs. The method improves robustness by making extensive use of rules, which reduces the ME model's failures on unambiguous paragraphs (failures attributable to the sparsity of the training data), and by using the ME model to resolve ambiguity. The recall of the method reaches 98.77%.
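The three-step pipeline in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual rules, ambiguity criteria, or ME features, so every name and heuristic below is an assumption made for illustration: the terminator set in `TERMINATORS`, the "period not followed by whitespace" test in `is_ambiguous`, and `placeholder_classifier`, which is a trivial stand-in for a trained maximum entropy model and not the authors' model.

```python
import re
from typing import Callable, List

# Assumed set of sentence-final marks (includes the Arabic question mark).
TERMINATORS = ".!?؟"

# Assumption: a paragraph is "ambiguous" if some period is immediately
# followed by a non-space character (e.g. decimals, abbreviations).
_AMBIGUOUS = re.compile(r"\.\S")


def is_ambiguous(paragraph: str) -> bool:
    """Step 1: classify a paragraph as ambiguous or unambiguous."""
    return _AMBIGUOUS.search(paragraph) is not None


def split_by_rules(paragraph: str) -> List[str]:
    """Step 2: rule-based split after any terminator followed by whitespace."""
    parts = re.split(r"(?<=[.!?؟])\s+", paragraph.strip())
    return [p for p in parts if p]


def placeholder_classifier(text: str, i: int) -> bool:
    """Hypothetical stand-in for the ME model: decides whether the
    terminator at index i ends a sentence. A real system would score
    contextual features with a trained classifier instead."""
    return i + 1 == len(text) or text[i + 1].isspace()


def split_with_model(paragraph: str,
                     is_boundary: Callable[[str, int], bool]) -> List[str]:
    """Step 3: ask the classifier about every candidate terminator."""
    sentences, start = [], 0
    for i, ch in enumerate(paragraph):
        if ch in TERMINATORS and is_boundary(paragraph, i):
            sentences.append(paragraph[start:i + 1].strip())
            start = i + 1
    tail = paragraph[start:].strip()
    if tail:
        sentences.append(tail)
    return [s for s in sentences if s]


def detect_sentences(paragraph: str,
                     is_boundary: Callable[[str, int], bool] = placeholder_classifier
                     ) -> List[str]:
    """Route each paragraph to the rule-based or model-based splitter."""
    if is_ambiguous(paragraph):
        return split_with_model(paragraph, is_boundary)
    return split_by_rules(paragraph)
```

Routing unambiguous paragraphs to the rules means the statistical model is consulted only where a terminator's role is genuinely uncertain, which is the division of labor the abstract credits for the improved robustness. Any trained classifier with the same `(text, index) -> bool` shape could be passed as `is_boundary`.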