A New Robust and Domain-Oriented Algorithm of Text Parsing

TAO Xianjun, WU Xiaojun, WANG Xiaodong, ZHENG Fang
DOI: https://doi.org/10.3969/j.issn.1003-0077.2010.04.006
2010-01-01
Abstract: In natural language processing applications, especially those that handle spoken or web text, the input often contains spelling errors and/or ill-formed sentence structures. This paper describes a robust parsing algorithm based on chart parsing. It identifies errors in strings that domain-vocabulary-based word segmentation fails to recognize, and corrects them according to terminal information extracted from the current active arcs and the rule set. Experimental results show that, with error detection and correction by homonymous matching of pinyin syllables, the algorithm improves the acceptance rate by 14.78% compared with Yan's robust parsing method, at the cost of a 9.363% increase in the average number of loops.
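The correction step the abstract describes can be illustrated with a minimal sketch of homonymous pinyin matching: an unrecognized string is compared, syllable by syllable, against the terminal words the parser currently expects, and any expected word with the same pinyin reading is proposed as the correction. This is not the authors' implementation. The toy pinyin table `PINYIN`, the helper names `readings` and `correct`, and the plain list standing in for "terminals extracted from the active arcs" are all hypothetical assumptions; tones are ignored, and polyphonic characters (such as 行, read `xing` or `hang`) are allowed several readings.

```python
from itertools import product

# Hypothetical toy pinyin table (toneless syllables). A real system would use
# a full lexicon; polyphonic characters map to several readings.
PINYIN = {
    "北": {"bei"}, "京": {"jing"}, "背": {"bei"}, "景": {"jing"},
    "上": {"shang"}, "海": {"hai"}, "伤": {"shang"}, "害": {"hai"},
    "行": {"xing", "hang"}, "航": {"hang"}, "班": {"ban"},
}


def readings(text):
    """Return every possible syllable sequence for text, or an empty set
    if any character is missing from the table."""
    options = []
    for ch in text:
        if ch not in PINYIN:
            return set()
        options.append(sorted(PINYIN[ch]))
    # Cartesian product over per-character readings handles polyphony.
    return {tuple(seq) for seq in product(*options)}


def correct(unrecognized, expected_terminals):
    """Return the expected terminals that are homonymous with the
    unrecognized string (share at least one full syllable sequence).

    expected_terminals is a hypothetical stand-in for the terminal words
    that the current active arcs and the rule set allow at this position."""
    source = readings(unrecognized)
    return [
        word
        for word in expected_terminals
        # Cheap length filter before comparing reading sets.
        if len(word) == len(unrecognized) and source & readings(word)
    ]
```

For example, with expected terminals `["北京", "上海", "航班"]`, `correct("背景", ...)` proposes `["北京"]` and `correct("行班", ...)` proposes `["航班"]` through the `hang` reading of 行, while a string with no homonymous expected terminal yields an empty list and would stay unrecognized.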