Research On Semantic Disambiguation In Treebank

Lin Miao, Xueqiang Lv, Yunfang Wu, Yue Wang
DOI: https://doi.org/10.1007/978-3-319-25255-1_54
2015-01-01
Abstract: The increasingly widespread application of natural language processing technology has given parsing a significant role. As a result, the size and quality of treebanks have become a focus of related research. However, data sparseness arises when a treebank is used to train a parser. Using Cilin semantic information together with the contextual information of words, this paper proposes a context-based lexical semantic disambiguation method. After applying this method to CTB (Chinese Treebank) 5.0 and TCT (Tsinghua Chinese Treebank), parsing with the Berkeley Parser achieved relatively good results. On the Penn Chinese Treebank, precision and recall reached 85.35% and 84.34% respectively, and the F value reached 84.84%. Compared with the parsing results obtained from the original corpus, precision increased by 1.86 percentage points, recall by 1.02 percentage points, and the overall F value by 1.35 percentage points. As a consequence, the overall parsing error rate dropped by 8.17%.
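The reported figures can be cross-checked with a short calculation. The sketch below is not from the paper: it assumes the F value is the balanced harmonic mean of precision and recall, and that the "error rate" is 100 minus the F value, with the baseline F inferred from the reported gain of 1.35. The function names are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F value: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(baseline_score: float, new_score: float) -> float:
    """Relative drop in error rate, in percent.

    Assumption: the error rate is 100 minus the score.
    """
    baseline_error = 100.0 - baseline_score
    new_error = 100.0 - new_score
    return 100.0 * (baseline_error - new_error) / baseline_error


# Figures reported in the abstract, as percentages.
precision, recall = 85.35, 84.34
f_value = f_measure(precision, recall)

# Baseline F inferred from the reported gain (assumption, not stated directly).
reported_f, reported_gain = 84.84, 1.35
baseline_f = reported_f - reported_gain
reduction = relative_error_reduction(baseline_f, reported_f)

print(f"F value: {f_value:.2f}")
print(f"Relative error reduction: {reduction:.2f}%")
```

Under these assumptions the computed F value agrees with the reported 84.84%, and the relative error reduction comes out close to the reported 8.17%, with the small difference attributable to rounding of the published figures.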