Extracting Chinese abbreviation-definition pairs from anchor texts.

Li-Xing Xie, Ya-Bin Zheng, Zhi-Yuan Liu, Mao-Song Sun, Can-Hui Wang
DOI: https://doi.org/10.1109/ICMLC.2011.6016980
2011-01-01
Abstract: This paper proposes an automatic scheme for extracting Chinese abbreviations and their corresponding definitions from large-scale anchor texts. The method is motivated by the observation that the more frequently two anchor texts point to the same web page, the more closely related they are. Because an abbreviation and its definition are highly related, such pairs can be extracted from these related anchor texts. The method involves three steps. First, we use external statistical features to extract candidate abbreviation-definition pairs from anchor texts. Second, we extract internal features from the candidate pairs and adopt Conditional Random Fields (CRFs) to compute a score for each candidate pair. Finally, we combine the external and internal features to generate the final pairs. Experimental results show that this method can accurately extract Chinese abbreviation-definition pairs from anchor texts, and that combining external and internal features is effective for this task. © 2011 IEEE.
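The three-step pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The external feature is assumed to be a raw count of pages that two anchor texts both link to. The CRF scorer is replaced by a simple character-subsequence check, swapped in for brevity because Chinese abbreviations are typically formed by keeping characters of the full form in their original order. The final combination is an assumed linear weighting. All function names, the `min_count` and `weight` parameters, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_link_counts(anchors):
    """Hypothetical external feature: for each pair of distinct anchor texts,
    count how many pages both of them link to."""
    texts_by_page = defaultdict(set)
    for text, url in anchors:
        texts_by_page[url].add(text)
    counts = Counter()
    for texts in texts_by_page.values():
        # Sort so each unordered pair always gets the same key.
        for a, b in combinations(sorted(texts), 2):
            counts[(a, b)] += 1
    return counts


def candidate_pairs(counts, min_count=2):
    """Step 1 (assumed form): keep pairs that co-link at least min_count
    times, oriented as (shorter text, longer text)."""
    pairs = {}
    for (a, b), c in counts.items():
        if c >= min_count and len(a) != len(b):
            abbr, definition = (a, b) if len(a) < len(b) else (b, a)
            pairs[(abbr, definition)] = c
    return pairs


def is_char_subsequence(abbr, definition):
    """Step 2 stand-in (NOT the paper's CRF model): True if every character
    of abbr appears in definition in the same order."""
    remaining = iter(definition)  # `in` consumes the iterator as it searches
    return all(ch in remaining for ch in abbr)


def score_pairs(pairs, weight=0.5):
    """Step 3 (assumed form): linear combination of the max-normalized
    external count and the binary internal score, highest score first."""
    max_count = max(pairs.values(), default=1)
    scored = []
    for (abbr, definition), c in pairs.items():
        external = c / max_count
        internal = 1.0 if is_char_subsequence(abbr, definition) else 0.0
        scored.append((abbr, definition, weight * external + (1 - weight) * internal))
    return sorted(scored, key=lambda row: -row[2])


if __name__ == "__main__":
    # Toy (anchor text, target URL) records; the URLs are made up.
    anchors = [
        ("北大", "pku.edu.cn"), ("北京大学", "pku.edu.cn"), ("燕园", "pku.edu.cn"),
        ("首页", "pku.edu.cn"),
        ("北大", "pku.edu.cn/en"), ("北京大学", "pku.edu.cn/en"), ("燕园", "pku.edu.cn/en"),
        ("清华", "tsinghua.edu.cn"), ("清华大学", "tsinghua.edu.cn"),
        ("清华", "tsinghua.edu.cn/news"), ("清华大学", "tsinghua.edu.cn/news"),
    ]
    for row in score_pairs(candidate_pairs(co_link_counts(anchors))):
        print(row)
```

On this toy data, 燕园 (a nickname for the Peking University campus) co-links with 北京大学 as often as 北大 does, so the external feature alone cannot separate them. The character check demotes it to a score of 0.5, while 北大/北京大学 and 清华/清华大学 score 1.0. This illustrates, in a simplified way, why the abstract reports that combining external and internal features helps.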