Automatic Extraction of Multiword Expressions Combining Statistical and Similarity Approaches

Jian Xu, Jingsong Yu, Huilin Wang
DOI: https://doi.org/10.1109/icgec.2010.70
2010-12-01
Abstract: Multiword expressions (MWEs) are important for practical applications such as machine translation (henceforth, MT), multilingual information retrieval, data mining, and other natural language processing tasks. A method that combines a similarity measure with a statistical tool is proposed for automatically extracting English MWEs from a corpus of Chinese government white papers and work reports from 1991 to 2010. A statistical approach is employed to calculate the co-occurrence affinity between two words. In addition, a similarity measure is used to compute semantic relations between words to improve MWE coverage, with the aim of achieving higher precision and recall in extracting candidate MWEs. Experimental results showed that the proposed technique improved MWE extraction effectively.
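The abstract does not name the specific affinity statistic or similarity measure, so the following is only a minimal sketch of the two ingredients under stated assumptions. Pointwise mutual information (PMI) over adjacent word pairs stands in for the "co-occurrence affinity" statistic, and cosine similarity between windowed context-count vectors stands in for the "similarity measure". The function names (`bigram_pmi`, `context_vector`, `cosine`), the `min_count` and `window` parameters, and the toy token list are all hypothetical illustrations, not the authors' implementation or data.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by PMI: log2(P(w1 w2) / (P(w1) * P(w2))).

    Assumption: PMI is a stand-in for the paper's unspecified affinity statistic.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_pair = count / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores


def context_vector(tokens, target, window=2):
    """Count the words within `window` positions of each occurrence of `target`."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vec[tokens[j]] += 1
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical toy input, not drawn from the paper's corpus.
tokens = (
    "the state council issued a white paper ; "
    "the state council released a work report ; "
    "the white paper said"
).split()
scores = bigram_pmi(tokens)
sim = cosine(context_vector(tokens, "issued"), context_vector(tokens, "released"))
```

On this toy input, pairs such as ("state", "council") and ("white", "paper") receive the highest PMI, while "issued" and "released" share most of their context words and so get a high cosine score. One plausible way to combine the two, again an assumption rather than the paper's stated procedure, is to use such similarity scores to propose variants of high-affinity pairs that are too rare to pass the frequency threshold on their own, which is the kind of coverage gain the abstract describes.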