Classification-based Chinese Collocation Extraction

Ruifeng Xu, Qin Lu, Kam-Fai Wong, Wenjie Li
DOI: https://doi.org/10.1109/nlpke.2007.4368048
2007-01-01
Abstract: Most collocation extraction algorithms use a single set of criteria and a single threshold, which is not appropriate because different types of collocations behave differently. This paper presents a window-based Chinese collocation extraction system that identifies different types of collocations separately. Taking into consideration the compositional, non-substitutable, and non-modifiable properties of collocations, as well as their statistical significance, the paper classifies Chinese collocations into four types. A multi-stage extraction system is then designed to identify each type separately by using a different combination of features. Furthermore, heuristic rules based on dependency knowledge are applied to filter out some pseudo collocations. Experiments show that the proposed system achieves better F1 performance than most existing algorithms for Chinese collocation extraction.
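The idea of window-based extraction with type-specific thresholds can be illustrated with a minimal Python sketch. It is not the authors' system: the abstract does not specify the features, the four type definitions, or the thresholds, so the scoring measure (pointwise mutual information), the caller-supplied `type_of` function, the `thresholds` mapping, and all function names here are assumptions chosen only for illustration. Input is assumed to be already word-segmented sentences.

```python
import math
from collections import Counter


def window_cooccurrences(sentences, window=5):
    """Count unordered word pairs that co-occur within `window` tokens."""
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        for i, head in enumerate(tokens):
            # Look only to the right so each pair of positions is counted once.
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                pair = tuple(sorted((head, tokens[j])))
                pair_counts[pair] += 1
    return word_counts, pair_counts


def pmi(pair, word_counts, pair_counts):
    """Pointwise mutual information, used here as an assumed significance score."""
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    p_xy = pair_counts[pair] / total_pairs
    p_x = word_counts[pair[0]] / total_words
    p_y = word_counts[pair[1]] / total_words
    return math.log2(p_xy / (p_x * p_y))


def extract_by_type(sentences, type_of, thresholds, window=5):
    """Keep candidate pairs whose score passes the threshold of their own type.

    `type_of` (hypothetical) maps a pair to a type label; `thresholds`
    (hypothetical) maps each type label to its own minimum score.
    """
    word_counts, pair_counts = window_cooccurrences(sentences, window)
    results = {}
    for pair in pair_counts:
        ctype = type_of(pair)
        score = pmi(pair, word_counts, pair_counts)
        if score >= thresholds[ctype]:
            results.setdefault(ctype, []).append((pair, score))
    return results
```

The point of the sketch is the last function: each candidate is compared against the threshold for its own type rather than one global cut-off, which is the motivation the abstract gives for treating collocation types separately.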