Annotating Chinese Collocations with Multi Information

Ruifeng Xu, Qin Lu, Kam-Fai Wong, Wenjie Li
DOI: https://doi.org/10.3115/1642059.1642070
2007-01-01
Abstract: This paper presents the design and construction of an annotated Chinese collocation bank as a resource to support systematic research on Chinese collocations. With the help of computational tools, the bi-gram and n-gram collocations corresponding to 3,643 head-words are manually identified. Annotations for bi-gram collocations include the dependency relation, the chunking relation, and the collocation type. Currently, the collocation bank contains 23,581 bi-gram collocations and 2,752 n-gram collocations extracted from a 5-million-word corpus. Statistical analysis of the collocation bank is used to examine some characteristics of Chinese bi-gram collocations that are essential to collocation research, especially for Chinese.