Parsing-based Automatic Chinese Term Extraction

Meng Zhang, Xiaojun Lin, Xu Dai, Xihong Wu
DOI: https://doi.org/10.1109/nlpke.2011.6138179
2011-01-01
Abstract: Term extraction aims to automatically extract domain-specific terms from a given corpus. Previous work on term extraction focuses only on measuring termhood, rather than on identifying the nested candidates. Earlier methods identify nested candidates using either surface lexical information, such as word-form characteristics, or grammatical analysis expressed as part-of-speech (POS) sequence patterns. In contrast, this paper proposes a parsing-based approach that extracts noun phrases as nested candidates and can therefore fully exploit syntactic structure information. Experiments show that the proposed approach matches the conventional POS-sequence-pattern approach in candidate recall while producing fewer candidates that cannot be terms. Combined with C-value as the termhood measure, the proposed approach yields consistent improvements in the ranked list of terms.
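The abstract does not specify the parser output format, the candidate filter, or the exact C-value variant, so the following is only a minimal sketch under stated assumptions. It assumes Penn-Treebank-style bracketed parses whose outermost bracket carries a label (e.g. "(IP ...)"), treats every NP node spanning at least two words as a candidate (nested NPs included), and scores candidates with the standard C-value formula of Frantzi, Ananiadou and Mima: C-value(a) = log2|a| * f(a) if a is not nested in any longer candidate, otherwise log2|a| * (f(a) - (1/|T_a|) * sum of f(b) over b in T_a), where |a| is the length in words, f(a) the frequency, and T_a the set of longer candidates containing a. The function names parse_bracketed, np_candidates and c_value, and the example trees, are hypothetical.

```python
import math
from collections import Counter


def parse_bracketed(s):
    """Parse a bracketed tree like "(IP (NP (NN x)) ...)" into (label, children)
    tuples; leaves are plain strings. Assumes the outer bracket has a label."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested constituent
            else:
                children.append(tokens[pos])  # word leaf
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def leaves(tree):
    """Collect the words under a constituent, left to right."""
    _, children = tree
    out = []
    for c in children:
        out.extend([c] if isinstance(c, str) else leaves(c))
    return out


def np_candidates(tree, min_len=2):
    """Return the word tuple of every NP node with >= min_len words, including
    NPs nested inside other NPs (single words are skipped since log2(1) = 0)."""
    label, children = tree
    found = []
    if label == "NP":
        words = tuple(leaves(tree))
        if len(words) >= min_len:
            found.append(words)
    for c in children:
        if not isinstance(c, str):
            found.extend(np_candidates(c, min_len))
    return found


def is_nested_in(a, b):
    """True if a occurs as a contiguous, strictly shorter word sequence of b."""
    n = len(a)
    return n < len(b) and any(b[i:i + n] == a for i in range(len(b) - n + 1))


def c_value(freq):
    """Standard C-value over a Counter mapping word tuples to frequencies."""
    scores = {}
    for a, f_a in freq.items():
        containers = [b for b in freq if is_nested_in(a, b)]  # T_a
        if containers:
            nested = sum(freq[b] for b in containers) / len(containers)
            scores[a] = math.log2(len(a)) * (f_a - nested)
        else:
            scores[a] = math.log2(len(a)) * f_a
    return scores


# Hypothetical parsed sentences; the first contains an NP nested inside an NP.
trees = [
    "(IP (NP (NP (NN 机器) (NN 翻译)) (NN 系统)) (VP (VV 使用) (NP (NN 统计) (NN 模型))))",
    "(IP (NP (NN 机器) (NN 翻译)) (VP (VV 需要) (NP (NN 统计) (NN 模型))))",
]
freq = Counter()
for s in trees:
    freq.update(np_candidates(parse_bracketed(s)))
scores = c_value(freq)
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print("".join(term), round(score, 3))
```

In this sketch a candidate's frequency is the number of NP nodes spanning it, not the number of raw string occurrences in the corpus; that is a simplifying assumption, and the paper may count frequencies differently.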