Research on Automatic Extraction of Chinese New Domain-specific Terms Comprising Lettered-words

JIANG Shaohua, DANG Yanzhong
DOI: https://doi.org/10.3969/j.issn.1000-3428.2007.02.016
2007-01-01
Abstract: Extraction of new domain-specific terms is an important topic in Chinese natural language processing. To address the limitations of current methods and the fact that many domain-specific terms are lettered-words, a novel approach combining statistical techniques and rules is proposed to extract new special semantic strings. Co-occurring character strings are formed by text segmentation based on longest-match-first matching combined with frequency statistics. Meaningless character strings are trimmed by collocation rules. After filtering by a domain lexicon and membership degree, new domain-specific terms are extracted. The method can extract new special semantic strings, phrases, and words, including unknown words such as lettered-words and domain-specific terms, whose frequency is larger than 2. Experiments show that the extraction technique is effective and indicate how the precision ratio of new domain-specific terms is distributed.
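The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`segment`, `candidate_strings`, `filter_candidates`), the `max_len` and `max_tokens` limits, the rule that keeps runs of Latin letters and digits together as one token, and the single collocation rule (drop strings that begin or end with a stop character) are all assumptions made for illustration. Only the overall order of steps and the frequency threshold (larger than 2) come from the abstract.

```python
import re
from collections import Counter

# Assumption: a lettered-word fragment is a run of ASCII letters or digits.
LATIN_RUN = re.compile(r"[A-Za-z0-9]+")

def segment(text, lexicon, max_len=4):
    """Forward longest-match-first segmentation against a lexicon.

    Latin/digit runs are kept whole; unmatched characters become
    single-character tokens.
    """
    tokens, i = [], 0
    while i < len(text):
        m = LATIN_RUN.match(text, i)
        if m:
            tokens.append(m.group())
            i = m.end()
            continue
        # Try the longest window first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens

def candidate_strings(sentences, lexicon, max_tokens=3, min_freq=3):
    """Count strings formed by adjacent tokens; keep those with frequency > 2."""
    counts = Counter()
    for sentence in sentences:
        tokens = segment(sentence, lexicon)
        for n in range(2, max_tokens + 1):
            for j in range(len(tokens) - n + 1):
                counts["".join(tokens[j:j + n])] += 1
    return {s: c for s, c in counts.items() if c >= min_freq}

def filter_candidates(candidates, lexicon, stop_chars):
    """Apply a hypothetical collocation rule, then the lexicon filter."""
    kept = {}
    for s, c in candidates.items():
        # Hypothetical rule: a term may not start or end with a stop character.
        if s[0] in stop_chars or s[-1] in stop_chars:
            continue
        # Strings already in the lexicon are not new terms.
        if s in lexicon:
            continue
        kept[s] = c
    return kept
```

For example, calling `candidate_strings` on a few sentences that each contain "X射线", with a lexicon that knows "射线" but not "X射线", yields "X射线" as a candidate once it has appeared three times. The membership-degree filter is left out of the sketch because the abstract does not define how it is computed.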