Automatic Domain-Specific Term Extraction and Its Application in Text Classification

T Liu, XL Wang, G Yi, ZM Xu, Q Wang
DOI: https://doi.org/10.3321/j.issn:0372-2112.2007.02.031
2005-01-01
Abstract: A statistical method is proposed for extracting domain-specific terms from domain comparative corpora. It takes into account the distribution of a candidate word both among domains and within a domain, using entropy impurity to measure each distribution. A normalization step is added to the extraction process to cope with unbalanced corpora. As a result, the method characterizes the attributes of domain-specific terms more precisely and more effectively than previous term extraction approaches. The extracted domain-specific terms are then used as the feature space for text classification. Experiments show that this achieves better performance than traditional feature selection methods.
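The two entropy measurements named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the function names, the toy corpus, the use of base-2 Shannon entropy as the impurity measure, and normalization by per-domain token count are all assumptions made here for illustration. The reading assumed is that a good domain-specific term is concentrated in one domain (low among-domain entropy) while being spread across that domain's documents (high within-domain entropy).

```python
import math


def entropy_impurity(weights):
    """Shannon entropy (base 2) of non-negative weights, after rescaling them to sum to 1."""
    total = sum(weights)
    if total == 0:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    return sum(-p * math.log2(p) for p in probs)


def among_domain_entropy(word, domain_docs):
    """Entropy of a word's distribution across domains.

    domain_docs maps a domain name to a list of tokenized documents.
    Counts are divided by each domain's token total (an assumed
    normalization) so that a larger corpus does not dominate.
    """
    relative_freqs = []
    for docs in domain_docs.values():
        total_tokens = sum(len(doc) for doc in docs)
        occurrences = sum(doc.count(word) for doc in docs)
        relative_freqs.append(occurrences / total_tokens if total_tokens else 0.0)
    return entropy_impurity(relative_freqs)


def within_domain_entropy(word, docs):
    """Entropy of a word's counts over the documents of a single domain."""
    return entropy_impurity([doc.count(word) for doc in docs])


# Hypothetical toy corpus: two domains of unequal size.
domain_docs = {
    "finance": [
        ["stock", "market", "price"],
        ["stock", "bond", "price"],
        ["stock", "the"],
    ],
    "sports": [
        ["match", "goal", "the"],
        ["team", "the", "goal"],
    ],
}

stock_among = among_domain_entropy("stock", domain_docs)
the_among = among_domain_entropy("the", domain_docs)
stock_within = within_domain_entropy("stock", domain_docs["finance"])

print(f"among-domain entropy of 'stock': {stock_among:.3f}")
print(f"among-domain entropy of 'the':   {the_among:.3f}")
print(f"within-domain entropy of 'stock' in finance: {stock_within:.3f}")
```

In this toy corpus "stock" occurs only in the finance domain, so its among-domain entropy is zero, while "the" occurs in both domains and scores higher. How the paper combines the two measurements into a single term score is not stated in the abstract and is left open here.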