Feature Selection for Text Classification Based on Part of Speech Filter and Synonym Merge

Sijun Qin, Jia Song, Pengzhou Zhang, Yue Tan
DOI: https://doi.org/10.1109/fskd.2015.7382024
2015-01-01
Abstract: In recent years, machine-learning-based text categorization has become a widely used technology in natural language processing and text mining, and it has seen many advances. Feature selection is one of the key problems in text categorization, and its chief obstacles are noise and sparseness. In this paper, we propose an approach to Chinese text feature selection based on CV (contribution value), a POS (part of speech) filter, and synonym merging. Experiments on the TanCorpV1.0 corpus show that the proposed method performs better than traditional methods.
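The abstract names the three stages of the pipeline but gives no formulas, so the following is only a minimal Python sketch of how such a pipeline could fit together, assuming the Chinese documents are already segmented and POS-tagged upstream. The tag set `KEEP_POS`, the `SYNONYMS` dictionary and all function names are hypothetical. Because the abstract does not define CV, the scoring step swaps in the standard chi-square term/class statistic (maximum over classes) as a stand-in; it is not the authors' CV measure.

```python
from collections import Counter, defaultdict

# Hypothetical POS tags to keep (nouns, verbs, adjectives); the paper's actual
# filter set is not stated in the abstract.
KEEP_POS = frozenset({"n", "v", "a"})

# Hypothetical synonym dictionary mapping each word to a canonical form.
SYNONYMS = {"电脑": "计算机", "球赛": "比赛"}  # "computer", "match"


def pos_filter(tagged_doc, keep=KEEP_POS):
    """Keep only words whose POS tag is in `keep` (drops particles etc.)."""
    return [word for word, tag in tagged_doc if tag in keep]


def merge_synonyms(words, synonyms=SYNONYMS):
    """Map each word to its canonical synonym so their counts are pooled."""
    return [synonyms.get(w, w) for w in words]


def chi_square_scores(docs, labels):
    """Stand-in for CV (NOT the paper's measure): for each term, the maximum
    over classes of the chi-square term/class statistic."""
    n = len(docs)
    class_size = Counter(labels)
    df = Counter()                    # term -> number of docs containing it
    df_class = defaultdict(Counter)   # term -> class -> docs containing it
    for words, label in zip(docs, labels):
        for term in set(words):
            df[term] += 1
            df_class[term][label] += 1
    scores = {}
    for term in df:
        best = 0.0
        for c in class_size:
            a = df_class[term][c]        # in class c, contains term
            b = df[term] - a             # outside c, contains term
            m = class_size[c] - a        # in class c, lacks term
            d = n - a - b - m            # outside c, lacks term
            denom = (a + m) * (b + d) * (a + b) * (m + d)
            if denom:
                best = max(best, n * (a * d - b * m) ** 2 / denom)
        scores[term] = best
    return scores


def select_features(tagged_docs, labels, k):
    """POS filter -> synonym merge -> score -> top-k terms."""
    docs = [merge_synonyms(pos_filter(doc)) for doc in tagged_docs]
    scores = chi_square_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Toy pre-segmented, pre-tagged documents ("u" = particle in this hypothetical
# tag set, so it is dropped by the POS filter).
tagged_docs = [
    [("电脑", "n"), ("的", "u"), ("软件", "n"), ("新闻", "n")],
    [("计算机", "n"), ("软件", "n"), ("了", "u")],
    [("球赛", "n"), ("的", "u"), ("球队", "n"), ("新闻", "n")],
    [("比赛", "n"), ("球队", "n")],
]
labels = ["IT", "IT", "sports", "sports"]

selected, scores = select_features(tagged_docs, labels, k=4)
print(selected, scores["新闻"])
```

In this toy run, 电脑 is folded into 计算机 before scoring, the particles 的 and 了 never reach the scorer, and 新闻, which appears once in each class, scores 0.0 and is left out of the top four. Pooling synonyms this way is one plausible reading of how the merge step reduces sparseness: two rare surface forms become one more frequent feature.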