An algorithm for selecting Chinese features based on TF-NIDF weight

Yongli Li, Yanheng Liu, Mo Shi, Liyan Dong, Zhen Li, Lixiang Liu, Pengfei Yan
DOI: https://doi.org/10.1109/ICINFA.2010.5512348
2010-01-01
Abstract: This article discusses the problem of selecting Chinese features based on TF-IDF weight in text categorization. TF-IDF weight is commonly used in text categorization for its simplicity. However, it cannot express the relationship between a feature's frequency of appearance in one class and its frequency of appearance in other classes. To solve this problem, we designed the TF-NIDF weighting method to express this relationship and compute feature weights. We also incorporated the weight into a Naïve Bayesian classifier and tested it on Chinese text data. Experiments showed that a Naïve Bayesian classifier with feature selection based on TF-NIDF weight has higher categorization precision than a Naïve Bayesian classifier with feature selection based on traditional TF-IDF weight. ©2010 IEEE.
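The limitation described in the abstract can be illustrated with a minimal sketch of the classic TF-IDF weight, tf(t, d) * log(N / df(t)). This is the traditional baseline only; the abstract does not give the TF-NIDF formula, so it is not reproduced here. The function name `tf_idf`, the toy corpus, and its class labels are hypothetical, and the tokens are assumed to be already segmented.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Classic TF-IDF. docs is a list of token lists; returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Hypothetical toy corpus: (class label, tokens).
corpus = [
    ("sports",  ["ball", "match"]),
    ("sports",  ["ball", "market"]),
    ("finance", ["stock", "market"]),
    ("finance", ["stock", "bank"]),
]

# The class labels never enter the computation: only the token lists are passed in.
docs = [tokens for _, tokens in corpus]
weights = tf_idf(docs)

# "ball" occurs only in sports documents, while "market" is split across both classes,
# yet both have document frequency 2 and receive the same weight in the second document.
print(weights[1]["ball"], weights[1]["market"])
```

Because the two terms receive identical weights despite their different distributions across classes, a feature selector driven by this weight alone cannot prefer the class-specific term, which is the gap the abstract says TF-NIDF is designed to close.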