Feature Selection for Text Classification with Naïve Bayes

Jingnian Chen, Houkuan Huang, Shengfeng Tian, Youli Qu
DOI: https://doi.org/10.1016/j.eswa.2008.06.054
IF: 8.5
2009-01-01
Expert Systems with Applications
Abstract: As an important preprocessing technique in text classification, feature selection can improve the scalability, efficiency, and accuracy of a text classifier. In general, a good feature selection method should take both domain and algorithm characteristics into account. Because the Naïve Bayesian classifier is simple and efficient yet highly sensitive to feature selection, research on feature selection designed specifically for it is significant. This paper presents two feature evaluation metrics for the Naïve Bayesian classifier applied to multi-class text datasets: Multi-class Odds Ratio (MOR) and Class Discriminating Measure (CDM). Text classification experiments with Naïve Bayesian classifiers were carried out on two multi-class text collections. The results indicate that CDM and MOR select features markedly more effectively than other feature selection approaches.