Method of Feature Selection for Text Categorization with Bayesian Classifiers

CHEN Jing-nian, HUANG Hou-kuan, TIAN Feng-zhan, QU You-li
DOI: https://doi.org/10.3778/j.issn.1002-8331.2008.13.007
2008-01-01
Computer Engineering and Applications Journal
Abstract: Feature selection is an important preprocessing technique in text classification. It can improve both the efficiency and the accuracy of a text classifier. The key to feature selection in text classification is finding an effective feature evaluation metric. In general, the effect of a feature evaluation metric can differ greatly from one classifier to another, so a good metric should take the characteristics of the classifier into account. Because the Naïve Bayesian classifier is simple, efficient, and highly sensitive to feature selection, research on feature selection designed specifically for it is important. This paper presents a feature evaluation metric for the Naïve Bayesian classifier applied to multi-class text datasets: the Class Discriminating Measure (CDM). Text classification experiments with Naïve Bayesian classifiers were carried out on two multi-class text collections. The results indicate that CDM selects features markedly better than other feature selection approaches.