Finding The Optimal Feature Representations For Bayesian Network Learning

Limin Wang, Chunhong Cao, Xiongfei Li, Haijun Li
DOI: https://doi.org/10.1007/978-3-540-71701-0_96
2007-01-01
Abstract: Naive Bayes is often used in text classification applications and experiments because of its simplicity and effectiveness. However, many versions of the Bayes model consider only one aspect of a particular word. In this paper we define an information criterion, Projective Information Gain, to decide which representation is appropriate for a specific word. Based on this criterion, we extend the conditional independence assumption to make it more efficient and feasible, and we propose a novel Bayes model, General Naive Bayes (GNB), which can handle two representations concurrently. We present experimental results and a theoretical justification that demonstrate the feasibility of our approach.