Structure Extension of TAN Through Greedy Search

Runhua Li, Guojing Zhong, Limin Wang
DOI: https://doi.org/10.1145/3232829.3232835
2018-01-01
Abstract: Naive Bayes (NB) is well known for its effectiveness and relatively high accuracy on classification tasks, but its strong assumption that attributes are independent of one another given the class limits its predictive accuracy. To weaken this assumption, researchers have proposed allowing a limited number of interdependences between attributes. One such approach is Tree Augmented Naive Bayes (TAN), which is regarded as the optimal 1-dependence classifier among Bayesian Network Classifiers (BNCs) because of its excellent performance. However, TAN cannot be directly extended to 2-dependence when more interdependences between attributes need to be represented. Even when the desired dependences have been found, adding them to the structure arbitrarily may introduce cycles if their directions are not set correctly. These factors considerably limit TAN's classification accuracy. We propose applying a greedy search algorithm to the conditional mutual information matrix generated by TAN to find all significant dependences between attributes, and then using a newly defined measure to set their directions. In this way, TAN can be extended to higher dependence; we call the resulting classifier kTAN, where k controls the number of dependences allowed for each attribute. Empirical studies show that kTAN has a significant advantage over TAN in classification accuracy, at an acceptable cost in complexity.
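The greedy step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' kTAN algorithm: the abstract does not define the measure used to set edge directions, so the sketch substitutes a KDB-style ordering by mutual information with the class, which is an assumption. The function names (`mutual_info`, `cond_mutual_info`, `select_parents`) are hypothetical, and all attributes are assumed to be discrete.

```python
from collections import Counter
from math import log


def mutual_info(xs, cs):
    """Empirical mutual information I(X; C) in nats for discrete sequences."""
    n = len(xs)
    n_xc = Counter(zip(xs, cs))
    n_x = Counter(xs)
    n_c = Counter(cs)
    return sum((cnt / n) * log(cnt * n / (n_x[x] * n_c[c]))
               for (x, c), cnt in n_xc.items())


def cond_mutual_info(xs, ys, cs):
    """Empirical conditional mutual information I(X; Y | C) in nats."""
    n = len(xs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    return sum((cnt / n) * log(cnt * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
               for (x, y, c), cnt in n_xyc.items())


def select_parents(columns, labels, k):
    """Greedily pick up to k attribute parents per attribute.

    columns: list of attribute columns (each a list of discrete values).
    labels:  class label per instance.
    Returns (order, parents), where parents maps attribute index -> parent indices.
    """
    m = len(columns)
    # ASSUMPTION: direction is fixed by a KDB-style ordering on I(X_i; C).
    # The paper's "newly defined measure" is not given in the abstract.
    order = sorted(range(m),
                   key=lambda i: mutual_info(columns[i], labels),
                   reverse=True)
    # Conditional mutual information matrix between attribute pairs.
    cmi = [[cond_mutual_info(columns[i], columns[j], labels) if i != j else 0.0
            for j in range(m)] for i in range(m)]
    parents = {}
    for pos, child in enumerate(order):
        # Only earlier attributes may be parents, so no cycle can form.
        candidates = order[:pos]
        ranked = sorted(candidates, key=lambda j: cmi[child][j], reverse=True)
        parents[child] = ranked[:k]
    return order, parents
```

Because every edge points from an earlier attribute in `order` to a later one, the resulting structure is acyclic by construction. Note that with `k = 1` this sketch does not reproduce TAN, which builds a maximum weighted spanning tree instead.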