A local scoring-search algorithm for restricted Bayesian network structure learning
Wang Zhong-Feng, Wang Zhi-Hai, Fu Bin
DOI: https://doi.org/10.3321/j.issn:0469-5097.2009.05.012
2009-01-01
Abstract: A Bayesian network is an efficient tool for handling classification problems with probabilistic methods. However, automatically identifying high-scoring Bayesian network structures from data is a non-deterministic polynomial-time hard (NP-hard) problem. Fortunately, a restricted Bayesian network serves as a bridge between Bayesian theory and practical applications. Recent work in supervised learning has shown that a surprisingly restrictive Bayesian network classifier, one that makes the strong assumption that every conditional variable has the same number of parents, such as the tree augmented naive Bayes (TAN) model, is competitive with state-of-the-art classifiers. This fact raises the question of whether a classifier with less restrictive assumptions could perform even better. This paper attempts to allow a different number of parents for different nodes in the Bayesian network structure. The strength of the dependence between two nodes decides whether there is a link between them. Conditional mutual information testing is adopted as the measure of the dependence relationship, because it can not only measure the strength of dependence between two nodes but also score the performance of a Bayesian network structure. Crucially, this measure is additively decomposable. Using this property, each node can be restricted individually, and the learned Bayesian network structure will better fit the distribution of a dataset. Based on this analysis, a local scoring-search algorithm is proposed, which is still based on conditional mutual information theory, to build this kind of restricted network. Experimental results show that it is more efficient than most common scoring methods, such as the BDeu scoring algorithm, the minimum description length (MDL) scoring algorithm, the conditional mutual information (CMI) scoring algorithm, and the TAN algorithm, on 20 datasets obtained from the University of California, Irvine (UCI) repository of machine learning databases.
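
To make the dependence measure concrete, the following is a minimal sketch of an empirical conditional mutual information estimate I(X; Y | C) for discrete data, together with a simple per-node parent selection. It is not the algorithm proposed in the paper, whose details the abstract does not give. The function names, the fixed attribute ordering used to keep the structure acyclic, and the `threshold` parameter are all assumptions made for illustration.

```python
from collections import Counter
from math import log


def conditional_mutual_information(xs, ys, cs):
    """Empirical I(X; Y | C) in nats for equal-length sequences of discrete values."""
    n = len(xs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), k in n_xyc.items():
        # p(x,y,c) * log( p(x,y|c) / (p(x|c) * p(y|c)) ), written with raw counts
        total += (k / n) * log(k * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return total


def select_parents(columns, cs, threshold):
    """Hypothetical local selection: attribute i may take any earlier attribute j
    as a parent when I(X_i; X_j | C) exceeds `threshold` (an assumed parameter).
    Restricting parents to earlier attributes keeps the graph acyclic, and each
    node can end up with a different number of parents."""
    parents = {}
    for i, col_i in enumerate(columns):
        parents[i] = [
            j for j in range(i)
            if conditional_mutual_information(col_i, columns[j], cs) > threshold
        ]
    return parents


if __name__ == "__main__":
    c = [0, 0, 0, 0, 1, 1, 1, 1]      # class variable
    a = [0, 0, 1, 1, 0, 0, 1, 1]      # attribute A
    b = list(a)                        # attribute B, a copy of A
    d = [0, 1, 0, 1, 0, 1, 0, 1]      # attribute D, independent of A given C
    print(conditional_mutual_information(a, b, c))
    print(select_parents([a, b, d], c, threshold=0.1))
```

Because each term depends only on one node and its candidate parents, the per-node decisions in this sketch can be made independently, which is the role the abstract assigns to the additive decomposition property.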