Efficient heuristics for learning scalable Bayesian network classifier from labeled and unlabeled data

Limin Wang, Junjie Wang, Lu Guo, Qilong Li
DOI: https://doi.org/10.1007/s10489-023-05242-8
IF: 5.3
2024-01-27
Applied Intelligence
Abstract: Naive Bayes (NB) is one of the top ten machine learning algorithms, but its attribute independence assumption rarely holds in practice. A feasible and efficient way to improve NB is to relax this assumption by adding augmented edges to NB's restricted topology. In this paper we prove theoretically that the generalized topology may be a suboptimal model of multivariate probability distributions if its fitness to data cannot be measured. We therefore propose to use the log-likelihood function as the scoring function and introduce an efficient heuristic search strategy to explore high-dependence relationships, so that in each iteration the learned topology is improved to fit the data better. The proposed algorithm, called log-likelihood Bayesian classifier (LLBC), learns two submodels, one from the labeled training set and one from each individual unlabeled testing instance, and then makes them work jointly for classification in an ensemble learning framework. Our extensive experimental evaluations on 36 benchmark datasets from the University of California at Irvine (UCI) machine learning repository reveal that LLBC demonstrates excellent classification performance and provides a competitive approach to learning from labeled and unlabeled data.
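The idea of scoring a topology by log-likelihood can be illustrated with a minimal sketch. This is not the authors' LLBC algorithm: the function name `log_likelihood`, the `parents` encoding, the Laplace smoothing, and the toy data are all assumptions made for illustration. It only shows how the log-likelihood of labeled discrete data changes when one augmented edge is added to the naive Bayes topology.

```python
import math
from collections import Counter

def log_likelihood(rows, labels, parents, alpha=1.0):
    """Log-likelihood of labeled discrete data under a Bayesian network classifier.

    parents[i] is a tuple of attribute indices that attribute i depends on,
    in addition to the class. Empty tuples everywhere give naive Bayes.
    Probabilities are estimated with Laplace smoothing (pseudo-count alpha),
    which is an assumption of this sketch.
    """
    n = len(rows)
    n_attrs = len(rows[0])
    classes = sorted(set(labels))
    values = [sorted({r[i] for r in rows}) for i in range(n_attrs)]

    class_counts = Counter(labels)
    joint = [Counter() for _ in range(n_attrs)]    # (value, class, parent values)
    context = [Counter() for _ in range(n_attrs)]  # (class, parent values)
    for r, c in zip(rows, labels):
        for i in range(n_attrs):
            pa = tuple(r[j] for j in parents[i])
            joint[i][(r[i], c, pa)] += 1
            context[i][(c, pa)] += 1

    total = 0.0
    for r, c in zip(rows, labels):
        # Class prior term.
        total += math.log((class_counts[c] + alpha) / (n + alpha * len(classes)))
        # One conditional term per attribute, given the class and its parents.
        for i in range(n_attrs):
            pa = tuple(r[j] for j in parents[i])
            num = joint[i][(r[i], c, pa)] + alpha
            den = context[i][(c, pa)] + alpha * len(values[i])
            total += math.log(num / den)
    return total

# Toy data (hypothetical): attribute 1 is an exact copy of attribute 0,
# so the two attributes are strongly dependent given the class.
rows = [(0, 0), (0, 0), (0, 0), (1, 1), (0, 0), (1, 1), (1, 1), (1, 1)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

ll_nb = log_likelihood(rows, labels, parents=[(), ()])       # naive Bayes
ll_aug = log_likelihood(rows, labels, parents=[(), (0,)])    # edge X0 -> X1
print(ll_nb, ll_aug)
```

On this toy data the augmented topology scores higher than naive Bayes, because the added edge captures the dependence between the two attributes. A heuristic search in this spirit would compare such scores across candidate edges and keep the ones that improve the fit.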
computer science, artificial intelligence