Abstract: This paper introduces a novel parameter estimation method for the probability tables of Bayesian network classifiers (BNCs), using hierarchical Dirichlet processes (HDPs). The main result of this paper is to show that improved parameter estimation allows BNCs to outperform leading learning methods such as Random Forest for both 0-1 loss and RMSE, albeit just on categorical datasets. As data assets become larger, entering the hyped world of "big", efficient and accurate classification requires three main elements: (1) classifiers with low bias that can capture the fine detail of large datasets, (2) out-of-core learners that can learn from data without having to hold it all in main memory, and (3) models that can classify new data very efficiently. The latest BNCs satisfy these requirements. Their bias can be controlled easily by increasing the number of parents of the nodes in the graph. Their structure can be learned out of core with a limited number of passes over the data. However, as the bias is lowered to model classification tasks more accurately, the accuracy of their parameter estimates also falls, because each parameter is estimated from ever-decreasing quantities of data. In this paper, we introduce the use of HDPs for accurate BNC parameter estimation. We conduct an extensive set of experiments on 68 standard datasets and demonstrate that our resulting classifiers perform very competitively with Random Forest in terms of prediction, while keeping the out-of-core capability and superior classification time.
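To make the estimation problem concrete, the sketch below shows how an estimate of P(y | parents) can be shrunk toward the estimate obtained with one fewer parent, so that sparsely observed parent configurations borrow strength from coarser ones. This is not the HDP estimator proposed in the paper; it is a much simpler back-off m-estimate, used here only as an illustrative stand-in. The function name `smoothed_prob`, the smoothing weight `m`, the uniform base distribution, and the toy data are all assumptions made for this example.

```python
def smoothed_prob(rows, target, value, parents, context, values, m=1.0):
    """Estimate P(target=value | parents=context) with hierarchical back-off.

    Each level is shrunk toward the estimate that drops the last parent;
    the top level (no parents) is shrunk toward a uniform distribution.
    Illustrative only: a simple m-estimate, not an HDP.
    """
    if parents:
        # Prior comes from the coarser model with one fewer parent.
        prior = smoothed_prob(rows, target, value, parents[:-1], context, values, m)
    else:
        # Base case: uniform prior over the target values.
        prior = 1.0 / len(values)

    # Rows that agree with the context on every parent at this level.
    matching = [r for r in rows if all(r[p] == context[p] for p in parents)]
    count = sum(1 for r in matching if r[target] == value)

    # The fewer matching rows there are, the more the prior dominates.
    return (count + m * prior) / (len(matching) + m)


# Hypothetical toy dataset: two categorical attributes and a class label.
rows = [
    {"x1": 0, "x2": 0, "y": "a"},
    {"x1": 0, "x2": 0, "y": "a"},
    {"x1": 0, "x2": 1, "y": "b"},
    {"x1": 1, "x2": 0, "y": "b"},
    {"x1": 1, "x2": 1, "y": "a"},
]
classes = ["a", "b"]

seen = {"x1": 0, "x2": 0}
unseen = {"x1": 0, "x2": 9}  # x2 = 9 never occurs in the data

p_seen = smoothed_prob(rows, "y", "a", ["x1", "x2"], seen, classes)
p_unseen = smoothed_prob(rows, "y", "a", ["x1", "x2"], unseen, classes)
p_backoff = smoothed_prob(rows, "y", "a", ["x1"], unseen, classes)
print(p_seen, p_unseen, p_backoff)
```

For a parent configuration that never appears in the data, the estimate falls back entirely on the coarser level, which is the behaviour that plain maximum-likelihood counts lack as the number of parents grows.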

Efficient parameter learning for Bayesian Network classifiers following the Apache Spark Dataframes paradigm

Sparse Bayesian Approach to Fast Learning Network for Multiclassification

Efficient heuristics for learning scalable Bayesian network classifier from labeled and unlabeled data

Data-Intensive Learning Of Uncertain Knowledge

Scaling Bayesian Network Parameter Learning with Expectation Maximization using MapReduce

Information-Theoretic Scoring Rules to Learn Additive Bayesian Network Applied to Epidemiology

Data-Intensive Inferences Of Large-Scale Bayesian Networks

Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data

Model Averaging in Distributed Machine Learning: a Case Study with Apache Spark

Parallel naive Bayes algorithm for large-scale Chinese text classification based on Spark

Accurate parameter estimation for Bayesian Network Classifiers using Hierarchical Dirichlet Processes

Nonparametric Bayes Classification via Learning of Affine Subspaces

A Parallel Algorithm for Bayesian Network Parameter Learning Based on Factor Graph

Distributed Learning from Interactions in Social Networks

Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks

Distributed Bayesian Piecewise Sparse Linear Models

Advances in Bayesian network modelling: Integration of modelling technologies

Learning Graphical Models from a Distributed Stream

Creating simple predictive models in ecology, conservation and environmental policy based on Bayesian belief networks

Improving parameter learning of Bayesian nets from incomplete data

Bayesian Artificial Neural Networks for frontier efficiency analysis