Abstract: Bayesian networks (BNs) are a foundational model in machine learning and causal inference. Their graphical structure can handle high-dimensional problems by dividing them into a sparse collection of smaller ones; it also underlies Judea Pearl's causality and determines their explainability and interpretability. Despite their popularity, there are almost no resources in the literature on how to compute Shannon's entropy and the Kullback-Leibler (KL) divergence for BNs under their most common distributional assumptions. In this paper, we provide computationally efficient algorithms for both by leveraging BNs' graphical structure, and we illustrate them with a complete set of numerical examples. In the process, we show it is possible to reduce the computational complexity of KL from cubic to quadratic for Gaussian BNs.
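To make the role of the graphical structure concrete, here is a minimal Python/NumPy sketch for a linear Gaussian BN. It assumes that nodes are listed in topological order, so the coefficient matrix `B` is strictly lower triangular. Because det(I - B) = 1, the determinant of the joint covariance equals the product of the local residual variances, and the entropy is therefore a sum of per-node terms. For contrast, `mvn_kl` is the standard dense formula for the KL divergence between two multivariate Gaussians. Its matrix inverse and determinants cost O(n^3), so it serves only as a baseline and is not the paper's quadratic algorithm. All function names and the example networks are hypothetical.

```python
import numpy as np

def gbn_to_mvn(intercepts, B, sigma2):
    """Joint N(mu, Sigma) of a linear Gaussian BN (hypothetical helper).

    Assumes topological node order, so B is strictly lower triangular and
    X_i = intercepts[i] + sum_j B[i, j] * X_j + eps_i, eps_i ~ N(0, sigma2[i]).
    """
    n = len(intercepts)
    A = np.linalg.inv(np.eye(n) - B)          # (I - B)^{-1}, unit lower triangular
    mu = A @ intercepts
    Sigma = A @ np.diag(sigma2) @ A.T
    return mu, Sigma

def gbn_entropy_local(sigma2):
    # det(I - B) = 1, hence det(Sigma) = prod(sigma2): O(n) from local terms.
    return float(np.sum(0.5 * np.log(2 * np.pi * np.e * np.asarray(sigma2))))

def mvn_entropy(Sigma):
    # Dense baseline: log-determinant of the joint covariance, O(n^3).
    n = Sigma.shape[0]
    _, logdet = np.linalg.slogdet(Sigma)
    return float(0.5 * (n * np.log(2 * np.pi * np.e) + logdet))

def mvn_kl(mu0, S0, mu1, S1):
    # Standard dense formula for KL(N(mu0, S0) || N(mu1, S1)), O(n^3).
    n = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return float(0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff
                        - n + logdet1 - logdet0))

# Hypothetical chain X1 -> X2 -> X3 under two parameterizations P and Q.
intercepts_p = np.array([0.0, 1.0, -0.5])
B_p = np.array([[0.0, 0.0, 0.0],
                [0.8, 0.0, 0.0],
                [0.0, -1.5, 0.0]])
sigma2_p = np.array([1.0, 0.5, 2.0])
intercepts_q = np.array([0.2, 0.5, 0.0])
B_q = np.array([[0.0, 0.0, 0.0],
                [0.5, 0.0, 0.0],
                [0.0, -1.0, 0.0]])
sigma2_q = np.array([1.5, 0.5, 1.0])

mu_p, Sigma_p = gbn_to_mvn(intercepts_p, B_p, sigma2_p)
mu_q, Sigma_q = gbn_to_mvn(intercepts_q, B_q, sigma2_q)
h_local = gbn_entropy_local(sigma2_p)
h_dense = mvn_entropy(Sigma_p)   # agrees with h_local up to rounding
kl_pq = mvn_kl(mu_p, Sigma_p, mu_q, Sigma_q)
print(h_local, h_dense, kl_pq)
```

The local entropy never builds the joint covariance at all. Only the KL baseline needs the dense matrices.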

What problem does this paper attempt to address?

This paper focuses on how to compute Shannon's entropy and the Kullback-Leibler divergence for Bayesian networks (BNs), and it provides efficient algorithms for different types of BNs. The authors point out that, although BNs are widely used in machine learning and causal inference, there are few resources on how to compute these information-theoretic quantities under the various distributional assumptions that BNs commonly make.

The paper first introduces the basic concepts of BNs, including their graphical structure, their factorization into local distributions, and how that factorization is used to handle high-dimensional problems. It then discusses in detail the three most common distributional assumptions: discrete BNs, Gaussian BNs, and conditional linear Gaussian BNs. For each type, it explores the computational complexity involved and how to compute entropy and the Kullback-Leibler divergence efficiently. For discrete BNs, the paper demonstrates how to extract local distributions from the global distribution through marginalization and normalization, and it analyzes the complexity of computing the global distribution. For Gaussian BNs, it proposes a method that reduces the complexity of computing the Kullback-Leibler divergence from cubic to quadratic, which is particularly important for large networks. Finally, it discusses conditional linear Gaussian BNs, a hybrid model that combines discrete and continuous variables.

In addition, the paper discusses the advantages and disadvantages of exact computation versus Monte Carlo estimation from samples, and points out that exact computation is more efficient in certain cases. The authors also implement the algorithms and integrate them into the bnlearn R package to facilitate research in this field.
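The discrete case can be illustrated with a minimal Python/NumPy sketch. It assumes a hypothetical binary chain A -> B -> C with made-up conditional probability tables, and it assumes that two networks P and Q share the same DAG. Under these assumptions, the entropy decomposes into local conditional entropies weighted by the marginal distribution of each node's parents. Likewise, for a shared DAG, the KL divergence decomposes into expected local divergences. Obtaining the parent marginal P(B) already requires marginalizing A out. The brute-force functions enumerate the full joint distribution, whose size grows exponentially with the number of nodes. `entropy_monte_carlo` estimates the entropy from ancestral samples instead. All names are hypothetical, and this is not the bnlearn implementation.

```python
import itertools
import numpy as np

# Hypothetical binary chain A -> B -> C: (P(A), P(B | A), P(C | B)).
P = (np.array([0.3, 0.7]),
     np.array([[0.9, 0.1], [0.2, 0.8]]),      # row a: P(B | A = a)
     np.array([[0.6, 0.4], [0.05, 0.95]]))    # row b: P(C | B = b)
Q = (np.array([0.5, 0.5]),
     np.array([[0.7, 0.3], [0.4, 0.6]]),
     np.array([[0.5, 0.5], [0.2, 0.8]]))

def h(p):
    # Shannon entropy in nats; zero probabilities contribute nothing.
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def kl(p, q):
    # KL divergence between two strictly positive probability vectors.
    return float(np.sum(p * np.log(p / q)))

def parent_marginals(net):
    # P(A) is a root distribution; P(B) requires marginalizing A out.
    p_a, p_b_a, _ = net
    return p_a, p_a @ p_b_a

def entropy_local(net):
    # H(A, B, C) = H(A) + H(B | A) + H(C | B).
    p_a, p_b_a, p_c_b = net
    _, p_b = parent_marginals(net)
    return (h(p_a) + sum(p_a[a] * h(p_b_a[a]) for a in range(2))
            + sum(p_b[b] * h(p_c_b[b]) for b in range(2)))

def kl_local(net_p, net_q):
    # Same DAG: KL(P || Q) = sum over nodes of E_P[KL of local distributions].
    (pa, pba, pcb), (qa, qba, qcb) = net_p, net_q
    _, p_b = parent_marginals(net_p)
    return (kl(pa, qa) + sum(pa[a] * kl(pba[a], qba[a]) for a in range(2))
            + sum(p_b[b] * kl(pcb[b], qcb[b]) for b in range(2)))

def log_joint(net, a, b, c):
    # Works on scalars or on NumPy index arrays of equal length.
    p_a, p_b_a, p_c_b = net
    return np.log(p_a[a] * p_b_a[a, b] * p_c_b[b, c])

def configs():
    # All 2^3 joint configurations: exponential in the number of nodes.
    return itertools.product(range(2), repeat=3)

def entropy_bruteforce(net):
    return float(-sum(np.exp(log_joint(net, *x)) * log_joint(net, *x)
                      for x in configs()))

def kl_bruteforce(net_p, net_q):
    return float(sum(np.exp(log_joint(net_p, *x))
                     * (log_joint(net_p, *x) - log_joint(net_q, *x))
                     for x in configs()))

def entropy_monte_carlo(net, n_samples, seed=0):
    # Ancestral sampling from the network, then average -log P(x).
    p_a, p_b_a, p_c_b = net
    rng = np.random.default_rng(seed)
    a = (rng.random(n_samples) < p_a[1]).astype(int)
    b = (rng.random(n_samples) < p_b_a[a, 1]).astype(int)
    c = (rng.random(n_samples) < p_c_b[b, 1]).astype(int)
    return float(-np.mean(log_joint(net, a, b, c)))

print(entropy_local(P), entropy_bruteforce(P), entropy_monte_carlo(P, 100_000))
print(kl_local(P, Q), kl_bruteforce(P, Q))
```

For a network this small, the exact and brute-force values coincide, and the Monte Carlo estimate lands close to them. The local computation avoids both the exponential enumeration and the sampling error. In larger networks, however, computing the parent marginals itself requires inference.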
In summary, this paper fills a gap in the literature on Bayesian networks by providing efficient algorithms for computing key information-theoretic quantities under different distributional assumptions, which helps broaden the application of Bayesian networks in machine learning.

Entropy and the Kullback-Leibler Divergence for Bayesian Networks: Computational Complexity and Efficient Implementation

Learning Cluster Causal Diagrams: an Information-Theoretic Approach

Fast & Efficient Learning of Bayesian Networks from Data: Knowledge Discovery and Causality

Higher-Order Bayesian Networks, Exactly (Extended version)

A survey of Bayesian Network structure learning

Entropy of complex relevant components of Boolean networks

Learning Bayesian Network Parameters from Limited Data by Integrating Entropy and Monotonicity

Optimizing the Topology of Bayesian Network Classifiers by Applying Conditional Entropy to Mine Causal Relationships Between Attributes

Finding dissimilar explanations in Bayesian networks: Complexity results

Reliable and Efficient Inference of Bayesian Networks from Sparse Data by Statistical Learning Theory

Sparsifying Bayesian neural networks with latent binary variables and normalizing flows

Bayesian estimation of the Kullback-Leibler divergence for categorical systems using mixtures of Dirichlet priors

From Undirected Dependence to Directed Causality: A Novel Bayesian Learning Approach

Applications of Common Entropy for Causal Inference

Empirical estimation of entropy functionals with confidence

Advances in Bayesian network modelling: Integration of modelling technologies

Maximal Information Divergence from Statistical Models defined by Neural Networks

Testing Sparsity Assumptions in Bayesian Networks

Bayesian Neural Networks: Essentials

Bayesian networks and probabilistic reasoning about scientific evidence when there is a lack of data