Entropy and the Kullback-Leibler Divergence for Bayesian Networks: Computational Complexity and Efficient Implementation

Marco Scutari
2024-01-15
Abstract: Bayesian networks (BNs) are a foundational model in machine learning and causal inference. Their graphical structure can handle high-dimensional problems by dividing them into a sparse collection of smaller ones; it also underlies Judea Pearl's causality and determines their explainability and interpretability. Despite their popularity, there are almost no resources in the literature on how to compute Shannon's entropy and the Kullback-Leibler (KL) divergence for BNs under their most common distributional assumptions. In this paper, we provide computationally efficient algorithms for both by leveraging BNs' graphical structure, and we illustrate them with a complete set of numerical examples. In the process, we show it is possible to reduce the computational complexity of KL from cubic to quadratic for Gaussian BNs.
Artificial Intelligence, Machine Learning, Computation
What problem does this paper attempt to address?
This paper addresses how to compute Shannon's entropy and the Kullback-Leibler (KL) divergence for Bayesian networks (BNs), and provides efficient algorithms for the most common types of BN. The authors point out that, although BNs are widely used in machine learning and causal inference, the literature offers few resources on computing these information-theoretic quantities under the usual distributional assumptions.
The paper first introduces the basic concepts of BNs, including their graphical structure, their factorization, and how they are used to deal with high-dimensional problems. It then discusses three common distributional assumptions in detail: discrete BNs, Gaussian BNs, and conditional linear Gaussian BNs, which are hybrid models combining discrete and continuous variables. For each type, it examines the computational complexity of entropy and KL divergence and how to compute them efficiently. For discrete BNs, it shows how to extract local distributions from the global distribution through marginalization and normalization, and it analyzes the complexity of computing the global distribution. For Gaussian BNs, it proposes a method that reduces the computational complexity of KL from cubic to quadratic, which is particularly important for large networks.
In addition, the paper weighs the advantages and disadvantages of exact computation against Monte Carlo estimation, and points out that exact computation is more efficient in certain cases. The authors also implement the algorithms and integrate them into the bnlearn R package to facilitate research in this field.
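To make the role of the graphical structure concrete, here is a minimal sketch for a discrete network with two nodes, A -> B. It is not the paper's algorithm or the bnlearn implementation. All names and probabilities are hypothetical, and the KL function assumes both networks share the same structure. It relies only on the standard chain rule: the joint entropy (and the KL divergence) decomposes into one term per local distribution, so the global distribution never has to be built. In larger networks, weighting each term requires the marginal distribution of the parents, which generally needs inference; here the only parent is the root node A, so its local distribution suffices.

```python
import math

# Hypothetical local distributions for a two-node network A -> B.
p_a = {"a0": 0.6, "a1": 0.4}
p_b_given_a = {
    "a0": {"b0": 0.9, "b1": 0.1},
    "a1": {"b0": 0.2, "b1": 0.8},
}

# A second hypothetical network Q with the same structure A -> B.
q_a = {"a0": 0.5, "a1": 0.5}
q_b_given_a = {
    "a0": {"b0": 0.7, "b1": 0.3},
    "a1": {"b0": 0.4, "b1": 0.6},
}


def entropy(dist):
    """Shannon entropy (in nats) of a distribution given as {value: prob}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def kl(p, q):
    """KL(p || q) in nats; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)


def bn_entropy(p_a, p_b_given_a):
    """H(A, B) = H(A) + sum_a P(a) * H(B | A = a), using local terms only."""
    h = entropy(p_a)
    for a, pa in p_a.items():
        h += pa * entropy(p_b_given_a[a])
    return h


def bn_kl(p_a, p_b_given_a, q_a, q_b_given_a):
    """KL(P || Q) = KL(P(A) || Q(A)) + sum_a P(a) * KL(P(B|a) || Q(B|a))."""
    d = kl(p_a, q_a)
    for a, pa in p_a.items():
        d += pa * kl(p_b_given_a[a], q_b_given_a[a])
    return d


def joint(p_a, p_b_given_a):
    """Brute-force global distribution, used only to check the results."""
    return {(a, b): pa * pb
            for a, pa in p_a.items()
            for b, pb in p_b_given_a[a].items()}


# The structured computations agree with the brute-force ones.
h_local = bn_entropy(p_a, p_b_given_a)
h_global = entropy(joint(p_a, p_b_given_a))
kl_local = bn_kl(p_a, p_b_given_a, q_a, q_b_given_a)
kl_global = kl(joint(p_a, p_b_given_a), joint(q_a, q_b_given_a))
```

The brute-force `joint` table grows exponentially with the number of variables, whereas the local terms grow only with the size of each conditional probability table. This is the kind of saving the summary refers to when it mentions leveraging the graphical structure.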
In summary, this paper fills a knowledge gap in the field of Bayesian networks by providing efficient algorithms for computing entropy and KL divergence under different distributional assumptions, which supports the use of Bayesian networks in machine learning.