Finding optimal Bayesian networks by a layered learning method

Yang Yu, Gao Xiaoguang, Guo Zhigao
DOI: https://doi.org/10.21629/JSEE.2019.05.12
Impact factor: 1.363
2019-01-01
Journal of Systems Engineering and Electronics
Abstract: It is impractical to learn the optimal structure of a large Bayesian network (BN) by exhaustively enumerating the feasible structures, since the number of feasible structures grows superexponentially in the number of nodes. This paper proposes an approach that partitions the nodes of a BN into layers by means of conditional independence testing. The parents of the nodes in a layer belong only to that layer or to layers that have priority over it. Once a set of nodes has been layered, the number of feasible structures over those nodes is substantially reduced, which makes it possible to learn optimal BN structures over larger node sets with exact algorithms. Integrating the dynamic programming (DP) algorithm with the layering approach, we propose a hybrid algorithm, layered optimal learning (LOL), for learning BN structures. Owing to the layering approach, the complexity of the DP algorithm is reduced from $O(n\,2^{n-1})$ to $O(\rho\,2^{n-1})$, where $\rho < n$. Meanwhile, the memory required to store intermediate results is reduced from $O(C_{n}^{n/2})$ to $O(C_{k^{\#}}^{k^{\#}/2})$, where $k^{\#} < n$. A case study on learning a standard BN with 50 nodes is conducted.
The results show that, in terms of the Bayesian information criterion (BIC) score, the LOL algorithm outperforms the hill-climbing, max-min hill-climbing, PC, and three-phase dependency analysis algorithms.
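The abstract gives no code, so what follows is a minimal Python sketch of how a layering can restrict an exact DP search over structures, under stated assumptions. It assumes the layers are already known (the paper derives them from conditional independence tests, which are not shown here), that the data are complete and discrete, and that structures are scored with the standard decomposable BIC, i.e. log-likelihood minus (log N)/2 times the number of free parameters. The names `bic_local`, `best_parents`, and `layered_dp` and the toy data set are hypothetical illustrations, not the authors' LOL implementation. Within each layer the sketch runs a generic order-based DP over subsets, and a node may take parents only from its own layer or from higher-priority layers.

```python
import math
from collections import Counter
from itertools import combinations


def bic_local(data, node, parents, arity):
    """Hypothetical helper: BIC local score of `node` given `parents`.

    data: list of tuples of ints (complete discrete data, one tuple per sample)
    arity: dict mapping each variable to its number of states
    """
    n = len(data)
    parents = sorted(parents)
    # Counts of (parent configuration, node state) and of parent configurations.
    joint = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    pa_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / pa_counts[cfg]) for (cfg, _), c in joint.items())
    # Free parameters: q * (r - 1), with q = product of parent arities.
    q = math.prod(arity[p] for p in parents)
    return loglik - 0.5 * math.log(n) * q * (arity[node] - 1)


def best_parents(node, candidates, score):
    """Brute-force search for the highest-scoring parent set within `candidates`."""
    cand = sorted(candidates)
    best_s, best_ps = score(node, frozenset()), frozenset()
    for r in range(1, len(cand) + 1):
        for ps in combinations(cand, r):
            s = score(node, frozenset(ps))
            if s > best_s:
                best_s, best_ps = s, frozenset(ps)
    return best_s, best_ps


def layered_dp(layers, score):
    """Hypothetical sketch: exact DP restricted to a given layering.

    layers[0] has the highest priority; a node may take parents only from its
    own layer or from earlier layers, so each layer is optimized separately.
    Returns (total score, {node: frozenset of parents}).
    """
    total, parents, prior = 0.0, {}, frozenset()
    for layer in layers:
        # dp[S] = best (score, parent map) for the nodes of S (a subset of this
        # layer) arranged in some order, each node taking parents from `prior`
        # plus its predecessors in S; the ordering keeps the graph acyclic.
        dp = {frozenset(): (0.0, {})}
        for size in range(1, len(layer) + 1):
            for subset in combinations(layer, size):
                s_set = frozenset(subset)
                best = None
                for x in sorted(s_set):  # x is placed last among the nodes of S
                    rest = s_set - {x}
                    prev_score, prev_map = dp[rest]
                    s, ps = best_parents(x, prior | rest, score)
                    if best is None or prev_score + s > best[0]:
                        best = (prev_score + s, {**prev_map, x: ps})
                dp[s_set] = best
        layer_score, layer_map = dp[frozenset(layer)]
        total += layer_score
        parents.update(layer_map)
        prior |= frozenset(layer)  # this layer now has priority over later ones
    return total, parents


# Toy usage (hypothetical data): X1 copies X0, X2 is independent noise.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 2
arity = {0: 2, 1: 2, 2: 2}
total, parents = layered_dp([[0], [1, 2]], lambda v, ps: bic_local(data, v, ps, arity))
print(total, parents)
```

In this sketch each layer's DP table has 2^{|L_i|} entries, so for 50 nodes split into five hypothetical layers of 10 nodes the tables hold 5 × 2^10 = 5,120 entries in total, against 2^50 (about 1.1 × 10^15) for an unlayered subset DP. The parent-set search in `best_parents` is still exhaustive and uncached, which is acceptable only for toy inputs; practical exact learners typically precompute and prune candidate parent sets.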