Multivariate Independence Set Search via Progressive Addition for Conditional Markov Acyclic Networks

Mattia Prosperi, Jiang Bian, Mo Wang
DOI: https://doi.org/10.1109/BIBM49941.2020.9313566
2020-01-01
Abstract: Estimating conditional dependencies over a joint multivariate probability distribution is difficult for big data, e.g. -omics datasets, and quickly becomes intractable as the number of variables grows. For instance, structure learning in Bayesian networks has super-exponential complexity. Dimension reduction techniques such as principal component analysis can be useful, but they often transform the original space and can still scale poorly. This substantially limits the characterization of joint probability: in general, only pairwise or k-level correlations can be analyzed efficiently. We introduce the Multivariate Independence Set Search via Progressive Addition for Conditional Markov Acyclic Networks (MISS-PACMAN), which performs a greedy selection of jointly independent feature sets from a larger set of covariates, given a variable ordering. The method is non-parametric and can be used with any kernel function. MISS-PACMAN is therefore suitable for heterogeneous big data, as it combines flexibility and scalability. In our tests on both simulated and real-world data, using random forests as the kernel, MISS-PACMAN was able to select independence feature sets linearly with the number of features. Further, by combining multiple independence sets, MISS-PACMAN approximates well the underlying conditional structure of the data variables (according to the generating Bayesian network) and compares favorably with other network structure discovery algorithms, such as the Peter-Clark algorithm and the fast incremental association Markov blanket.
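The greedy, order-dependent selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the threshold value, and the dependence score are all assumptions made for illustration. In particular, the abstract reports using random forests as the kernel, whereas this sketch swaps in a much simpler stand-in (the maximum absolute Pearson correlation with any already-selected feature) so that it runs with the Python standard library alone. The dependence score is passed in as a callable, reflecting the claim that any kernel function can be plugged in.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / math.sqrt(sxx * syy)


def max_abs_correlation(target, predictors):
    """Stand-in dependence score (assumption, not the paper's kernel):
    largest |correlation| between the target and any selected feature."""
    return max((abs(pearson(target, p)) for p in predictors), default=0.0)


def greedy_independence_set(columns, order, dependence=max_abs_correlation,
                            threshold=0.3):
    """Visit features in the given order and add each one only if its
    dependence on the features selected so far stays below the threshold.
    The threshold of 0.3 is an arbitrary illustrative value."""
    selected = []
    for name in order:
        chosen = [columns[s] for s in selected]
        if dependence(columns[name], chosen) < threshold:
            selected.append(name)
    return selected


# Toy data: a and b are independent; c is a noisy copy of a.
random.seed(0)
n = 500
a = [random.gauss(0, 1) for _ in range(n)]
b = [random.gauss(0, 1) for _ in range(n)]
c = [x + 0.1 * random.gauss(0, 1) for x in a]
columns = {"a": a, "b": b, "c": c}

forward = greedy_independence_set(columns, ["a", "b", "c"])
reverse = greedy_independence_set(columns, ["c", "b", "a"])
print(forward, reverse)
```

Each feature is scored once against the current set, so the number of dependence evaluations grows linearly with the number of features. Running the selection under two orderings shows why the variable ordering matters: whichever of the two near-duplicate features comes first is kept, and the other is excluded. A stronger non-parametric predictor could be substituted through the `dependence` argument without changing the loop.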