Variational inference for sparse network reconstruction from count data

Julien Chiquet,Mahendra Mariadassou,Stéphane Robin
DOI: https://doi.org/10.48550/arXiv.1806.03120
2018-06-08
Methodology
Abstract:In multivariate statistics, the question of finding direct interactions can be formulated as a problem of network inference - or network reconstruction - for which the Gaussian graphical model (GGM) provides a canonical framework. Unfortunately, the Gaussian assumption does not apply to count data which are encountered in domains such as genomics, social sciences or ecology. To circumvent this limitation, state-of-the-art approaches use two-step strategies that first transform counts to pseudo Gaussian observations and then apply a (partial) correlation-based approach from the abundant literature of GGM inference. We adopt a different stance by relying on a latent model where we directly model counts by means of Poisson distributions that are conditional to latent (hidden) Gaussian correlated variables. In this multivariate Poisson lognormal-model, the dependency structure is completely captured by the latent layer. This parametric model enables to account for the effects of covariates on the counts. To perform network inference, we add a sparsity inducing constraint on the inverse covariance matrix of the latent Gaussian vector. Unlike the usual Gaussian setting, the penalized likelihood is generally not tractable, and we resort instead to a variational approach for approximate likelihood maximization. The corresponding optimization problem is solved by alternating a gradient ascent on the variational parameters and a graphical-Lasso step on the covariance matrix. We show that our approach is highly competitive with the existing methods on simulation inspired from microbiological data. We then illustrate on three various data sets how accounting for sampling efforts via offsets and integrating external covariates (which is mostly never done in the existing literature) drastically changes the topology of the inferred network.
What problem does this paper attempt to address?