Sparse and compositionally robust inference of microbial ecological networks

Zachary D. Kurtz, Christian L. Mueller, Emily R. Miraldi, Dan R. Littman, Martin J. Blaser, Richard A. Bonneau
DOI: https://doi.org/10.1371/journal.pcbi.1004226
2015-02-14
Abstract: 16S ribosomal sequencing and other metagenomic techniques provide snapshots of microbial communities, revealing phylogeny and the abundances of microbial populations across diverse ecosystems. While changes in microbial community structure are demonstrably associated with certain environmental conditions, identification of underlying mechanisms requires new statistical tools, as these datasets present several technical challenges. First, the abundances of microbial operational taxonomic units (OTUs) from 16S datasets are compositional, and thus microbial abundances are not independent. Second, microbial sequencing-based studies typically measure hundreds of OTUs on only tens to hundreds of samples; thus, inference of OTU-OTU interaction networks is severely underpowered, and additional assumptions are required for accurate inference. Here, we present SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for the inference of microbial ecological interactions from metagenomic datasets that addresses both of these issues. SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model inference framework that assumes the underlying ecological interaction network is sparse. To reconstruct the interaction network, SPIEC-EASI relies on algorithms for sparse neighborhood and inverse covariance selection. Because no large-scale microbial ecological networks have been experimentally validated, SPIEC-EASI comprises computational tools to generate realistic OTU count data from a set of diverse underlying network topologies. SPIEC-EASI outperforms state-of-the-art methods in terms of edge recovery and network properties on realistic synthetic data under a variety of scenarios. SPIEC-EASI also reproducibly predicts previously unknown microbial interactions using data from the American Gut project.
Applications, Genomics, Computation
What problem does this paper attempt to address?
This paper addresses the inference of microbial ecological networks from sequencing data. It targets two challenges:

1. **Compositional data**: Microbial community data are usually reported as relative abundances, so the abundances of different operational taxonomic units (OTUs) are interdependent rather than independent. Standard statistical methods such as correlation analysis can produce spurious associations on this type of data, so new statistical tools are needed to identify relationships between OTUs accurately.
2. **High-dimensional data**: Microbial sequencing studies usually measure hundreds to thousands of OTUs on only tens to hundreds of samples. Inference of association networks between OTUs is therefore severely underpowered, and additional information or assumptions are required for accurate inference.

To address these challenges, the paper proposes SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for inferring microbial ecological networks from amplicon sequencing datasets. SPIEC-EASI combines data transformations from compositional data analysis with a graphical model inference framework that assumes the underlying ecological association network is sparse. The specific steps are:

- **Data transformation**: Apply the centered log-ratio (CLR) transformation to the compositional data so that downstream inference is compositionally robust.
- **Graphical model inference**: Use neighborhood selection or sparse inverse covariance selection to estimate the interaction graph, and from it the association network between microorganisms.
- **Model selection**: Use the Stability Approach to Regularization Selection (StARS) to choose the regularization parameter so that the inferred network is stable and reproducible.

In addition, SPIEC-EASI provides tools for generating synthetic data from a variety of network topologies, which allows benchmarking in the absence of an experimentally validated gold-standard network. These tools are important for evaluating and comparing the performance of different inference methods.

Overall, the paper aims to develop a statistical method that more accurately infers sparse, robust ecological networks from high-dimensional microbial community data, providing a basis for understanding the structure and function of microbial communities.
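The data transformation and model selection steps described above can be illustrated with a minimal Python sketch. It is not the SPIEC-EASI implementation: the function names `clr` and `edge_instability`, the pseudocount of 1, and the toy counts and graphs are all assumptions made for illustration, and the graph estimation step itself (neighborhood or inverse covariance selection) is not shown. The instability score follows the usual StARS definition, averaging 2θ(1-θ) over node pairs, where θ is the fraction of subsampled graphs that contain a given edge.

```python
import math
from itertools import combinations


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample (a list of OTU counts).

    The pseudocount (an assumption here) avoids log(0) for unobserved OTUs.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


def edge_instability(edge_sets, n_nodes):
    """StARS-style instability of graphs inferred on data subsamples.

    edge_sets: one set of (i, j) tuples with i < j per subsample.
    Returns the mean of 2*theta*(1-theta) over all node pairs, where
    theta is the fraction of subsampled graphs containing the pair.
    """
    n_graphs = len(edge_sets)
    total = 0.0
    for pair in combinations(range(n_nodes), 2):
        theta = sum(pair in edges for edges in edge_sets) / n_graphs
        total += 2.0 * theta * (1.0 - theta)
    n_pairs = n_nodes * (n_nodes - 1) // 2
    return total / n_pairs


# Toy sample of four OTU counts (hypothetical values).
sample = [120, 30, 6, 44]
transformed = clr(sample)

# Without a pseudocount, CLR does not depend on sequencing depth:
# scaling every count by 10 leaves the transformed values unchanged.
shallow = clr(sample, pseudocount=0.0)
deeper = clr([10 * c for c in sample], pseudocount=0.0)

# Hypothetical graphs inferred on four subsamples at one regularization value.
graphs = [{(0, 1), (1, 2)}, {(0, 1)}, {(0, 1), (2, 3)}, {(0, 1), (1, 2)}]
score = edge_instability(graphs, n_nodes=4)
```

In a StARS-style selection, a score like this would be computed for each candidate regularization value, and the least regularized graph whose instability stays below a chosen threshold would be kept. The threshold and the subsampling scheme are left out of this sketch.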