Abstract: Microbiome data are complex in nature, involving high dimensionality, compositionality, zero inflation, and taxonomic hierarchy. Compositional data reside in a simplex that does not admit the standard Euclidean geometry. Most existing compositional regression methods rely on transformations that are inadequate or even inappropriate in modeling data with excessive zeros and taxonomic structure. We develop a novel relative-shift regression framework that directly uses compositions as predictors. The new framework provides a paradigm shift for compositional regression and offers a superior biological interpretation. New equi-sparsity and taxonomy-guided regularization methods and an efficient smoothing proximal gradient algorithm are developed to facilitate feature aggregation and dimension reduction in regression. As a result, the framework can automatically identify clinically relevant microbes even if they are important at different taxonomic levels. A unified finite-sample prediction error bound is developed for the proposed regularized estimators. We demonstrate the efficacy of the proposed methods in extensive simulation studies. The application to a preterm infant study reveals novel insights into the association between the gut microbiome and neurodevelopment.

What problem does this paper attempt to address?

This paper addresses the difficulty of regression analysis with microbiome data, in particular the challenges posed by high dimensionality, compositionality, zero inflation, and taxonomic hierarchy. Specifically:

1. **Special properties of compositional data**: Microbiome data usually take the form of relative abundances (compositions). Such data lie in a simplex, where standard Euclidean geometry does not apply. Traditional regression methods therefore transform the data first (for example, with a log-ratio transformation), but these transformations are often inadequate or inappropriate when the data contain excessive zeros and taxonomic structure.
2. **Limitations of existing methods**:
   - **Zero handling**: The commonly used log transformation is undefined at zero. The usual workaround is to replace zeros with a small positive number, which may introduce bias.
   - **Poor biological interpretability**: Coefficients on transformed data are difficult to interpret biologically.
   - **Insufficient use of the taxonomic tree**: Existing methods struggle to incorporate the taxonomic tree into regularization, so results can be inconsistent across taxonomic levels.
3. **Proposed new framework**: To address these problems, the authors develop a relative-shift regression framework that uses compositional data directly as predictors, with no transformation. The framework represents a paradigm shift in compositional regression and offers better biological interpretability.
4. **Model features**:
   - **Intercept-free linear regression**: Eliminating the intercept term makes the model fully identifiable on compositional data.
   - **Direct handling of zeros**: No additional step is required to handle zero values.
   - **Feature aggregation**: Equi-sparsity and taxonomy-guided regularization aggregate features, which reduces the dimension and improves interpretability.
5. **Theoretical contributions**: The authors establish a unified finite-sample prediction error bound for the proposed regularized estimators, which supports the use of the method in high-dimensional settings.

In summary, the paper aims to provide a regression method that better handles the complex characteristics of microbiome data while maintaining good biological interpretability and statistical performance.
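
The model features listed above can be illustrated with a small numerical sketch. This is not the authors' implementation: it assumes the simplest reading of the summary, namely an intercept-free linear model `y = X @ beta + noise` with the rows of `X` on the simplex, and it uses plain least squares rather than the regularized estimator or the smoothing proximal gradient algorithm. The toy data, the coefficient vector `beta`, and the grouping `groups` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: n samples, p taxa, with many exact zeros.
n, p = 50, 6
counts = rng.poisson(lam=3.0, size=(n, p)) * rng.binomial(1, 0.6, size=(n, p))
counts[:, 0] += 1  # keep every row sum positive so the composition is defined
X = counts / counts.sum(axis=1, keepdims=True)  # each row sums to 1

# Assumed coefficients with an equi-sparse pattern:
# taxa 0-2 share one value, taxa 3-4 share another.
beta = np.array([2.0, 2.0, 2.0, -1.0, -1.0, 0.5])
y = X @ beta + 0.01 * rng.standard_normal(n)

# 1) Identifiability: because rows of X sum to 1, a column of ones is the sum
#    of the columns of X, so adding an intercept does not increase the rank.
rank_without_intercept = np.linalg.matrix_rank(X)
rank_with_intercept = np.linalg.matrix_rank(np.column_stack([np.ones(n), X]))

# 2) Zeros: X is used as is, whereas a log transform is undefined at zero.
has_zeros = bool((X == 0).any())
with np.errstate(divide="ignore"):
    log_has_inf = bool(np.isinf(np.log(X)).any())

# 3) Intercept-free least squares (a stand-in for the regularized estimator).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# 4) Equal coefficients act as feature aggregation: taxa that share a
#    coefficient can be summed into one aggregated predictor.
groups = [[0, 1, 2], [3, 4], [5]]
X_agg = np.column_stack([X[:, g].sum(axis=1) for g in groups])
beta_agg = np.array([2.0, -1.0, 0.5])
same_fit = bool(np.allclose(X @ beta, X_agg @ beta_agg))

# 5) Relative-shift reading: moving delta of relative abundance from taxon k
#    to taxon j changes the linear predictor by delta * (beta[j] - beta[k]).
x0 = np.full(p, 1.0 / p)
delta, j, k = 0.05, 0, 3
x1 = x0.copy()
x1[j] += delta
x1[k] -= delta
shift_effect = x1 @ beta - x0 @ beta

print("rank without / with intercept:", rank_without_intercept, rank_with_intercept)
print("zeros present:", has_zeros, "| log transform hits -inf:", log_has_inf)
print("estimated coefficients:", np.round(beta_hat, 2))
print("aggregated model gives the same fit:", same_fit)
print("effect of the shift:", round(float(shift_effect), 4))
```

Under these assumptions, the design matrix has the same rank with or without an intercept column, which is why dropping the intercept loses nothing, and a model with tied coefficients is equivalent to a lower-dimensional model on summed abundances.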

It's All Relative: New Regression Paradigm for Microbiome Compositional Data

Regression Analysis for Microbiome Compositional Data

Regression Models for Compositional Data: General Log-Contrast Formulations, Proximal Optimization, and Microbiome Data Applications

Generalized Linear Models with Linear Constraints for Microbiome Compositional Data

Bayesian compositional regression with microbiome features via variational inference

Bayesian graphical compositional regression for microbiome data

Multivariate Log-Contrast Regression with Sub-Compositional Predictors: Testing the Association Between Preterm Infants' Gut Microbiome and Neurobehavioral Outcomes

A Logistic Normal Multinomial Regression Model for Microbiome Compositional Data Analysis

Robust Regression with Compositional Covariates

Bayesian compositional regression with flexible microbiome feature aggregation and selection

High-dimensional count and compositional data analysis in microbiome studies

Negative Binomial factor regression with application to microbiome data analysis

Using anticoagulants.

Conditional regression based on a multivariate zero-inflated logistic normal model for microbiome relative abundance data

Bayesian Mixed Effects Models for Zero-inflated Compositions in Microbiome Data Analysis

Compositional data analysis of the microbiome: fundamentals, tools, and challenges

The life and work of Erik M. P. Widmark.

Robust Covariance Estimation for High-dimensional Compositional Data with Application to Microbial Communities Analysis

A two-part mixed-effects model for analyzing longitudinal microbiome compositional data

Statistical computation methods for microbiome compositional data network inference

Statistical Methods for Microbiome Compositional Data Network Inference: A Survey