Enhanced Feature Selection for Microbiome Data Using FLORAL: Scalable Log-ratio Lasso Regression.

Teng Fei, Tyler Funnell, Nicholas R. Waters, Sandeep S. Raj, Mirae Baichoo, Keimya Sadeghi, Anqi Dai, Oriana Miltiadous, Roni Shouval, Meng Lv, Jonathan U. Peled, Doris M. Ponce, Miguel-Angel Perales, Mithat Gönen, Marcel R.M. van den Brink
DOI: https://doi.org/10.1101/2023.05.02.538599
2024-01-01
Cell Reports Methods
Abstract: Identifying predictive biomarkers of patient outcomes from high-throughput microbiome data is of high interest, but existing computational methods do not satisfactorily account for complex survival endpoints, longitudinal samples, and taxa-specific sequencing biases. We present FLORAL, an open-source tool to perform scalable log-ratio lasso regression and microbial feature selection for continuous, binary, time-to-event, and competing risk outcomes, with compatibility for longitudinal microbiome data as time-dependent covariates. The proposed method adapts the augmented Lagrangian algorithm to a zero-sum constrained optimization problem while enabling a two-stage screening process for enhanced false-positive control. In extensive simulation and real-data analyses, FLORAL achieved consistently better false-positive control than other lasso-based approaches and better sensitivity than popular differential abundance testing methods for datasets with smaller sample sizes. In a survival analysis of allogeneic hematopoietic cell transplant recipients, FLORAL demonstrated considerable improvement in microbial feature selection by utilizing longitudinal microbiome data over solely using baseline microbiome data.
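
The link between the zero-sum constraint and log-ratios mentioned in the abstract can be illustrated with a small sketch. This is not FLORAL's code or API; the function name `log_contrast`, the coefficients, and the counts are hypothetical values chosen for illustration, and counts are assumed to be strictly positive. When the coefficients on log-transformed abundances sum to zero, the linear predictor is unchanged if a sample is rescaled (for example, by sequencing depth), and it can be rewritten as a combination of log-ratios between taxa.

```python
import math

def log_contrast(counts, beta):
    """Linear predictor sum_j beta_j * log(x_j) for a single sample."""
    return sum(b * math.log(x) for b, x in zip(beta, counts))

# Hypothetical coefficients that satisfy the zero-sum constraint.
beta = [0.5, -0.5, 1.0, -1.0, 0.0]
assert abs(sum(beta)) < 1e-12

# Made-up positive taxon counts for one sample, and the same sample
# observed at ten times the depth.
counts = [120.0, 30.0, 45.0, 5.0, 300.0]
scaled = [10.0 * x for x in counts]

base = log_contrast(counts, beta)
rescaled = log_contrast(scaled, beta)

# Scaling adds log(10) * sum(beta) = 0 to the predictor, so it is unchanged.
invariant = math.isclose(base, rescaled, rel_tol=1e-9, abs_tol=1e-9)

# The same predictor written as log-ratios between pairs of taxa.
as_ratios = (0.5 * math.log(counts[0] / counts[1])
             + 1.0 * math.log(counts[2] / counts[3]))
```

Here `invariant` is true and `base` agrees with `as_ratios` up to floating-point error, which is why a lasso fit under a zero-sum constraint can be read as selecting ratios of taxa rather than individual abundances.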