Zero is not absence: censoring-based differential abundance analysis for microbiome data

Lap Sum Chan,Gen Li
DOI: https://doi.org/10.1093/bioinformatics/btae071
IF: 5.8
2024-02-01
Bioinformatics
Abstract:Abstract Motivation Microbiome data analysis faces the challenge of sparsity, with many entries recorded as zeros. In differential abundance analysis, the presence of excessive zeros in data violates distributional assumptions and creates ties, leading to an increased risk of type I errors and reduced statistical power. Results We developed a novel normalization method, called censoring-based analysis of microbiome proportions (CAMP), for microbiome data by treating zeros as censored observations, transforming raw read counts into tie-free time-to-event-like data. This enables the use of survival analysis techniques, like the Cox proportional hazards model, for differential abundance analysis. Extensive simulations demonstrate that CAMP achieves proper type I error control and high power. Applying CAMP to a human gut microbiome dataset, we identify 60 new differentially abundant taxa across geographic locations, showcasing its usefulness. CAMP overcomes sparsity challenges, enabling improved statistical analysis and providing valuable insights into microbiome data in various contexts. Availability and implementation The R package is available at https://github.com/lapsumchan/CAMP.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
What problem does this paper attempt to address?