Hypothesis-driven mediation analysis for compositional data: an application to gut microbiome

Noora Kartiosuo, Jaakko Nevalainen, Olli Raitakari, Katja Pahkala, Kari Auranen
DOI: https://doi.org/10.1080/24709360.2024.2360375
2024-06-08
Biostatistics & Epidemiology
Abstract: Sequencing read-count data often exhibit sparsity (zero-count inflation) and overdispersion. Because most sequencing techniques yield an arbitrary total count, taxon-specific counts should be treated within the compositional data-analytic framework. There is increasing interest in the role of gut microbiome composition in mediating the effects of exposures on health. Previous compositional mediation approaches have focused on identifying mediating taxa among a number of candidates. Here we consider compositional causal mediation when a priori knowledge about the hierarchy of a restricted number of taxa is available, building on a single hypothesis structured as contrasts between appropriate sub-compositions. Based on the assumed causal graph and the theory of multiple contemporaneous mediators, we define non-parametric estimands for overall and coordinate-wise mediation effects and show how they are estimated with parametric linear models. The mediators have straightforward and coherent interpretations, related to causal questions about the interrelationships between the sub-compositions. We perform a simulation study focusing on the impact of sparsity on estimation. While the estimators are unbiased, their precision depends in a complex manner on sparsity and on the relative magnitudes of the exposure-to-mediator and mediator-to-outcome effects. In the empirical application we find an inverse association between fibre intake and insulin level, mainly attributable to the direct effects.
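The two ingredients named in the abstract, contrasts between sub-compositions and mediation effects estimated from linear models, can be illustrated with a minimal sketch. It assumes isometric log-ratio (ilr) balances as the sub-compositional contrasts, a pseudocount of 0.5 for zero counts, and the product-of-coefficients method for linear models with no exposure-mediator interaction. These are common textbook choices, not necessarily the authors' exact coordinates, zero handling, or estimators. The helper names `balance`, `ols` and `mediation_effects` are hypothetical, and the data are simulated for illustration only (the balance coordinates are generated directly rather than from counts).

```python
import numpy as np


def balance(counts, num_idx, den_idx, pseudocount=0.5):
    """ilr balance contrasting sub-composition num_idx against den_idx.

    Assumption: zeros are handled by adding a pseudocount to every count.
    The arbitrary total count cancels in the log-ratio, so no closure is needed.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    r, s = len(num_idx), len(den_idx)
    # Log geometric mean of each sub-composition (works for 1-D or samples x taxa)
    log_gm_num = np.mean(np.log(x[..., num_idx]), axis=-1)
    log_gm_den = np.mean(np.log(x[..., den_idx]), axis=-1)
    return np.sqrt(r * s / (r + s)) * (log_gm_num - log_gm_den)


def ols(y, *covariates):
    """Least-squares coefficients for y ~ 1 + covariates (intercept first)."""
    design = np.column_stack([np.ones(len(y)), *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def mediation_effects(x, mediators, y):
    """Product-of-coefficients estimates for contemporaneous mediators.

    Assumed linear models (no exposure-mediator interaction):
      M_j = a0_j + a_j X + e_j,   Y = b0 + c X + sum_j b_j M_j + e.
    Returns the direct effect c, the coordinate-wise indirect effects a_j * b_j,
    and the overall indirect effect sum_j a_j * b_j.
    """
    m = np.column_stack(mediators)
    a = np.array([ols(m[:, j], x)[1] for j in range(m.shape[1])])
    coef_y = ols(y, x, *m.T)
    direct, b = coef_y[1], coef_y[2:]
    per_coordinate = a * b
    return direct, per_coordinate, per_coordinate.sum()


# Illustrative simulation (hypothetical effect sizes, not from the paper)
rng = np.random.default_rng(2024)
n = 5000
x = rng.normal(size=n)                          # standardised exposure
m1 = 0.5 * x + rng.normal(scale=0.5, size=n)    # balance coordinate 1
m2 = -0.3 * x + rng.normal(scale=0.5, size=n)   # balance coordinate 2
y = 0.2 * x + 0.4 * m1 + 0.1 * m2 + rng.normal(scale=0.5, size=n)

direct, per_coord, indirect = mediation_effects(x, [m1, m2], y)
# True values: direct 0.2, coordinate-wise (0.20, -0.03), overall indirect 0.17
```

Under these linear models the overall indirect effect is the sum of the coordinate-wise products, which is what makes a coordinate-wise decomposition along the balances possible. Real sequencing data would also require the balances to be computed from (sparse) counts, which is where the choice of zero handling starts to affect precision.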