Detecting Shape-Based Interactions Among Environmental Chemicals Using an Ensemble of Exposure-Mixture Regression and Interpretable Machine Learning Tools

Vishal Midya,Chris Gennings
DOI: https://doi.org/10.1007/s12561-023-09405-6
2023-11-23
Statistics in Biosciences
Abstract:Abstract There is growing interest in discovering interactions between multiple environmental chemicals associated with increased adverse health effects. However, most existing approaches (1) either use a projection or product of multiple chemical exposures, which are difficult to interpret and (2) cannot simultaneously handle multi-ordered interactions. Therefore, we develop and validate a method to discover shape-based interactions that mimic usual toxicological interactions. We developed the Multi-ordered explanatory interaction (Moxie) algorithm by merging the efficacy of Extreme Gradient Boosting with the inferential power of Weighted Quantile Sum regression to extract synergistic interactions associated with the outcome/odds of disease in an adverse direction. We evaluated the algorithm’s performance through simulations and compared it with the currently available gold standard, the signed-iterative random forest algorithm. We used the 2017–18 US-NHANES dataset ( n = 447 adults) to evaluate interactions among nine per- and poly-fluoroalkyl substances and five metals measured in whole blood in association with serum low-density lipoprotein cholesterol. In simulations, the Moxie algorithm was highly specific and sensitive and had very low false discovery rates in detecting true synergistic interactions of 2nd, 3rd, and 4th order through moderate ( n = 250) to large ( n = 1000) sample sizes. In NHANES data, we found a two-order synergistic interaction between cadmium and lead detected in people with whole-blood cadmium concentrations and lead above 0.605 ug/dL and 1.485 ug/dL, respectively. Our findings demonstrate a novel validated approach in environmental epidemiology for detecting shape-based toxicologically mimicking interactions by integrating exposure-mixture regression and machine learning methods.
What problem does this paper attempt to address?