Estimating the health effects of environmental mixtures using Bayesian semiparametric regression and sparsity inducing priors

Joseph Antonelli, Maitreyi Mazumdar, David Bellinger, David C. Christiani, Robert Wright, Brent A. Coull
DOI: https://doi.org/10.48550/arXiv.1711.11239
2019-10-30
Abstract: Humans are routinely exposed to mixtures of chemical and other environmental factors, making the quantification of health effects associated with environmental mixtures a critical goal for establishing environmental policy sufficiently protective of human health. The quantification of the effects of exposure to an environmental mixture poses several statistical challenges. It is often the case that exposures to multiple pollutants interact with each other to affect an outcome. Further, the exposure-response relationship between an outcome and some exposures, such as some metals, can exhibit complex, nonlinear forms, since some exposures can be beneficial and detrimental at different ranges of exposure. To estimate the health effects of complex mixtures we propose a flexible Bayesian approach that allows exposures to interact with each other and have nonlinear relationships with the outcome. We induce sparsity using multivariate spike and slab priors to determine which exposures are associated with the outcome, and which exposures interact with each other. The proposed approach is interpretable, as we can use the posterior probabilities of inclusion into the model to identify pollutants that interact with each other. We illustrate our approach's ability to estimate complex functions using simulated data, and apply our method to two studies to determine which environmental pollutants adversely affect health.
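The abstract's two key mechanisms, spike and slab priors that zero out whole blocks of coefficients and posterior inclusion probabilities that flag interacting pollutants, can be illustrated with a minimal Python sketch. It assumes MCMC draws of the inclusion indicators \(\zeta_{jh}\) (from the prior under Mathematical formulas) are already available as a 0/1 array of shape (draws, p, k). The function names `sample_slab_coefficients` and `inclusion_probabilities`, the `df` and `slab_sd` parameters, the `df ** len(S)` basis size, and the isotropic normal slab standing in for \(\psi_1\) are hypothetical choices for illustration, not the authors' implementation. Treating two exposures as interacting when they share a function \(h\) is likewise an assumption read off the decomposition of \(f(X_i)\) into functions \(f^{(h)}\).

```python
import numpy as np


def sample_slab_coefficients(zeta_h, S, df, rng, slab_sd=1.0):
    """Draw beta_S^{(h)} for one function h under a spike-and-slab prior.

    Hypothetical helper. zeta_h is the 0/1 vector (zeta_{1h}, ..., zeta_{ph});
    S is a tuple of exposure indices: one index for a main effect, two for a
    pairwise interaction. Assumption: each exposure uses `df` spline basis
    columns, so the block for S has df ** len(S) columns.
    """
    dim = df ** len(S)
    if all(zeta_h[j] == 1 for j in S):
        # Slab: assumed isotropic normal N(0, slab_sd^2 I) in place of psi_1.
        return rng.normal(0.0, slab_sd, size=dim)
    # Spike (delta_0): the whole coefficient block is exactly zero.
    return np.zeros(dim)


def inclusion_probabilities(zeta_draws):
    """Summarize MCMC draws of zeta with shape (n_draws, p, k).

    Returns (main, pair), where
      main[j]      = share of draws in which exposure j enters any function;
      pair[j1, j2] = share of draws in which j1 and j2 share a function,
                     taken here (an assumption) as evidence of interaction.
    """
    z = np.asarray(zeta_draws).astype(int)
    main = (z.sum(axis=2) > 0).mean(axis=0)
    # shared[d, j1, j2] counts the functions containing both j1 and j2 in draw d.
    shared = np.einsum("dih,djh->dij", z, z)
    pair = (shared > 0).mean(axis=0)
    return main, pair


if __name__ == "__main__":
    # Two toy draws for p = 3 exposures and k = 2 functions (made-up numbers).
    draws = np.array([
        [[1, 0], [1, 0], [0, 0]],  # exposures 0 and 1 share function 0
        [[1, 0], [0, 1], [0, 0]],  # exposures 0 and 1 sit in different functions
    ])
    main, pair = inclusion_probabilities(draws)
    print(main)        # inclusion of exposures 0, 1, 2: 1.0, 1.0, 0.0
    print(pair[0, 1])  # 0.5

    rng = np.random.default_rng(0)
    zeta_h = np.array([1, 1, 0])
    print(sample_slab_coefficients(zeta_h, (0, 1), df=3, rng=rng).shape)  # (9,)
    print(sample_slab_coefficients(zeta_h, (0, 2), df=3, rng=rng))  # nine zeros
```

Because the spike acts on the whole block \(\beta_S^{(h)}\) at once, an interaction block is switched off whenever any one of its exposures is excluded from function \(h\), which is what lets inclusion probabilities be read directly as evidence of association and interaction.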
Methodology
What problem does this paper attempt to address?
The paper addresses how to quantify the health effects of environmental mixtures, that is, simultaneous exposure to multiple chemicals and other environmental factors. Specifically, the research faces the following challenges:

1. **Interactions among multiple pollutants**: When people are exposed to several pollutants, the pollutants may interact with each other, and these interactions can affect health outcomes.
2. **Nonlinear exposure-response relationships**: Some pollutants (such as certain metals) can be beneficial at some exposure levels and harmful at others, so the exposure-response relationship can be complex and nonlinear.
3. **Variable selection in high-dimensional data**: With many candidate exposures, a method is needed to determine which exposures are associated with the health outcome and which exposures interact with each other.

To address these challenges, the authors propose a flexible Bayesian semiparametric regression method that allows exposures to interact and to have nonlinear exposure-response relationships. Multivariate spike and slab priors then identify which exposures are associated with the outcome and which exposures interact with each other.

### Specific problems and solutions

- **Problem 1: Interactions among multiple pollutants**
  - **Solution**: Multivariate spike and slab priors let the model select which exposures interact with each other while also estimating the strength of those interactions.
- **Problem 2: Nonlinear exposure-response relationships**
  - **Solution**: Natural spline basis functions represent the main effect of each exposure and the interaction effects among exposures, capturing nonlinear exposure-response relationships.
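- **Problem 3: Variable selection in high-dimensional data**
  - **Solution**: Spike and slab priors let the model select the important exposures automatically, which reduces the dimension of the model and improves its interpretability.

### Application examples

The authors applied this method to a study of children's neurodevelopment in Bangladesh, examining how exposure to arsenic, manganese, and lead relates to children's motor development. They found a nonlinear interaction between arsenic and manganese, meaning that the association of one metal with motor development depends on the level of the other, which matters for understanding how environmental mixtures affect health.

### Mathematical formulas

- **Model form**:
\[ Y_i\sim \text{Normal}(f(X_i)+C_i\beta_c,\sigma^2) \]
where \(f(X_i)\) represents the effect of the exposures, \(C_i\) is the covariate vector, \(\beta_c\) is the vector of regression coefficients for the covariates, and \(\sigma^2\) is the residual variance.
- **Decomposition of exposure effects**:
\[ f(X_i)=\sum_{h = 1}^k f^{(h)}(X_i) \]
\[ f^{(h)}(X_i)=\sum_{j_1 = 1}^p\tilde{X}_{ij_1}\beta_{j_1}^{(h)}+\sum_{j_1 = 2}^p\sum_{j_2 < j_1}\tilde{X}_{ij_1j_2}\beta_{j_1j_2}^{(h)}+\cdots \]
where \(p\) is the number of exposures, \(k\) is the number of functions in the decomposition, \(\tilde{X}_{ij_1}\) is the natural spline basis expansion of exposure \(j_1\), and \(\tilde{X}_{ij_1j_2}\) contains the corresponding interaction terms between exposures \(j_1\) and \(j_2\).
- **Prior distributions**:
\[ P(\beta_S^{(h)}|\zeta)=(1-\prod_{j\in S}\zeta_{jh})\delta_0+(\prod_{j\in S}\zeta_{jh})\psi_1(\beta_S^{(h)}) \]
where \(\zeta_{jh}\in\{0,1\}\) indicates whether exposure \(j\) is included in the \(h\)-th function, \(\delta_0\) is a point mass at zero (the spike), and \(\psi_1\) is the slab density. The coefficients \(\beta_S^{(h)}\) for a set of exposures \(S\) can therefore be nonzero only if every exposure in \(S\) is included in function \(h\); \(\tau_h\) below controls the prior probability that an exposure enters function \(h\).
\[ P(\zeta_{jh})=\tau_h^{\zeta_{jh}}(1 - \tau_h)^{1-\zeta_{jh}}1(A_h\not\subset A_m\fora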