Accommodating Missingness in Environmental Measurements in Gene‐environment Interaction Analysis

Mengyun Wu, Yangguang Zang, Sanguo Zhang, Jian Huang, Shuangge Ma
DOI: https://doi.org/10.1002/gepi.22055
2017-01-01
Genetic Epidemiology
Abstract: For the prognosis of complex diseases, beyond the main effects of genetic (G) and environmental (E) factors, gene-environment (G-E) interactions also play an important role. Many approaches have been developed for detecting important G-E interactions, most of which assume that measurements are complete. In practical data analysis, missingness in E measurements is not uncommon, and failing to properly accommodate such missingness leads to biased estimation and false marker identification. In this study, we conduct G-E interaction analysis with prognosis data under an accelerated failure time (AFT) model. To accommodate missingness in E measurements, we adopt a nonparametric kernel-based data augmentation approach. With a well-designed weighting scheme, a nice byproduct is that the proposed approach enjoys a certain robustness property. A penalization approach, which respects the "main effects, interactions" hierarchy, is adopted for selection (of important interactions and main effects) and regularized estimation. The proposed approach has sound interpretations and a solid statistical basis. It outperforms multiple alternatives in simulation. The analysis of TCGA data on lung cancer and melanoma leads to interesting findings and models with superior prediction.
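
To make the idea of kernel-based data augmentation more concrete, the following is a minimal sketch of one generic way it could work: each subject with a missing E value borrows the observed E values of the other subjects, weighted by a Gaussian kernel on distance between their G profiles. The abstract does not give the paper's actual weighting scheme, so the function name `kernel_augmentation_weights`, the Gaussian kernel, the `bandwidth` parameter, and the toy data are all assumptions made for illustration only.

```python
import numpy as np

def kernel_augmentation_weights(G, E, bandwidth=1.0):
    """Illustrative (hypothetical) kernel weights for subjects with missing E.

    G: (n, p) array of genetic measurements, assumed fully observed.
    E: (n,) array of one environmental measurement, NaN where missing.
    Returns the indices of subjects with missing E, the indices of
    donor subjects with observed E, and a row-normalized weight matrix W
    of shape (n_missing, n_donors).
    """
    G = np.asarray(G, dtype=float)
    E = np.asarray(E, dtype=float)
    observed = ~np.isnan(E)
    missing_idx = np.flatnonzero(~observed)
    donors = np.flatnonzero(observed)

    # Squared Euclidean distance in G-space between each subject with
    # missing E and each donor subject.
    diff = G[missing_idx][:, None, :] - G[donors][None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)

    # Gaussian kernel (an assumption), normalized so each row sums to 1.
    K = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    W = K / K.sum(axis=1, keepdims=True)
    return missing_idx, donors, W

# Toy data (made up): 6 subjects, 3 G factors, 2 missing E values.
rng = np.random.default_rng(0)
G = rng.normal(size=(6, 3))
E = np.array([0.5, np.nan, 1.2, -0.3, np.nan, 0.8])

missing_idx, donors, W = kernel_augmentation_weights(G, E)

# Kernel-weighted average of donor E values for each incomplete subject.
E_hat = W @ E[donors]
```

In an augmentation scheme of this kind, each incomplete subject would typically be expanded into several weighted pseudo-observations (one per donor value) rather than collapsed to the single average `E_hat`; those weights would then enter the penalized AFT objective. How the paper combines them is not stated in the abstract.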