A Machine Learning Model for Disease Risk Prediction by Integrating Genetic and Non-Genetic Factors

Yu Xu, Chonghao Wang, Zeming Li, Yunpeng Cai, Ouzhou Young, Aiping Lyu, Lu Zhang
DOI: https://doi.org/10.1109/bibm55620.2022.9994925
2022-01-01
Abstract: Polygenic risk scores (PRS) have been widely used to identify high-risk individuals in the general population, which is helpful for disease prevention and early treatment. Many methods have been developed to calculate PRS by weighting and aggregating the phenotype-associated risk alleles from genome-wide association studies. However, considering genetic effects alone may not be sufficient for risk prediction, because disease risk is related not only to genetic factors but also to non-genetic factors such as diet and physical exercise. Integrating these genetic and non-genetic factors into a unified machine learning framework for disease risk prediction remains a challenge. In this paper, we propose PRSIMD (PRS Integrating Multi-source Data), a machine learning model that applies posterior regularization to integrate genetic and non-genetic factors and improve disease risk prediction. We also applied Mendelian Randomization analysis to identify the causal non-genetic risk factors for the selected diseases. We applied PRSIMD to predict type 2 diabetes and coronary artery disease in UK Biobank and observed that PRSIMD significantly outperformed existing methods for calculating PRS. In addition, PRSIMD achieved better predictive power than the composite risk score. The code for PRSIMD is available at: https://github.com/ericcombiolab/PRSIMD
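
The abstract describes conventional PRS methods as weighting and aggregating risk alleles. A minimal sketch of that basic additive score is given below for orientation only. It is not the PRSIMD model, which additionally integrates non-genetic factors through posterior regularization. The function name, the effect sizes, and the genotypes are all hypothetical values chosen for illustration.

```python
def polygenic_risk_score(dosages, weights):
    """Return the weighted sum of risk-allele dosages for one individual.

    dosages: copies of the risk allele at each variant (0, 1, or 2).
    weights: per-variant effect sizes, e.g. GWAS log odds ratios.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical effect sizes for three variants (illustrative, not real GWAS output).
weights = [0.5, -0.25, 1.0]

# Hypothetical genotypes for two individuals.
individuals = {
    "A": [0, 1, 2],
    "B": [2, 2, 0],
}

scores = {name: polygenic_risk_score(d, weights) for name, d in individuals.items()}
print(scores)
```

Under these assumed inputs, individual A scores higher than individual B because A carries two copies of the variant with the largest weight. Real PRS methods differ mainly in how the weights are estimated and which variants are included.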