Abstract

Building an accurate disease risk prediction model is an essential step in the modern quest for precision medicine. While high-dimensional genomic data provide valuable resources for investigating disease risk, their large amount of noise and the complex relationships between predictors and outcomes pose tremendous analytical challenges. Deep learning models are the state-of-the-art methods for many prediction tasks, and they offer a promising framework for the analysis of genomic data. However, deep learning models generally suffer from the curse of dimensionality and a lack of biological interpretability, both of which have greatly limited their applications. In this work, we have developed a deep neural network (DNN) based prediction modeling framework. We first propose a group-wise feature importance score for feature selection, with which genes harboring genetic variants with both linear and non-linear effects are efficiently detected. We then design an explainable transfer-learning based DNN method, which can directly incorporate information from feature selection and accurately capture complex predictive effects. The proposed DNN framework is biologically interpretable, as it is built on the selected predictive genes. It is also computationally efficient and can be applied to genome-wide data. Through extensive simulations and real data analyses, we demonstrate that, compared with many existing methods, our proposed method can not only efficiently detect predictive features but also accurately predict disease risk.

Author summary

Accurate disease risk prediction is an essential step towards precision medicine. Deep learning models have achieved state-of-the-art performance for many prediction tasks. However, they generally suffer from the curse of dimensionality and a lack of biological interpretability, both of which have greatly limited their application to the prediction analysis of whole-genome sequencing data.
We present here an explainable deep transfer learning model for the analysis of high-dimensional genomic data. Using the proposed group-wise feature importance score, our method can detect predictive genes that harbor genetic variants with both linear and non-linear effects. Using the proposed transfer-learning based network architecture, it can also efficiently and accurately model disease risk from the detected predictive genes. Because our method is built at the gene level, it is much more biologically interpretable. It is also computationally efficient and can be applied to whole-genome sequencing data that have millions of potential predictors. Through both simulation studies and the analysis of whole-genome sequencing data obtained from the Alzheimer’s Disease Neuroimaging Initiative, we have demonstrated that our method can efficiently detect predictive genes and has better prediction performance than many existing methods.
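
The summary above does not define the group-wise feature importance score, so the following is only a minimal sketch of the general idea of scoring features at the gene level, not the authors' method. It assumes a permutation-based score: all variants assigned to one gene are shuffled together, and the resulting increase in prediction error is taken as that gene's importance. The function name `group_permutation_importance`, the toy genes `gene_A`, `gene_B`, and `gene_C`, the simulated genotypes, and the quadratic least-squares model standing in for a DNN are all hypothetical.

```python
import numpy as np

def group_permutation_importance(predict, X, y, groups, n_repeats=20, seed=0):
    """Mean increase in MSE when all columns of a group are permuted jointly.

    Assumption: a permutation-based score, used here only as a stand-in for
    the group-wise score named in the text, whose definition is not given.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean((y - predict(X)) ** 2)
    scores = {}
    for name, cols in groups.items():
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            perm = rng.permutation(X.shape[0])
            # Shuffle the whole gene at once so within-gene structure is kept.
            X_perm[:, cols] = X[perm][:, cols]
            increases.append(np.mean((y - predict(X_perm)) ** 2) - baseline)
        scores[name] = float(np.mean(increases))
    return scores

# Toy data: 6 variants coded as 0/1/2 dosages, grouped into 3 hypothetical genes.
rng = np.random.default_rng(42)
n = 500
X = rng.integers(0, 3, size=(n, 6)).astype(float)
groups = {"gene_A": [0, 1], "gene_B": [2, 3], "gene_C": [4, 5]}

# gene_A has a linear effect, gene_B a non-linear effect, gene_C none.
y = 1.0 * X[:, 0] + 1.5 * (X[:, 2] - 1.0) ** 2 + rng.normal(0.0, 0.5, n)

def expand(Z):
    # Intercept, linear, and squared terms: a small stand-in for a flexible model.
    return np.column_stack([np.ones(len(Z)), Z, Z ** 2])

beta, *_ = np.linalg.lstsq(expand(X), y, rcond=None)

def predict(Z):
    return expand(Z) @ beta

scores = group_permutation_importance(predict, X, y, groups)
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{gene}: {score:.3f}")
```

In this toy setting the genes carrying the linear and the non-linear signal should both receive clearly larger scores than the noise gene, which is the behavior a gene-level selection step relies on. Any fitted model that exposes a prediction function could be passed in place of the least-squares stand-in.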

Deep Learning-Based Polygenic Risk Analysis for Alzheimer’s Disease Prediction

Deep learning methods improve polygenic risk analysis and prediction for Alzheimer’s disease

Deep learning for polygenic score analysis for Alzheimer's disease risk prediction in the Chinese population.

Assessing polyomic risk to predict Alzheimer's disease using a machine learning model

Alzheimer’s disease risk prediction using automated machine learning

Deep Post-Gwas Analysis Identifies Potential Risk Genes and Risk Variants for Alzheimer’s Disease, Providing New Insights into Its Disease Mechanisms

Polygenic Hazard Score Associated Multimodal Brain Networks along the Alzheimer's Disease Continuum.

Prediction of Alzheimer's disease using multi-variants from a Chinese genome-wide association study

Explainable machine learning aggregates polygenic risk scores and electronic health records for Alzheimer’s disease prediction

Identifying Genes Associated with Alzheimer's Disease Using Gene-Based Polygenic Risk Score

Deep Neural Network Classifier for Alzheimer’s Disease

DeepRisk: A deep learning approach for genome-wide assessment of common disease risk

Classifying Alzheimer's disease and normal subjects using machine learning techniques and genetic-environmental features

Epistatic Features and Machine Learning Improve Alzheimer's Disease Risk Prediction Over Polygenic Risk Scores

Prediction of Alzheimer’s disease-specific phospholipase c gamma-1 SNV by deep learning-based approach for high-throughput screening

A simulative deep learning model of SNP interactions on chromosome 19 for predicting Alzheimer's disease risk and rates of disease progression

Comparative analysis of machine learning algorithms for Alzheimer's disease classification using EEG signals and genetic information

Deep Learning-Based Prediction of Alzheimer's Disease Using Microarray Gene Expression Data

Intelligent Alzheimer's Diseases Gene Association Prediction Model Using Deep Regulatory Genomic Neural Networks

Explainable deep transfer learning model for disease risk prediction using high-dimensional genomic data