Using Genomic Context Informed Genotype Data and Within‐model Ancestry Adjustment to Classify Type 2 Diabetes

Eric J Barnett, Yanli Zhang-James, Jonathan Hess, Stephen J Glatt, Stephen V Faraone
DOI: https://doi.org/10.1101/2024.09.12.24313579
2024-09-13
Abstract: Despite high heritability estimates, complex genetic disorders have proven difficult to predict with genetic data. Genomic research has documented polygenic inheritance, cross-disorder genetic correlations, and enrichment of risk by functional genomic annotation, but the vast potential of that combined knowledge has not yet been leveraged to build optimal risk models. Additional methods are likely required to progress genetic risk models of complex genetic disorders towards clinical utility. We developed a framework that uses annotations providing genomic context alongside genotype data as input to convolutional neural networks to predict disorder risk. We validated models in a matched-pairs type 2 diabetes dataset. A neural network using genotype data (AUC: 0.66) and a convolutional neural network using context-informed genotype data (AUC: 0.65) both significantly outperformed polygenic risk score approaches in classifying type-2 diabetes. Adversarial ancestry tasks eliminated the predictability of ancestry without changing model performance.
Genetic and Genomic Medicine
What problem does this paper attempt to address?
The paper addresses two problems:

1. **Improving the accuracy of genetic risk prediction for type 2 diabetes (T2D)**: Although T2D is highly heritable, existing genetic risk models predict it poorly. The paper proposes a framework that supplies annotations providing genomic context alongside genotype data as model input.
2. **Adjusting for ancestry within the model**: Systematic differences between ancestry groups can lead a genetic risk model to rely on ancestry rather than disease risk when classifying. The paper adds adversarial ancestry tasks, which eliminated the predictability of ancestry without changing model performance.

Specifically, the authors developed a framework based on convolutional neural networks (CNNs) that takes context-informed genotype data as input to predict T2D risk, and validated the models in a matched-pairs T2D dataset. A neural network using genotype data alone (AUC: 0.66) and a CNN using context-informed genotype data (AUC: 0.65) both significantly outperformed polygenic risk score approaches in classifying T2D.
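
To make the polygenic risk score baseline and the AUC metric concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the weights, genotypes, and function names below are invented toy values chosen for illustration, and the score is the textbook form (a weighted sum of risk-allele dosages). AUC is computed as the fraction of case/control pairs in which the case receives the higher score, with ties counted as half.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Probability that a random case outscores a random control (ties count 0.5)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical per-variant effect sizes (assumption: not taken from the paper).
WEIGHTS = [0.5, 0.25, -0.25]

# Hypothetical genotypes: one row per person, one dosage per variant.
CASES = [[2, 1, 0], [1, 0, 1], [0, 2, 2]]
CONTROLS = [[0, 1, 1], [1, 1, 2], [0, 0, 0]]

case_scores = [polygenic_risk_score(g, WEIGHTS) for g in CASES]
control_scores = [polygenic_risk_score(g, WEIGHTS) for g in CONTROLS]
toy_auc = auc(case_scores, control_scores)

print(f"toy AUC: {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls, which is the scale on which the reported values of 0.66 and 0.65 should be read.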