Multimodal predictions of end stage chronic kidney disease from asymptomatic individuals for discovery of genomic biomarkers

Simona Rabinovici-Cohen, Daniel E Platt, Toshiya Iwamori, Itai Guez, Sanjoy Dey, Aritra Bose, Michiharu Kudo, Laura Cosmai, Camillo Porta, Akira Koseki, Pablo Meyer
DOI: https://doi.org/10.1101/2024.10.15.24315251
2024-10-16
Abstract: Chronic kidney disease (CKD) is a complex condition in which the kidneys are damaged and progressively lose their ability to filter blood. About 10% of the world population has the disease, which often goes undetected until it is too late for intervention. Using the UK Biobank (UKBB), we constructed a CKD cohort of patients (n=46,986) with genomic, clinical and demographic data available, a subset (n=2,151) also having whole-body Magnetic Resonance Imaging (MRI) scans. We used this multimodal cohort to successfully predict, from initially healthy patients, their 5-year outcomes for End-Stage Renal Disease (ESRD, n=210, AUC=0.804 +/- 0.03 with 5-fold cross-validation), and used the larger cohort for validation, predicting time to ESRD and performing genome-wide association studies (GWAS). Extracting important clinical, phenotypic and genetic features from the models, we were able to stratify the cohorts based on a novel set of significant, previously unreported SNPs related to mitochondria/cell death, kidney development and function. In particular, we show that the risk allele of SNP rs1383063, present in 30% of the population irrespective of ancestry and putatively regulating MAGI-1, a gene expressed in the podocyte slit diaphragm, is a strong predictor of ESRD and stratifies male populations of older age.
Genetic and Genomic Medicine
What problem does this paper attempt to address?
The paper addresses the early prediction of progression from chronic kidney disease (CKD) to end-stage renal disease (ESRD). Using data from the UK Biobank, the researchers constructed a cohort of CKD patients (n = 46,986) with genomic, clinical, and demographic data; a subset of these patients (n = 2,151) also had whole-body magnetic resonance imaging (MRI) scans. Using multimodal data (demographic, clinical, genomic, and imaging), the researchers predicted 5-year ESRD outcomes for initially healthy individuals (ESRD cases n = 210; AUC = 0.804 ± 0.03 with 5-fold cross-validation). They also used the larger cohort to predict the time to ESRD and to conduct a genome-wide association study (GWAS).

### Main Objectives:
1. **Early Prediction of ESRD**: Predict whether early-stage CKD patients will progress to ESRD within 5 years using multimodal data, in particular by combining genomic, clinical, and imaging data.
2. **Discover New Biomarkers**: Extract important clinical, phenotypic, and genetic features to identify novel significant single-nucleotide polymorphisms (SNPs) related to mitochondria/cell death, kidney development, and function.
3. **Gene-Imaging Association**: Explore the relationship between genetic variations and imaging features, especially in terms of kidney microstructure and function.

### Research Background:
- Chronic kidney disease (CKD) is a complex disease that affects approximately 10% of the global population and is often detected only at an advanced stage.
- Early detection and intervention are crucial for slowing the progression of the disease, but effective early prediction methods are currently lacking.
- Integrating genomic, imaging, and clinical data can provide a more comprehensive view and help in discovering new biomarkers and building prediction models.

### Methods:
- **Data Sources**: Use data from the UK Biobank to construct a CKD cohort of 46,986 patients, of whom 2,151 have MRI data.
- **Multimodal Models**: Combine demographic, clinical, genomic, and imaging data to construct multiple prediction models, including logistic regression, random forest, and XGBoost.
- **Feature Extraction**: Extract SNP features from genomic data and radiomics features from MRI data, and analyze the images using convolutional neural networks (CNN) and vision transformers (ViT).
- **GWAS Analysis**: Conduct a genome-wide association study on the larger cohort to discover genetic variants related to ESRD.

### Results:
- **Prediction Performance**: The multimodal model achieves an AUC of 0.804 ± 0.03 in the 5-year ESRD prediction task.
- **Important Features**: Age, gender, kidney volume, and imaging heterogeneity are important features for predicting ESRD.
- **Newly Discovered SNPs**: New SNPs related to mitochondria/cell death, kidney development, and function were identified, most notably rs1383063. The risk allele of this SNP is present in 30% of the population and putatively regulates the MAGI-1 gene. MAGI-1 is expressed in the podocyte slit diaphragm and is related to proteinuria.

### Conclusions:
This study predicted progression from early-stage CKD to ESRD using multimodal data and discovered new genetic variants related to kidney function and structure. These findings can help identify high-risk patients early and suggest new directions for the prevention and treatment of chronic kidney disease.
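
The headline metric above (AUC = 0.804 ± 0.03 with 5-fold cross-validation) can be illustrated with a minimal sketch of how such a figure is typically computed. Everything here is an assumption for illustration: the data are synthetic, the helper names (`roc_auc`, `stratified_folds`, `fit_logistic`, `cross_validated_auc`) are hypothetical, the model is a plain logistic regression rather than the paper's multimodal models, and the spread is taken as the sample standard deviation across folds, which the source does not specify.

```python
import math
import random
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with a similar case/control ratio in each."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Logistic regression fitted by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def cross_validated_auc(X, y, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and summarize AUC as (mean, sample std)."""
    folds = stratified_folds(y, k, random.Random(seed))
    aucs = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        w, b = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        # The linear score ranks samples the same way as the predicted probability.
        scores = [sum(wj * xj for wj, xj in zip(w, X[i])) + b for i in test_idx]
        aucs.append(roc_auc([y[i] for i in test_idx], scores))
    return mean(aucs), stdev(aucs)


# Synthetic stand-in data: about 20% cases, one informative feature, one noise feature.
data_rng = random.Random(42)
y = [1 if data_rng.random() < 0.2 else 0 for _ in range(400)]
X = [[data_rng.gauss(1.0 if yi else 0.0, 1.0), data_rng.gauss(0.0, 1.0)] for yi in y]

mean_auc, sd_auc = cross_validated_auc(X, y, k=5)
print(f"AUC = {mean_auc:.3f} +/- {sd_auc:.3f}")
```

Stratifying the folds matters when cases are rare, as with 210 ESRD cases in a much larger cohort: without it, a fold could end up with too few cases for a stable AUC estimate.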