Gene-SGAN: discovering disease subtypes with imaging and genetic signatures via multi-view weakly-supervised deep clustering

Zhijian Yang, Junhao Wen, Ahmed Abdulkadir, Yuhan Cui, Guray Erus, Elizabeth Mamourian, Randa Melhem, Dhivya Srinivasan, Sindhuja T. Govindarajan, Jiong Chen, Mohamad Habes, Colin L. Masters, Paul Maruff, Jurgen Fripp, Luigi Ferrucci, Marilyn S. Albert, Sterling C. Johnson, John C. Morris, Pamela LaMontagne, Daniel S. Marcus, Tammie L. S. Benzinger, David A. Wolk, Li Shen, Jingxuan Bao, Susan M. Resnick, Haochang Shou, Ilya M. Nasrallah, Christos Davatzikos
DOI: https://doi.org/10.1038/s41467-023-44271-2
IF: 16.6
2024-01-08
Nature Communications
Abstract: Disease heterogeneity has been a critical challenge for precision diagnosis and treatment, especially in neurologic and neuropsychiatric diseases. Many diseases can display multiple distinct brain phenotypes across individuals, potentially reflecting disease subtypes that can be captured using MRI and machine learning methods. However, biological interpretability and treatment relevance are limited if the derived subtypes are not associated with genetic drivers or susceptibility factors. Herein, we describe Gene-SGAN – a multi-view, weakly-supervised deep clustering method – which dissects disease heterogeneity by jointly considering phenotypic and genetic data, thereby conferring genetic correlations to the disease subtypes and associated endophenotypic signatures. We first validate the generalizability, interpretability, and robustness of Gene-SGAN in semi-synthetic experiments. We then demonstrate its application to real multi-site datasets from 28,858 individuals, deriving subtypes of Alzheimer’s disease and brain endophenotypes associated with hypertension, from MRI and single nucleotide polymorphism data. Derived brain phenotypes displayed significant differences in neuroanatomical patterns, genetic determinants, biological and clinical biomarkers, indicating potentially distinct underlying neuropathologic processes, genetic drivers, and susceptibility factors. Overall, Gene-SGAN is broadly applicable to disease subtyping and endophenotype discovery, and is herein tested on disease-related, genetically-associated neuroimaging phenotypes.
multidisciplinary sciences
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses the challenge that disease heterogeneity poses for precise diagnosis and treatment, especially in neurologic and neuropsychiatric diseases. Many diseases present distinct brain phenotypes across individuals, which may reflect different disease subtypes. However, if these subtypes are not associated with genetic drivers or susceptibility factors, their biological interpretability and treatment relevance are limited.

### Main Methods

To tackle this challenge, the authors developed Gene-SGAN, a multi-view, weakly-supervised deep clustering method. It dissects disease heterogeneity by jointly considering phenotypic and genetic data, so that the derived subtypes and their endophenotypic signatures carry genetic associations. The specific steps are as follows:

1. **Data Preparation**: Use brain imaging data from a healthy control group as the reference, together with brain imaging and single nucleotide polymorphism (SNP) data from the target population (e.g., patient cohorts).
2. **Generative Adversarial Network (GAN)**: Learn a one-to-many mapping from the reference group to the target group with a GAN, so that the model captures the effect of the disease on normal phenotypic features rather than variation driven by disease-unrelated factors.
3. **Variational Inference (VI)**: Further encourage genetic association through VI, which estimates the posterior distribution of the latent subtype variables given the genetic features.
4. **Low-Dimensional Latent Space**: Disentangle phenotypic and genetic heterogeneity in a low-dimensional latent space whose variables reflect disease subtypes.
5. **Subtype Identification**: Based on the latent variables, cluster patients into disease subtypes with relatively homogeneous and genetically associated brain phenotypes.

### Application Validation

1. **Semi-Synthetic Experiments**: Validate the generalizability, interpretability, and robustness of Gene-SGAN on semi-synthetic data.
2. **Real Data Application**: Apply Gene-SGAN to multi-site datasets from 28,858 individuals to derive subtypes of Alzheimer's disease and brain endophenotypes associated with hypertension.

### Main Results

1. **Alzheimer's Disease Subtypes**:
   - Identified four distinct subtypes (A1, A2, A3, A4) that differ significantly in neuroanatomical patterns, genetic determinants, and biological and clinical biomarkers.
   - For example, subtype A1 shows relatively preserved regional brain volume; subtype A2 shows focal medial temporal lobe atrophy; subtype A3 shows widespread whole-brain atrophy; subtype A4 shows predominant cortical atrophy with a relatively preserved medial temporal lobe.
2. **Hypertension-Related Brain Change Subtypes**:
   - Identified five distinct subtypes (H1, H2, H3, H4, H5) that differ significantly in neuroanatomical patterns.
   - For example, subtype H1 shows mild midbrain atrophy; subtype H2 shows severe subcortical gray matter and white matter atrophy; subtype H3 shows larger volumes of deep white matter structures.

### Conclusion

By jointly considering phenotypic and genetic data, Gene-SGAN can dissect disease heterogeneity and identify disease subtypes that are biologically interpretable and clinically relevant. This can support precision diagnosis, patient stratification for clinical trials, and a better understanding of the heterogeneous neuropathologic processes that lead to similar clinical symptoms.
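
The mapping and subtype-identification steps listed under Main Methods can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names `generate_target` and `assign_subtypes`, the additive `subtype_effects` matrix, and all numeric values are hypothetical, and a simple linear shift stands in for the trained GAN. It only shows the idea that one reference subject combined with different subtype variables yields different synthetic target phenotypes, and that a patient's subtype is read off as the largest entry of a latent probability vector.

```python
import numpy as np


def generate_target(x_ref, z_onehot, subtype_effects):
    """Toy stand-in for a learned mapping f(x, z).

    Adds a subtype-specific shift to reference features. In the real
    method this mapping would be a trained neural network (assumption:
    a linear shift is used here purely for illustration).
    """
    return x_ref + z_onehot @ subtype_effects


def assign_subtypes(z_probs):
    """Hard subtype label = index of the largest latent probability."""
    return np.argmax(z_probs, axis=1)


rng = np.random.default_rng(0)
n_features, n_subtypes = 4, 3

# Hypothetical subtype effects: each row is one pattern of change
# applied to the reference (healthy control) features.
subtype_effects = np.array([
    [-1.0, 0.0, 0.0, 0.0],     # focal change in feature 0
    [0.0, -1.0, -1.0, 0.0],    # change in features 1 and 2
    [-0.5, -0.5, -0.5, -0.5],  # diffuse change everywhere
])

# One reference subject, shape (1, n_features).
x_ref = rng.normal(size=(1, n_features))

# Pairing the same subject with each one-hot subtype variable gives
# one synthetic target per subtype, shape (n_subtypes, n_features).
synthetic = generate_target(x_ref, np.eye(n_subtypes), subtype_effects)

# Hypothetical latent probabilities for three patients.
z_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.4, 0.3],
])
labels = assign_subtypes(z_probs)
print(synthetic.shape, labels)
```

In this sketch the subtype variable is one-hot when generating data and a probability vector when assigning patients, which mirrors the role of the latent variables in steps 4 and 5 without reproducing how they are actually learned.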