Multi-Omic Graph Diagnosis (MOGDx): A data integration tool to perform classification tasks for heterogeneous diseases

Barry Ryan, Riccardo E. Marioni, T. Ian Simpson
DOI: https://doi.org/10.1101/2023.07.09.23292410
2024-01-21
Abstract: Heterogeneity in human diseases presents challenges in diagnosis and treatment due to the broad range of manifestations and symptoms. With the rapid development of labelled multi-omic data, integrative machine learning methods have achieved breakthroughs in treatments by redefining these diseases at a more granular level. These approaches often have limitations in scalability, oversimplification, and handling of missing data. In this study, we introduce Multi-Omic Graph Diagnosis (MOGDx), a flexible command line tool for the integration of multi-omic data to perform classification tasks for heterogeneous diseases. MOGDx is a network integrative method that combines patient similarity networks with a reduced vector representation of genomic data. The reduced vector is derived from the shared latent embedding of a multi-modal encoder, and the combined network is fed into a graph convolutional network for classification. The multi-modal encoder and graph convolutional network are trained simultaneously, making a fully supervised pipeline. MOGDx was evaluated on three datasets from The Cancer Genome Atlas for breast invasive carcinoma, kidney cancer, and low-grade glioma. MOGDx demonstrated state-of-the-art performance and an ability to identify relevant multi-omic markers in each task. It did so while integrating more genomic measures with greater patient coverage compared to other network integrative methods. MOGDx is available to download from . Overall, MOGDx is a promising tool for integrating multi-omic data, classifying heterogeneous diseases, and interpreting genomic markers.
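A patient similarity network of the kind the abstract mentions can be sketched as follows. This is a minimal illustration, not the authors' implementation: cosine similarity, the k-nearest-neighbour rule, the edge-union step used to combine modalities, and every function name and toy value are assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_network(features, k):
    """Link each patient to its k most similar peers within ONE modality.

    features maps patient id -> feature vector. Patients without data
    for this modality are simply absent from the dict.
    """
    edges = set()
    for p, vec in features.items():
        scored = sorted(
            ((cosine(vec, other), q) for q, other in features.items() if q != p),
            reverse=True,
        )
        for _, q in scored[:k]:
            edges.add(tuple(sorted((p, q))))  # store undirected edges once
    return edges


def combine_networks(networks):
    """Union the edge sets (an assumed stand-in for network fusion)."""
    combined = set()
    for edges in networks:
        combined |= edges
    return combined


# Toy data: patient "P4" has no methylation profile.
expression = {"P1": [1.0, 0.0], "P2": [0.9, 0.1], "P3": [0.0, 1.0], "P4": [0.1, 0.9]}
methylation = {"P1": [0.2, 0.8], "P2": [0.8, 0.2], "P3": [0.1, 0.9]}

psn = combine_networks([knn_network(expression, k=1), knn_network(methylation, k=1)])
print(sorted(psn))  # [('P1', 'P2'), ('P1', 'P3'), ('P3', 'P4')]
```

In this toy setup, "P4" still appears in the combined network through its expression data alone, which illustrates how a network-based approach can retain patients who lack some modalities.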
Genetic and Genomic Medicine
### What problems does this paper attempt to solve?

This paper aims to address the diagnostic and treatment challenges posed by heterogeneity in human diseases. Specifically, it seeks a more accurate classification method for heterogeneous diseases by integrating multi-omic data. The key issues raised in the paper are:

1. **Complexity of disease heterogeneity**:
   - Many diseases present with a wide range of phenotypes and symptoms, which makes it difficult for traditional diagnostic and treatment methods to respond effectively.
   - Redefining these diseases in terms of finer subtypes or symptom grades can reveal new treatments, opportunities to repurpose existing drugs, or new intervention strategies.
2. **Limitations of existing methods**:
   - Existing multi-omic data integration methods are limited in scalability, tend to oversimplify, and handle missing data poorly.
   - These methods usually cannot fully exploit the information shared across omic data types, which limits classification performance.
3. **The need for multi-omic data integration**:
   - With the development of high-throughput sequencing technologies, many types of biological data (genomes, transcriptomes, proteomes, and others) are becoming increasingly accessible.
   - Integrating these omic data types can capture more biological features and thereby improve classification accuracy.

### MOGDx's solutions

To address these problems, the authors propose **Multi-Omic Graph Diagnosis (MOGDx)**, a flexible tool for multi-omic data integration and heterogeneous disease classification. Its main features are:

- **Network integration method**: MOGDx combines patient similarity networks (PSNs) with a reduced-dimensional vector representation of genomic data.
- **Graph Convolutional Network (GCN)**: A GCN performs the classification task on the integrated network.
- **Multi-Modal Encoder (MME)**: The MME performs supervised dimensionality reduction on each modality, mapping all modalities into a shared latent representation.
- **Fully supervised training**: The GCN and MME are trained simultaneously, forming a fully supervised pipeline.

### Experimental verification

MOGDx was evaluated on three datasets from The Cancer Genome Atlas (TCGA): breast invasive carcinoma (BRCA), low-grade glioma (LGG), and kidney cancer (KIPAN). The results show that MOGDx achieved state-of-the-art performance on these datasets and identified relevant multi-omic markers in each task.

### Summary

MOGDx provides a novel method for integrating multi-omic data, classifying heterogeneous diseases, and interpreting genomic markers. It achieves state-of-the-art classification performance while flexibly handling missing data and an arbitrary number of data modalities, which makes it a promising tool for personalized medicine.
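
As a rough illustration of the graph convolution step that MOGDx applies to the integrated network, the sketch below runs one forward pass of a standard GCN layer, `ReLU(Â H W)` with `Â = D^{-1/2}(A + I)D^{-1/2}`. It is not the authors' code: the dense list-based matrices, the helper names, the toy graph, and the embedding and weight values are all assumptions, and training (including joint training with the encoder) is omitted.

```python
import math


def normalized_adjacency(edges, nodes):
    """Build D^{-1/2} (A + I) D^{-1/2} as a dense list-of-lists matrix."""
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    # Start from the identity so every node has a self-loop.
    a = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for u, v in edges:
        a[index[u]][index[v]] = 1.0
        a[index[v]][index[u]] = 1.0
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(size)]
        for i in range(size)
    ]


def matmul(x, y):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_layer(a_hat, h, w):
    """One propagation step: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


nodes = ["P1", "P2", "P3"]                 # patients in the similarity network
edges = [("P1", "P2"), ("P2", "P3")]       # hypothetical similarity edges
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical latent embeddings per patient
w = [[1.0, -1.0], [0.5, 1.0]]              # hypothetical weights (learned in practice)

a_hat = normalized_adjacency(edges, nodes)
out = gcn_layer(a_hat, h, w)
for name, row in zip(nodes, out):
    print(name, [round(v, 3) for v in row])
```

Each output row mixes a patient's own embedding with those of its neighbours in the network, which is how the graph structure influences the features used for classification.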