MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data

Tianyu Liu, Yuge Wang, Rex Ying, Hongyu Zhao

2023-09-29

Abstract: Discovering genes with similar functions across diverse biomedical contexts poses a significant challenge in gene representation learning due to data heterogeneity. In this study, we resolve this problem by introducing a novel model called Multimodal Similarity Learning Graph Neural Network, which combines Multimodal Machine Learning and Deep Graph Neural Networks to learn gene representations from single-cell sequencing and spatial transcriptomic data. Leveraging 82 training datasets from 10 tissues, three sequencing techniques, and three species, we create informative graph structures for model training and gene representations generation, while incorporating regularization with weighted similarity learning and contrastive learning to learn cross-data gene-gene relationships. This novel design ensures that we can offer gene representations containing functional similarity across different contexts in a joint space. Comprehensive benchmarking analysis shows our model's capacity to effectively capture gene function similarity across multiple modalities, outperforming state-of-the-art methods in gene representation learning by up to 97.5%. Moreover, we employ bioinformatics tools in conjunction with gene representations to uncover pathway enrichment, regulation causal networks, and functions of disease-associated or dosage-sensitive genes. Therefore, our model efficiently produces unified gene representations for the analysis of gene functions, tissue functions, diseases, and species evolution.

Machine Learning, Genomics

What problem does this paper attempt to address?

The main problem this paper addresses is discovering genes with similar functions across different biomedical contexts. Data heterogeneity makes this a significant challenge in gene representation learning. The authors introduce a novel model, the Multimodal Similarity Learning Graph Neural Network (MuSe-GNN), which combines multimodal machine learning and deep graph neural networks to learn gene representations from single-cell sequencing and spatial transcriptomics data.

Key contributions of the paper include:

1. Proposing an efficient representation learning method for multi-structure biological data.
2. Integrating data from different omics and tissues into a unified space while preserving biological information.
3. Identifying co-located genes with similar functions.
4. Inferring specialized gene causal networks and relationships between genes and biological pathways or diseases.

Using 82 training datasets covering 10 tissues, 3 sequencing technologies, and 3 species, MuSe-GNN builds graph structures for model training and gene representation generation. It is regularized with weighted similarity learning and contrastive learning so that it learns gene-gene relationships across datasets.

Experimental results show that MuSe-GNN effectively captures gene functional similarity across modalities, with performance improvements of up to 97.5% over existing state-of-the-art methods. MuSe-GNN has also been used to reveal pathway enrichment, regulatory causal networks, and the functions of disease-associated or dosage-sensitive genes. The model can therefore efficiently generate unified gene representations for the analysis of gene functions, tissue functions, diseases, and species evolution.
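The summary above names contrastive learning as one of the regularizers but does not give its form. As a rough illustration of the general idea only, the following sketch computes a generic InfoNCE-style contrastive loss over gene embeddings from two datasets. It is not the loss used in the paper: the function names, the temperature value, the toy data, and the choice to treat the same gene in two datasets as a positive pair are all assumptions made for this example.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce(emb_a, emb_b, temperature=0.1):
    """Generic InfoNCE loss (hypothetical helper, not the paper's objective).

    Assumption: row i of emb_a and row i of emb_b are embeddings of the same
    gene learned from two different datasets (the positive pair); every other
    row of emb_b acts as a negative for row i of emb_a.
    """
    a = [normalize(v) for v in emb_a]
    b = [normalize(v) for v in emb_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Temperature-scaled cosine similarity of the anchor to every gene in b.
        logits = [sum(x * y for x, y in zip(anchor, other)) / temperature for other in b]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Negative log-probability of picking the matching gene.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy data: 5 genes with 8-dimensional embeddings (sizes chosen arbitrarily).
random.seed(0)
dataset_a = [[random.gauss(0, 1) for _ in range(8)] for _ in range(5)]
# Second dataset: the same genes with small noise, so matching rows are close.
dataset_b = [[x + random.gauss(0, 0.01) for x in row] for row in dataset_a]
# Misaligned copy: rows rotated by one, so no row matches its counterpart.
misaligned_b = dataset_b[1:] + dataset_b[:1]

loss_aligned = info_nce(dataset_a, dataset_b)
loss_misaligned = info_nce(dataset_a, misaligned_b)
print(loss_aligned, loss_misaligned)
```

In this toy setup the loss is lower when embeddings of the same gene agree across datasets than when the rows are misaligned, which is the behavior a contrastive regularizer is meant to reward.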


Molecular Graph Representation Learning via Structural Similarity Information

Multimodal Survival Ensemble Network: Integrating Genomic and Histopathological Insights for Enhanced Cancer Prognosis.

Molecular Representation Learning via Heterogeneous Motif Graph Neural Networks

Graph Neural Networks for Multimodal Single-Cell Data Integration

A Multimodal Graph Neural Network Framework of Cancer Molecular Subtype Classification

AutoGGN: A Gene Graph Network AutoML Tool for Multi-Omics Research

MGEGFP: a multi-view graph embedding method for gene function prediction based on adaptive estimation with GCN

A Heterogeneous Network Based Method for Identifying GBM-Related Genes by Integrating Multi-Dimensional Data.

Heterogeneous Graph-Based Multimodal Brain Network Learning

Disentangled similarity graph attention heterogeneous biological memory network for predicting disease-associated miRNAs

A Joint Graphical Model for Inferring Gene Networks Across Multiple Subpopulations and Data Types

Graph Representation Learning on Tissue-Specific Multi-Omics

Explainable Multilayer Graph Neural Network for Cancer Gene Prediction

Self-supervised graph representation learning integrates multiple molecular networks and decodes gene-disease relationships

SMG: self-supervised masked graph learning for cancer gene identification

A multichannel graph neural network based on multisimilarity modality hypergraph contrastive learning for predicting unknown types of cancer biomarkers

A multimodal graph neural network framework for cancer molecular subtype classification

Multi-view Multichannel Attention Graph Convolutional Network for miRNA-disease association prediction

CGMega: explainable graph neural network framework with attention mechanisms for cancer gene module dissection

A semi-supervised approach for the integration of multi-omics data based on transformer multi-head self-attention mechanism and graph convolutional networks