CLCLSA: Cross-omics Linked embedding with Contrastive Learning and Self Attention for multi-omics integration with incomplete multi-omics data

Chen Zhao, Anqi Liu, Xiao Zhang, Xuewei Cao, Zhengming Ding, Qiuying Sha, Hui Shen, Hong-Wen Deng, Weihua Zhou

2023-04-12

Abstract: Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding genetic data. Each omics technique only provides a limited view of the underlying biological process and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Utilizing complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn the feature representation across different types of biological data. The multi-omics contrastive learning, which is used to maximize the mutual information between different types of omics, is employed before latent feature concatenation. In addition, the feature-level self-attention and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicated that the proposed CLCLSA outperformed the state-of-the-art approaches for multi-omics data classification using incomplete multi-omics data.

Machine Learning, Artificial Intelligence, Genomics

What problem does this paper attempt to address?

This paper addresses the problem of incomplete data in multi-omics data integration. Because of technical limitations or cost, some samples may lack one or more types of omics data (such as genomics, transcriptomics, or proteomics), which makes integration difficult. Traditional multi-omics integration methods often rely on simple concatenation of features or raw data. These methods handle incomplete multi-omics data poorly, which limits their use in disease diagnosis and phenotypic research. To overcome this challenge, the paper proposes a new deep learning method: Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). The main contributions of CLCLSA are as follows:

1. **Utilize complete multi-omics data as supervision** to learn feature representations across different types of biological data through cross-omics autoencoders. Through cross-omics embedding, CLCLSA can reconstruct incomplete multi-omics data and calculate modality-specific representations.
2. **Use multi-omics contrastive learning** to maximize the mutual information between different omics layers and enhance the consistency between different omics data.
3. **Introduce feature-level self-attention and omics-level self-attention** to dynamically select the most informative features for multi-omics data integration.

In this way, CLCLSA can not only recover the missing omics data but also improve the overall performance of the model. The paper verifies the effectiveness of CLCLSA on incomplete multi-omics data through extensive experiments on four publicly available multi-omics datasets. The experimental results show that CLCLSA outperforms existing state-of-the-art methods in multi-omics data classification tasks, with both complete and incomplete data.
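
The contrastive learning step described above can be illustrated with a small sketch. This summary does not give the exact loss used in CLCLSA, so the example below assumes an InfoNCE-style objective, a common lower bound on mutual information, applied to latent embeddings of the same samples from two omics types. The function name `info_nce`, the temperature value, and the toy embeddings are all hypothetical and chosen only for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(z_a, z_b, temperature=0.5):
    """Mean InfoNCE loss between two sets of embeddings (assumed objective).

    z_a[i] and z_b[i] are embeddings of the same sample from two omics
    types (the positive pair); z_b[j] with j != i act as negatives.
    """
    z_a = [l2_normalize(v) for v in z_a]
    z_b = [l2_normalize(v) for v in z_b]
    losses = []
    for i, anchor in enumerate(z_a):
        # Cosine similarity of the anchor to every embedding in the other view.
        logits = [dot(anchor, other) / temperature for other in z_b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching sample as the correct "class".
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)

# Toy latent embeddings for three samples in two omics views (made-up values).
view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view_b_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
# Same embeddings, but paired with the wrong samples.
view_b_shuffled = [view_b_aligned[1], view_b_aligned[2], view_b_aligned[0]]

aligned_loss = info_nce(view_a, view_b_aligned)
shuffled_loss = info_nce(view_a, view_b_shuffled)
print(aligned_loss < shuffled_loss)
```

Minimizing a loss of this kind pulls the two omics embeddings of the same sample together and pushes embeddings of different samples apart, which is one way to encourage the cross-omics consistency the summary describes. In this toy setup the correctly paired views yield a lower loss than the mismatched ones.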

CLCLSA: Cross-omics linked embedding with contrastive learning and self attention for integration with incomplete multi-omics data

Self-Supervised Contrastive Learning on Attribute and Topology Graphs for Predicting Relationships Among lncRNAs, miRNAs and Diseases

Strategic Multi-Omics Data Integration via Multi-Level Feature Contrasting and Matching

Few-Shot MS and PAN Joint Classification with Improved Cross-Source Contrastive Learning

Deep multi-view contrastive learning for cancer subtype identification

A semi-supervised approach for the integration of multi-omics data based on transformer multi-head self-attention mechanism and graph convolutional networks

Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination

Deep cross-omics cycle attention model for joint analysis of single-cell multi-omics data

Multi-modal Semantic Understanding with Contrastive Cross-modal Feature Alignment

Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning

scMODAL: A general deep learning framework for comprehensive single-cell multi-omics data alignment with feature links

Linking Representations with Multimodal Contrastive Learning

Integration of multi-omics data using adaptive graph learning and attention mechanism for patient classification and biomarker identification

Self-omics: A Self-supervised Learning Framework for Multi-omics Cancer Data

Single-cell multi-omics integration for unpaired data by a siamese network with graph-based contrastive loss

Contrastively generative self-expression model for single-cell and spatial multimodal data

Multi-omics Single-Cell Data Integration and Regulatory Inference with Graph-Linked Embedding

Deep latent space fusion for adaptive representation of heterogeneous multi-omics data

Deep learning-based approaches for multi-omics data integration and analysis

Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis