Semi-supervised Cooperative Learning for Multiomics Data Fusion

Daisy Yi Ding, Xiaotao Shen, Michael Snyder, Robert Tibshirani

2023-08-03

Abstract: Multiomics data fusion integrates diverse data modalities, ranging from transcriptomics to proteomics, to gain a comprehensive understanding of biological systems and enhance predictions on outcomes of interest related to disease phenotypes and treatment responses. Cooperative learning, a recently proposed method, unifies the commonly used fusion approaches, including early and late fusion, and offers a systematic framework for leveraging the shared underlying relationships across omics to strengthen signals. However, the challenge of acquiring large-scale labeled data remains, and there are cases where multiomics data are available but annotated labels are absent. To harness the potential of unlabeled multiomics data, we introduce semi-supervised cooperative learning. By utilizing an "agreement penalty", our method incorporates the additional unlabeled data in the learning process and achieves consistently superior predictive performance on simulated data and a real multiomics study of aging. It offers an effective solution to multiomics data fusion in settings with both labeled and unlabeled data and maximizes the utility of available data resources, with the potential of significantly improving predictive models for diagnostics and therapeutics in an increasingly multiomics world.

Quantitative Methods, Genomics, Applications

What problem does this paper attempt to address?

The paper addresses multiomics data fusion and proposes a new method, semi-supervised cooperative learning, that uses unlabeled data to improve prediction performance when labeled data are limited. Specifically, it covers the following points:

1. **Challenges of Multiomics Data Fusion**: Advances in biotechnology have made several types of "omics" data available (genomics, transcriptomics, proteomics, and others). Each describes a biological system from a different perspective, and analyzing them jointly can improve predictions of disease phenotypes and treatment responses.
2. **Limitations of Existing Fusion Methods**: Common approaches include early fusion and late fusion. Neither fully exploits the relationships shared between data modalities, and neither offers a systematic framework for strengthening the signals those modalities have in common.
3. **Cooperative Learning Methods**: The recently proposed cooperative learning method adds an "agreement penalty" term that encourages the predictions from different data modalities to agree, which strengthens shared signals and improves prediction performance.
4. **Scarcity of Labeled Data**: In biomedical research, obtaining large-scale labeled data is difficult and time-consuming, so making effective use of samples that have multiomics measurements but no labels becomes important.
5. **Semi-Supervised Cooperative Learning Method**: The paper extends cooperative learning so that the "agreement penalty" is applied to unlabeled data as well. The method therefore enforces agreement between modality-specific predictions on both labeled and unlabeled samples, making use of all available data.
In summary, the main contribution of this paper is the proposal of a new semi-supervised learning framework that, under conditions of limited labeled data, improves the prediction performance of multiomics data fusion by fully utilizing unlabeled data. This has significant implications for biomedical research, particularly in the fields of diagnosis and treatment.
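
The role of the "agreement penalty" can be illustrated with a minimal sketch. The summary above does not state the exact objective, so the code below assumes two views `X` and `Z`, a linear model per view, and a small ridge term for numerical stability; it is not the authors' implementation. Under those assumptions the fit minimizes the squared error of the summed predictions on labeled samples plus `rho` times the squared disagreement between the two views' predictions on labeled and unlabeled samples. The names `fit_semisupervised_coop`, `agreement_gap`, `rho`, and `ridge` are hypothetical.

```python
import numpy as np


def fit_semisupervised_coop(X, Z, y, X_unlab, Z_unlab, rho=1.0, ridge=1e-3):
    """Illustrative linear fit with an agreement penalty (assumed form).

    Minimizes, over theta_x and theta_z:
        ||y - X theta_x - Z theta_z||^2
        + rho * ||X theta_x - Z theta_z||^2              (labeled rows)
        + rho * ||X_unlab theta_x - Z_unlab theta_z||^2  (unlabeled rows)
        + ridge * (||theta_x||^2 + ||theta_z||^2)
    by rewriting it as one ridge-regularized least-squares problem.
    """
    sr = np.sqrt(rho)
    # Stack the fit rows and the agreement rows into one design matrix.
    design = np.vstack([
        np.hstack([X, Z]),                          # prediction error, labeled
        np.hstack([-sr * X, sr * Z]),               # agreement, labeled
        np.hstack([-sr * X_unlab, sr * Z_unlab]),   # agreement, unlabeled
    ])
    # Agreement rows have target 0: the two views should predict the same value.
    target = np.concatenate([y, np.zeros(X.shape[0] + X_unlab.shape[0])])
    gram = design.T @ design + ridge * np.eye(design.shape[1])
    theta = np.linalg.solve(gram, design.T @ target)
    p_x = X.shape[1]
    return theta[:p_x], theta[p_x:]


def agreement_gap(X, Z, theta_x, theta_z):
    """Squared disagreement between the two views' predictions."""
    return float(np.sum((X @ theta_x - Z @ theta_z) ** 2))


if __name__ == "__main__":
    # Synthetic data (made up for illustration): both views are noisy
    # measurements of one shared latent factor that also drives the outcome.
    rng = np.random.default_rng(0)
    n_lab, n_unlab = 30, 200
    u = rng.normal(size=n_lab + n_unlab)
    X_all = np.outer(u, rng.normal(size=5)) + rng.normal(size=(n_lab + n_unlab, 5))
    Z_all = np.outer(u, rng.normal(size=6)) + rng.normal(size=(n_lab + n_unlab, 6))
    y = u[:n_lab] + 0.5 * rng.normal(size=n_lab)

    X, Z = X_all[:n_lab], Z_all[:n_lab]
    X_unlab, Z_unlab = X_all[n_lab:], Z_all[n_lab:]

    for rho in (0.0, 1.0, 10.0):
        tx, tz = fit_semisupervised_coop(X, Z, y, X_unlab, Z_unlab, rho=rho)
        gap = agreement_gap(X_unlab, Z_unlab, tx, tz)
        print(f"rho={rho:5.1f}  unlabeled agreement gap={gap:.3f}")
```

With `rho = 0` the agreement rows vanish and the sketch reduces to ridge regression on the concatenated views, which corresponds to early fusion. Larger values of `rho` trade fit on the labeled samples for agreement between views. In practice `rho` would be a tuning parameter, for example chosen by cross-validation on the labeled data.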
