Quantifying the advantage of multimodal data fusion for survival prediction in cancer patients

Nikolaos Nikolaou,Domingo Salazar,Harish RaviPrakash,Miguel Gonçalves,Rob Mulla,Nikolay Burlutskiy,Natasha Markuzon,Etai Jacob
DOI: https://doi.org/10.1101/2024.01.08.574756
2024-01-09
Abstract:The last decade has seen an unprecedented advance in technologies at the level of high-throughput molecular assays and image capturing and analysis, as well as clinical phenotyping and digitization of patient data. For decades, genotyping (identification of genomic alterations), the casual anchor in biological processes, has been an essential component in interrogating disease progression and a guiding step in clinical decision making. Indeed, survival rates in patients tested with next-generation sequencing have been found to be significantly higher in those who received a genome-guided therapy than in those who did not. Nevertheless, DNA is only a small part of the complex pathophysiology of cancer development and progression. To assess a more complete picture, researchers have been using data taken from multiple modalities, such as transcripts, proteins, metabolites, and epigenetic factors, that are routinely captured for many patients. Multimodal machine learning offers the potential to leverage information across different bioinformatics modalities to improve predictions of patient outcome. Identifying a multiomics data fusion strategy that clearly demonstrates an improved performance over unimodal approaches is challenging, primarily due to increased dimensionality and other factors, such as small sample sizes and the sparsity and heterogeneity of data. Here we present a flexible pipeline for systematically exploring and comparing multiple multimodal fusion strategies. Using multiple independent data sets from The Cancer Genome Atlas, we developed a late fusion strategy that consistently outperformed unimodal models, clearly demonstrating the advantage of a multimodal fusion model.
Bioinformatics
What problem does this paper attempt to address?