Abstract: Multimodal machine learning models are being developed to analyze pathology images and other modalities, such as gene expression, to gain clinical and biological insights. However, most frameworks for multimodal data fusion do not fully account for the interactions between different modalities. Here, we present an attention-based fusion architecture that integrates a graph representation of pathology images with gene expression data and concomitantly learns from the fused information to predict patient-specific survival. In our approach, pathology images are represented as undirected graphs, and their embeddings are combined with embeddings of gene expression signatures using an attention mechanism to stratify tumors by patient survival. We show that our framework improves the survival prediction of human non-small cell lung cancers, outperforming existing state-of-the-art approaches that leverage multimodal data. Our framework can facilitate spatial molecular profiling to identify tumor heterogeneity using pathology images and gene expression data, complementing results obtained from more expensive spatial transcriptomic and proteomic technologies.

What problem does this paper attempt to address?

This paper aims to predict the survival of cancer patients more accurately by combining pathology images and gene expression data in a multimodal fusion framework built on a graph attention mechanism. Specifically, the authors address the fact that most existing multimodal data fusion frameworks do not fully account for the interactions between modalities.

### Specific description of the problem

1. **Limitations of multimodal data fusion**:
   - Existing fusion methods for pathology images and gene expression data often fail to capture the complex interactions between the two modalities.
   - Most methods do not fully use the spatial information in pathology images or the biological significance of gene expression data.
2. **Clinical needs**:
   - In clinical practice, accurate prediction of patient survival time is crucial for formulating personalized treatment plans.
   - Combining pathology images and gene expression data gives a more comprehensive picture of the patient, which can improve the accuracy of survival prediction.
3. **Technical challenges**:
   - Pathology images (such as hematoxylin and eosin-stained whole-slide images, WSIs) and gene expression data are different modalities, and fusing them effectively is a technical problem.
   - New model architectures are needed to handle these heterogeneous data and to ensure that the model learns meaningful feature representations.

### Solutions proposed in the paper

The authors propose a multimodal fusion framework based on the graph attention mechanism. The main contributions include:

1. **Graph representation and embedding**:
   - Represent the pathology image as an undirected graph, where nodes represent local image patches and edges represent adjacency between patches.
   - Use a convolutional neural network (CNN) to extract a feature vector for each image patch and use these vectors as the node features of the graph.
2. **Attention mechanism**:
   - Introduce a graph attention (GAT) layer so that each node can attend to its neighboring nodes, which helps the model learn both local and global features.
   - Use an attention mechanism to fuse the embedding of the pathology image with the embedding of the gene expression data into a joint representation.
3. **Survival prediction model**:
   - Develop two survival prediction models: an imaging survival model (ISM) that uses only pathology images, and a fusion survival model (FSM) that combines pathology images and gene expression data.
   - The FSM predicts the patient's survival risk through the graph attention module followed by a global attention pooling layer.
4. **Interpretability tools**:
   - Introduce a survival activation map (SAM) that visualizes the regions of the pathology image most strongly associated with the survival prediction, which helps explain the model's decision-making process.

### Experimental results

- Experiments show that the framework outperforms existing state-of-the-art multimodal approaches in predicting the survival of human non-small cell lung cancer (NSCLC) patients.
- Besides improving prediction accuracy, the model provides interpretable survival activation maps, which support further study of spatial molecular features in the tumor microenvironment.

With these methods, the authors address a key gap in multimodal data fusion and provide a new tool for cancer survival prediction.
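The pipeline summarized above (patch graph, attention over neighbors, attention-based fusion with gene expression, pooling to a risk score) can be illustrated with a small sketch. This is not the authors' implementation: every function name here is hypothetical, the node features are random placeholders standing in for CNN patch embeddings, the attention scores are unlearned scaled dot products rather than a trained GAT layer, and the choice of using the gene embedding as the pooling query is an assumption made only to keep the example short.

```python
import math
import random


def build_patch_graph(coords):
    """Undirected graph: each patch is a node; 8-neighbour grid adjacency gives edges."""
    index = {rc: i for i, rc in enumerate(coords)}
    neighbors = {i: set() for i in range(len(coords))}
    for (r, c), i in index.items():
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                j = index.get((r + dr, c + dc))
                if j is not None and j != i:
                    neighbors[i].add(j)
                    neighbors[j].add(i)  # keep the graph symmetric
    return neighbors


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_layer(features, neighbors):
    """Each node becomes a softmax-weighted average of itself and its neighbours.

    Assumption: scores are scaled dot products; a trained GAT learns them instead.
    """
    scale = math.sqrt(len(features[0]))
    out = []
    for i, h in enumerate(features):
        idx = [i] + sorted(neighbors[i])
        w = softmax([dot(h, features[j]) / scale for j in idx])
        out.append([sum(wj * features[j][k] for wj, j in zip(w, idx))
                    for k in range(len(h))])
    return out


def fuse_and_pool(node_embeddings, gene_embedding):
    """Attention pooling with the gene embedding as the query (an assumed design).

    Returns the pooled image vector concatenated with the gene embedding,
    plus the per-node attention weights.
    """
    scale = math.sqrt(len(gene_embedding))
    w = softmax([dot(gene_embedding, h) / scale for h in node_embeddings])
    pooled = [sum(wi * h[k] for wi, h in zip(w, node_embeddings))
              for k in range(len(gene_embedding))]
    return pooled + list(gene_embedding), w


def risk_score(fused, coefficients):
    """Linear risk head; in practice the coefficients would be learned."""
    return dot(fused, coefficients)


random.seed(0)
coords = [(r, c) for r in range(3) for c in range(3)]   # a 3x3 grid of patches
graph = build_patch_graph(coords)
features = [[random.gauss(0, 1) for _ in range(4)] for _ in coords]  # placeholder CNN features
gene = [random.gauss(0, 1) for _ in range(4)]            # placeholder gene embedding
hidden = attention_layer(features, graph)
fused, weights = fuse_and_pool(hidden, gene)
risk = risk_score(fused, [0.1] * len(fused))
```

In this toy setup the centre patch of the 3x3 grid has eight neighbors and a corner patch has three, and `weights` sums to one across patches. In a trained model, the attention scores, the fusion step, and the risk head would all be learned from survival data rather than fixed as they are here.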

Graph attention-based fusion of pathology images and gene expression for prediction of cancer survival

Pathology-genomic fusion via biologically informed cross-modality graph learning for survival analysis

Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis

Multimodal Survival Ensemble Network: Integrating Genomic and Histopathological Insights for Enhanced Cancer Prognosis.

Cross-modality Attention-based Multimodal Fusion for Non-small Cell Lung Cancer (NSCLC) Patient Survival Prediction

Quantifying the advantage of multimodal data fusion for survival prediction in cancer patients

A Multimodal Affinity Fusion Network for Predicting the Survival of Breast Cancer Patients

A Multi-modal Fusion Framework Based on Multi-task Correlation Learning for Cancer Prognosis Prediction

Predicting the Survival of Cancer Patients With Multimodal Graph Neural Network

Multi-modal Fusion Network with Intra- and Inter-Modality Attention for Prognosis Prediction in Breast Cancer

MIF: Multi-Shot Interactive Fusion Model for Cancer Survival Prediction Using Pathological Image and Genomic Data

Hierarchical multimodal fusion framework based on noisy label learning and attention mechanism for cancer classification with pathology and genomic features

SG-Fusion: A swin-transformer and graph convolution-based multi-modal deep neural network for glioma prognosis

MBFusion: Multi-modal balanced fusion and multi-task learning for cancer diagnosis and prognosis

Transformer-Based Multimodal Fusion for Survival Prediction by Integrating Whole Slide Images, Clinical, and Genomic Data

Abstract 2313: Multi-modal deep learning to predict cancer outcomes by integrating radiology and pathology images

An Innovative and Efficient Diagnostic Prediction Flow for Head and Neck Cancer: A Deep Learning Approach for Multi-Modal Survival Analysis Prediction Based on Text and Multi-Center PET/CT Images

Survival Prediction for Non-Small Cell Lung Cancer Based on Multimodal Fusion and Deep Learning

BioFusionNet: Deep Learning-Based Survival Risk Stratification in ER+ Breast Cancer Through Multifeature and Multimodal Data Fusion

Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes