Graph attention-based fusion of pathology images and gene expression for prediction of cancer survival

Yi Zheng, Regan D. Conrad, Emily J. Green, Eric J. Burks, Margrit Betke, Jennifer E. Beane, Vijaya B. Kolachalama
DOI: https://doi.org/10.1101/2023.10.26.564236
2024-01-23
Abstract: Multimodal machine learning models are being developed to analyze pathology images and other modalities, such as gene expression, to gain clinical and biological insights. However, most frameworks for multimodal data fusion do not fully account for the interactions between different modalities. Here, we present an attention-based fusion architecture that integrates a graph representation of pathology images with gene expression data and concomitantly learns from the fused information to predict patient-specific survival. In our approach, pathology images are represented as undirected graphs, and their embeddings are combined with embeddings of gene expression signatures using an attention mechanism to stratify tumors by patient survival. We show that our framework improves the survival prediction of human non-small cell lung cancers, outperforming existing state-of-the-art approaches that leverage multimodal data. Our framework can facilitate spatial molecular profiling to identify tumor heterogeneity using pathology images and gene expression data, complementing results obtained from more expensive spatial transcriptomic and proteomic technologies.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses how to combine pathology images and gene expression data in a multimodal fusion framework, built on a graph attention mechanism, that predicts the survival of cancer patients more accurately. Specifically, the authors target a shortcoming of most existing multimodal fusion frameworks: they do not fully account for the interactions between modalities.

### Specific description of the problem

1. **Limitations of multimodal data fusion**:
   - Existing fusion methods for pathology images and gene expression data often fail to capture the complex interactions between the two.
   - Most methods do not fully exploit the spatial information in pathology images or the biological meaning of gene expression data.
2. **Clinical needs**:
   - In clinical practice, accurate prediction of a patient's survival time is crucial for planning personalized treatment.
   - Combining pathology images with gene expression data gives a more comprehensive picture of the patient, which can improve the accuracy of survival prediction.
3. **Technical challenges**:
   - Pathology images (such as hematoxylin-eosin-stained whole-slide images, WSIs) and gene expression data are different modalities, and fusing them effectively is a technical problem in its own right.
   - New model architectures are needed that can handle these heterogeneous data and still learn meaningful feature representations.

### Solutions proposed in the paper

The authors propose a multimodal fusion framework based on the graph attention mechanism. The main contributions include:

1. **Graph representation and embedding**:
   - Represent the pathology image as an undirected graph, where nodes are local image patches and edges connect adjacent patches.
   - Use a convolutional neural network (CNN) to extract a feature vector for each image patch and use it as the node feature.
2. **Attention mechanism**:
   - Introduce a graph attention (GAT) layer so that each node attends to its neighboring nodes, which helps the model learn both local and global features.
   - Use an attention mechanism to fuse the pathology image embedding with the gene expression embedding into a joint representation.
3. **Survival prediction model**:
   - Develop two survival prediction models: an imaging survival model (ISM) that uses only pathology images, and a fusion survival model (FSM) that combines pathology images and gene expression data.
   - The FSM predicts the patient's survival risk by passing the fused representation through the graph attention module and a global attention pooling layer.
4. **Interpretability tools**:
   - Introduce a survival activation map (SAM) that visualizes the regions of the pathology image most strongly associated with the survival prediction, which helps explain the model's decisions.

### Experimental results

- Experiments show that the framework outperforms existing state-of-the-art multimodal approaches in predicting the survival of patients with human non-small cell lung cancer (NSCLC).
- Besides improving prediction accuracy, the model produces interpretable survival activation maps, which can support further study of spatial molecular features in the tumor microenvironment.

With these methods, the authors address a key limitation of multimodal data fusion and provide a new tool for cancer survival prediction.
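
To make the pipeline described above more concrete, the following is a minimal NumPy sketch of its three ideas: building an undirected patch graph, applying one graph attention layer, and fusing the result with a gene expression embedding. It is an illustration under stated assumptions, not the authors' implementation. All function names, dimensions, and weights are hypothetical; the 8-neighbor adjacency rule, the ReLU activation, and the gene-conditioned pooling step are choices made here for brevity, and the random arrays stand in for CNN patch features and a learned gene expression embedding.

```python
import numpy as np

def build_patch_graph(coords):
    """Undirected adjacency over patches (assumed rule: 8-neighborhood on the patch grid)."""
    coords = np.asarray(coords)
    diff = np.abs(coords[:, None, :] - coords[None, :, :])  # pairwise offsets, (N, N, 2)
    return diff.max(axis=-1) <= 1  # symmetric boolean matrix, self-loops included

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(h, adj, W, a_src, a_dst):
    """Single-head GAT-style layer: each node attends only to its graph neighbors."""
    z = h @ W                                        # projected node features, (N, F_out)
    e = (z @ a_src)[:, None] + (z @ a_dst)[None, :]  # raw pairwise attention logits
    e = np.where(e > 0, e, 0.2 * e)                  # LeakyReLU with slope 0.2
    e = np.where(adj, e, -np.inf)                    # mask out non-neighbors
    alpha = softmax(e, axis=1)                       # each row sums to 1 over neighbors
    return np.maximum(alpha @ z, 0.0), alpha         # ReLU used here for simplicity

def fuse_and_score(h, gene, W_q, W_k, w_out):
    """Assumed fusion: gene-conditioned attention pooling over nodes, then a linear risk head."""
    q = gene @ W_q                                   # query from the gene embedding, (d,)
    k = h @ W_k                                      # keys from node embeddings, (N, d)
    node_weights = softmax(k @ q / np.sqrt(q.shape[0]))
    slide_vec = node_weights @ h                     # weighted sum of node embeddings
    fused = np.concatenate([slide_vec, gene])        # joint image + gene representation
    return float(fused @ w_out), node_weights

rng = np.random.default_rng(0)
coords = [(0, 0), (0, 1), (1, 1), (3, 3)]  # grid positions of four tissue patches
feats = rng.normal(size=(4, 8))            # stand-in for CNN patch features
gene = rng.normal(size=5)                  # stand-in for a gene expression embedding

adj = build_patch_graph(coords)
h, alpha = gat_layer(feats, adj, rng.normal(size=(8, 6)),
                     rng.normal(size=6), rng.normal(size=6))
risk, node_weights = fuse_and_score(h, gene, rng.normal(size=(5, 4)),
                                    rng.normal(size=(6, 4)), rng.normal(size=11))
print("risk score:", risk)
print("per-patch pooling weights:", node_weights)
```

The per-patch pooling weights are one simple way to see which regions drive the score. They are only loosely analogous to the survival activation map mentioned above, whose exact construction is not given in this summary. In a trained model the weights would be learned, typically with a survival objective, rather than sampled at random.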