Pathology-genomic fusion via biologically informed cross-modality graph learning for survival analysis

Zeyu Zhang, Yuanshen Zhao, Jingxian Duan, Yaou Liu, Hairong Zheng, Dong Liang, Zhenyu Zhang, Zhi-Cheng Li
2024-04-11
Abstract: The diagnosis and prognosis of cancer are typically based on multi-modal clinical data, including histology images and genomic data, because of the disease's complex pathogenesis and high heterogeneity. Despite advances in digital pathology and high-throughput genome sequencing, establishing effective multi-modal fusion models for survival prediction and revealing the potential association between histopathology and transcriptomics remain challenging. In this paper, we propose the Pathology-Genome Heterogeneous Graph (PGHG), which integrates whole slide images (WSI) and bulk RNA-Seq expression data with a heterogeneous graph neural network for cancer survival analysis. PGHG consists of a biological knowledge-guided representation learning network and a pathology-genome heterogeneous graph. The representation learning network uses biological prior knowledge of intra-modal and inter-modal data associations to guide feature extraction. The node features of each modality are updated through an attention-based graph learning strategy. Unimodal features and bi-modal fused features are extracted via an attention pooling module and then used for survival prediction. We evaluate the model on low-grade glioma, glioblastoma, and kidney renal papillary cell carcinoma datasets from The Cancer Genome Atlas (TCGA) and the First Affiliated Hospital of Zhengzhou University (FAHZU). Extensive experimental results demonstrate that the proposed method outperforms both unimodal and other multi-modal fusion models. To demonstrate model interpretability, we also visualize attention heatmaps of pathological images and use the integrated gradients algorithm to identify important tissue structures, biological pathways, and key genes.
Quantitative Methods, Machine Learning
What problem does this paper attempt to address?
The paper primarily aims to address the effective integration of multimodal clinical data (including histopathological images and genomic data) in cancer diagnosis and prognosis to improve the accuracy of survival prediction and uncover potential associations between histopathology and transcriptomics. Specifically, the researchers proposed a method called "Pathology-Genome Heterogeneous Graph" (PGHG), which integrates Whole Slide Images (WSI) and bulk RNA sequencing expression data for cancer survival analysis. PGHG consists of two key components:

1. **Representation Learning Network Guided by Biological Knowledge**: This component uses biological prior knowledge to guide the extraction of features from histopathological images and genomic data. This includes using pathway enrichment analysis to obtain biological pathways as nodes in the genomic subgraph and segmenting whole slide images into non-overlapping patches as nodes in the pathology subgraph. Additionally, RNA sequence reconstruction and Gene Set Variation Analysis (GSVA) supervised pathology feature extraction are employed to ensure consistency and complementarity of cross-modal information.
2. **Pathology-Genome Heterogeneous Graph**: This component constructs a heterogeneous graph structure containing pathology and genomic subgraphs, where the node features of each modality are updated through an attention mechanism-based graph learning strategy. In this way, the model can extract unimodal features and bimodal fusion features for survival prediction tasks.

Experimental results show that the proposed method performs well on datasets such as low-grade glioma, glioblastoma, and papillary renal cell carcinoma. It can visualize attention heatmaps of pathological images, identifying important tissue structures, biological pathways, and key genes, thereby enhancing the model's interpretability.
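
The attention-based pooling that turns per-node features into unimodal and fused vectors can be illustrated with a small sketch. This is not the authors' implementation: the scoring rule (a dot product with a single vector `score_weights`), the toy node features, and the use of concatenation as the fusion step are all assumptions chosen only to show the general idea of weighting nodes by a softmax over attention scores.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(node_features, score_weights):
    """Pool node feature vectors into one vector using attention weights.

    node_features: list of equal-length float lists, one per graph node
                   (for example, one per image patch or per pathway).
    score_weights: hypothetical scoring vector standing in for learned
                   parameters; a node's score is its dot product with it.
    Returns (pooled_vector, attention_weights).
    """
    scores = [
        sum(f * w for f, w in zip(feat, score_weights))
        for feat in node_features
    ]
    weights = softmax(scores)
    dim = len(node_features[0])
    pooled = [
        sum(a * feat[d] for a, feat in zip(weights, node_features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy inputs (made up for illustration): three patch nodes, two pathway nodes.
patch_nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pathway_nodes = [[0.5, 0.5], [2.0, 0.0]]
score_weights = [1.0, 1.0]

pathology_vec, pathology_attn = attention_pool(patch_nodes, score_weights)
genomic_vec, genomic_attn = attention_pool(pathway_nodes, score_weights)

# Assumption: concatenation as a simple stand-in for bimodal fusion.
fused_vec = pathology_vec + genomic_vec
```

In this sketch the attention weights of each modality sum to one, so they can be read as the relative contribution of each patch or pathway, which is the same quantity a heatmap over patches would display.
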
In summary, this study aims to develop a novel multimodal fusion strategy that combines the strengths of digital pathology and genomics data to improve the accuracy and interpretability of cancer survival analysis.
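
To make "accuracy" of survival prediction concrete, the sketch below computes Harrell's concordance index, a metric commonly used in survival analysis. The summary above does not state which metric the paper reports, so the choice of this metric is an assumption, and the follow-up times, event flags, and risk scores are invented toy values.

```python
def concordance_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times:  observed follow-up time for each patient.
    events: 1 if the event (death) was observed, 0 if censored.
    risks:  predicted risk scores; higher means shorter expected survival.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. Tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy cohort (made up): four patients, two of them censored.
times = [5, 10, 12, 20]
events = [1, 0, 1, 0]

perfect = concordance_index(times, events, [0.9, 0.4, 0.6, 0.1])    # 1.0
reversed_ = concordance_index(times, events, [0.1, 0.6, 0.4, 0.9])  # 0.0
constant = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])   # 0.5
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to uninformative predictions, and 0.0 means every pair is ranked in the wrong order.
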