Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction

Guillaume Jaume, Anurag Vaidya, Richard Chen, Drew Williamson, Paul Liang, Faisal Mahmood
2024-04-15
Abstract: Integrating whole-slide images (WSIs) and bulk transcriptomics for predicting patient survival can improve our understanding of patient prognosis. However, this multimodal task is particularly challenging due to the different nature of these data: WSIs represent a very high-dimensional spatial description of a tumor, while bulk transcriptomics represent a global description of gene expression levels within that tumor. In this context, our work aims to address two key challenges: (1) how can we tokenize transcriptomics in a semantically meaningful and interpretable way?, and (2) how can we capture dense multimodal interactions between these two modalities? Specifically, we propose to learn biological pathway tokens from transcriptomics that can encode specific cellular functions. Together with histology patch tokens that encode the different morphological patterns in the WSI, we argue that they form appropriate reasoning units for downstream interpretability analyses. We propose fusing both modalities using a memory-efficient multimodal Transformer that can model interactions between pathway and histology patch tokens. Our proposed model, SURVPATH, achieves state-of-the-art performance when evaluated against both unimodal and multimodal baselines on five datasets from The Cancer Genome Atlas. Our interpretability framework identifies key multimodal prognostic factors, and, as such, can provide valuable insights into the interaction between genotype and phenotype, enabling a deeper understanding of the underlying biological mechanisms at play. We make our code public at: https://github.com/ajv012/SurvPath.
Computer Vision and Pattern Recognition, Artificial Intelligence, Genomics, Quantitative Methods, Tissues and Organs
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses two key challenges in the multimodal task of predicting patient survival from whole-slide images (WSIs) and bulk transcriptomics data:

1. **How to tokenize transcriptomics data in a semantically meaningful and interpretable way?**
   - Transcriptomics data are naturally represented as a feature vector, but concatenating that vector directly with other modalities restricts multimodal learning to late-fusion operations. The paper proposes a tokenization method based on biological pathways: genes are grouped according to known biological pathways to produce pathway tokens, each tied to a specific cellular function. This gives a finer-grained representation and makes the model easier to interpret.
2. **How to capture the dense multimodal interactions between these two modalities?**
   - Early-fusion methods can use a Transformer to capture pairwise similarities between all tokens, but the high dimensionality of WSIs and the complexity of transcriptomics data make such models expensive in computation and memory. The paper introduces a unified, memory-efficient attention mechanism that models the interactions between patch tokens and pathway tokens by using shared parameters for queries, keys, and values and by simplifying the attention layer to ignore patch-to-patch interactions.

### Model overview

The model proposed in the paper is called **SURVPATH**. Its main contributions are:

1. **Transcriptomics tokenizer**: Generates biological pathway tokens using existing cell biology knowledge.
2. **SURVPATH model**: A memory-efficient and resolution-independent multimodal Transformer that integrates transcriptomics and patch tokens to predict patient survival.
3. **Multi-level interpretability framework**: Lets users understand predictions from unimodal and cross-modal perspectives.
4. **Experimental verification**: Experiments and ablation studies on five datasets from The Cancer Genome Atlas (TCGA) demonstrate the predictive ability of SURVPATH and benchmark it against unimodal and multimodal fusion methods.

### Method overview

1. **Pathway tokenizer**:
   - **Composing pathways**: Select appropriate reasoning units, here biological pathways, each composed of a set of genes or sub-pathways involved in a specific biological process.
   - **Encoding pathways**: Given transcriptomics measurements \( g\in\mathbb{R}^{N_G} \) for \( N_G \) genes, construct pathway-level tokens \( X^{(P)}\in\mathbb{R}^{N_P\times d} \), where \( N_P \) is the number of pathways and \( d \) is the token dimension. Each pathway \( P_i \) is encoded by its own learned multi-layer perceptron (MLP) \( \phi_i \), that is, \( x^{(P)}_i=\phi_i(g_{P_i}) \), where \( g_{P_i} \) contains the expression values of the genes in pathway \( P_i \).
2. **Histological patch tokenizer**:
   - Given an input WSI, extract low-dimensional patch embeddings to define patch tokens. First, identify the tissue regions, then decompose them into non-overlapping patches. A pre-trained feature extractor \( f(\cdot) \) maps each patch \( h_i \) to a patch embedding \( x^{(H)}_i = f(h_i) \). Finally, a learnable linear transformation converts the patch embeddings into patch tokens \( X^{(H)}\in\mathbb{R}^{N_H\times d} \) that match the token dimension \( d \).
3. **Multimodal fusion**:
   - Design an early-fusion mechanism that captures the dense multimodal interactions between pathway tokens and patch tokens through Transformer attention. Specifically, concatenate the pathway and patch tokens into a sequence \( X\in\mathbb{R}^{(N_P + N_H)\times d} \) of \( (N_P + N_H) \) tokens, and extract queries, keys, and values through three linear projections.
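
The tokenization and fusion steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the gene and pathway names (`g1`, `pathway_a`, ...) are hypothetical, each encoder \( \phi_i \) is reduced to a single linear layer with a tanh, random weights stand in for learned parameters, and the attention layer applies a row-wise softmax over whichever keys a query is allowed to see. Pathway queries attend to every token, while patch queries attend only to pathway tokens, which is one plausible reading of "ignoring the interactions between patch tokens".

```python
import math
import random


def rand_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def linear(x, w):
    """Project a row vector x (length n) with a weight matrix w (n x m)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def tokenize_pathways(expression, pathways, encoders):
    """Build one token per pathway: x_i = phi_i(g_{P_i}).

    Assumption: phi_i is a single linear layer followed by tanh.
    """
    tokens = []
    for name, genes in pathways.items():
        g = [expression[gene] for gene in genes]  # expression of this pathway's genes
        tokens.append([math.tanh(v) for v in linear(g, encoders[name])])
    return tokens


def attend(queries, keys, values):
    """Scaled dot-product attention of each query over the given keys."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def pathway_patch_attention(pathway_tokens, patch_tokens, w_q, w_k, w_v):
    """Attention over pathway and patch tokens without patch-to-patch scores."""
    # The same three projections are shared by both modalities (assumption).
    q_p, k_p, v_p = ([linear(x, w) for x in pathway_tokens] for w in (w_q, w_k, w_v))
    q_h, k_h, v_h = ([linear(x, w) for x in patch_tokens] for w in (w_q, w_k, w_v))
    out_p = attend(q_p, k_p + k_h, v_p + v_h)  # pathway queries see all tokens
    out_h = attend(q_h, k_p, v_p)              # patch queries see pathway tokens only
    return out_p, out_h


rng = random.Random(0)
d = 4  # token dimension
expression = {"g1": 1.2, "g2": -0.3, "g3": 0.8, "g4": 0.1, "g5": -1.0}  # toy values
pathways = {"pathway_a": ["g1", "g2"], "pathway_b": ["g3", "g4", "g5"]}
encoders = {name: rand_matrix(len(genes), d, rng) for name, genes in pathways.items()}
w_q, w_k, w_v = (rand_matrix(d, d, rng) for _ in range(3))

pathway_tokens = tokenize_pathways(expression, pathways, encoders)
patch_tokens = rand_matrix(6, d, rng)  # stand-in for projected patch embeddings
out_p, out_h = pathway_patch_attention(pathway_tokens, patch_tokens, w_q, w_k, w_v)
print(len(out_p), len(out_h), len(out_p[0]))  # prints: 2 6 4
```

In this sketch, full attention over \( N_P + N_H \) tokens would compute \( (N_P + N_H)^2 \) scores, whereas dropping the patch-to-patch block leaves \( N_P(N_P + N_H) + N_H N_P \). For the toy sizes above (\( N_P = 2 \), \( N_H = 6 \)) that is 28 scores instead of 64, and the saving grows with the number of patches because the remaining cost is linear in \( N_H \).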