Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics

Xiaofei Wang, Xingxu Huang, Stephen J. Price, Chao Li
2024-05-27
Abstract: Recent advances in spatial transcriptomics (ST) make it possible to characterize spatial gene expression within tissue for discovery research. However, current ST platforms suffer from low resolution, hindering in-depth understanding of spatial gene expression. Super-resolution approaches promise to enhance ST maps by integrating histology images with the gene expression of profiled tissue spots. However, current super-resolution methods are limited by restoration uncertainty and mode collapse. Although diffusion models have shown promise in capturing complex interactions between multi-modal conditions, it remains a challenge to integrate histology images and gene expression for super-resolved ST maps. This paper proposes a cross-modal conditional diffusion model for super-resolving ST maps with the guidance of histology images. Specifically, we design a multi-modal disentangling network with cross-modal adaptive modulation to utilize complementary information from histology images and spatial gene expression. Moreover, we propose a dynamic cross-attention modelling strategy to extract hierarchical cell-to-tissue information from histology images. Lastly, we propose a co-expression-based gene-correlation graph network to model the co-expression relationship of multiple genes. Experiments show that our method outperforms other state-of-the-art methods in ST super-resolution on three public datasets.
Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning, Quantitative Methods
What problem does this paper attempt to address?
The paper addresses the low resolution of spatial transcriptomics (ST). Although existing ST technologies capture the spatial distribution of gene expression within tissue, their relatively low resolution limits in-depth study of that expression. To solve this problem, the authors propose a cross-modal conditional diffusion model, Diff-ST, that enhances the resolution of ST maps. The main objectives are:
1. **Integrating multi-modal information**: Histology images and gene expression data are combined so that their complementary information can enhance the resolution of ST maps.
2. **Addressing issues in existing methods**: Existing super-resolution methods suffer from restoration uncertainty and mode collapse, which this method aims to overcome.
3. **Proposing a new network architecture**: A multi-modal disentangling network with a cross-modal adaptive modulation strategy is designed, along with a co-expression intensity-based gene-correlation graph network that models the relationships between multiple genes.
4. **Experimental validation**: Experiments on three public datasets show that the method outperforms existing state-of-the-art methods on the ST super-resolution task.
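To make the idea of a co-expression-based gene-correlation graph more concrete, the following is a minimal sketch of one common way to build such a graph. It is not the construction used in Diff-ST, which this summary does not specify. The sketch assumes that co-expression intensity can be approximated by the absolute Pearson correlation between genes across spots, and that weak edges are dropped by a threshold. The function names `pearson` and `coexpression_graph`, the `threshold` parameter, and the toy data are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def coexpression_graph(expr, threshold=0.5):
    """Build a weighted gene-gene adjacency matrix from a spots-by-genes table.

    expr: list of spots, each a list of per-gene expression values.
    Assumption: edge weight = |Pearson r| across spots, kept only if >= threshold.
    """
    n_genes = len(expr[0])
    # Transpose so that each entry holds one gene's expression across all spots.
    per_gene = [[spot[g] for spot in expr] for g in range(n_genes)]
    adj = [[0.0] * n_genes for _ in range(n_genes)]
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            weight = abs(pearson(per_gene[i], per_gene[j]))
            if weight >= threshold:
                adj[i][j] = adj[j][i] = weight  # undirected edge, no self-loops
    return adj


# Toy data (made up): 4 spots x 3 genes. Genes 0 and 1 rise together; gene 2 does not.
toy_expr = [
    [1.0, 2.0, 5.0],
    [2.0, 4.1, 1.0],
    [3.0, 6.2, 4.0],
    [4.0, 7.9, 2.0],
]
toy_adj = coexpression_graph(toy_expr, threshold=0.8)
```

In this toy case only genes 0 and 1 end up connected. A graph network would then pass information along such edges so that the prediction for one gene can draw on its co-expressed neighbours.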