Multimodal contrastive learning for spatial gene expression prediction using histology images

Wenwen Min, Zhiceng Shi, Jun Zhang, Jun Wan, Changmiao Wang
2024-07-11
Abstract: Spatial transcriptomics (ST) technology has opened new opportunities for studying gene expression patterns within complex biological systems. However, its high cost remains a significant barrier to adoption in large-scale studies. A more cost-effective alternative is to use artificial intelligence to predict gene expression levels from readily accessible whole-slide images (WSIs) stained with Hematoxylin and Eosin (H&E). Existing methods, however, have yet to fully exploit the multimodal information provided by H&E images and ST data with spatial location. In this paper, we propose mclSTExp, a multimodal contrastive learning framework with Transformer and DenseNet-121 encoders for Spatial Transcriptomics Expression prediction. We conceptualize each spot as a "word" and integrate its intrinsic features with its spatial context through the self-attention mechanism of a Transformer encoder. This representation is further enriched with image features via contrastive learning, which enhances the predictive capability of the model. An extensive evaluation of mclSTExp on two breast cancer datasets and a skin squamous cell carcinoma dataset demonstrates its superior performance in predicting spatial gene expression. Moreover, mclSTExp shows promise in interpreting cancer-specific overexpressed genes, elucidating immune-related genes, and identifying specialized spatial domains annotated by pathologists. Our source code is available at https://github.com/shizhiceng/mclSTExp.
Image and Video Processing, Artificial Intelligence, Computer Vision and Pattern Recognition, Quantitative Methods
What problem does this paper attempt to address?
The paper addresses the problem of predicting spatial gene expression profiles from conventional H&E-stained tissue section images using deep learning. Specifically, it proposes a new multimodal contrastive learning method, mclSTExp, to address the failure of existing methods to fully use the multimodal information provided by H&E images and ST data with spatial locations. By integrating this information, mclSTExp aims to improve the accuracy of gene expression prediction and thus offer a cost-effective alternative to expensive spatial transcriptomics techniques. The model also shows potential for interpreting cancer-specific overexpressed genes, elucidating immune-related genes, and identifying specialized spatial domains annotated by pathologists.
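The contrastive step described above, which aligns image features with spot features, can be illustrated with a minimal sketch. It assumes a CLIP-style symmetric cross-entropy objective over a batch of paired embeddings. The abstract does not specify the exact loss, so the function names, the temperature value of 0.1, and the toy vectors below are all hypothetical and are not taken from the mclSTExp code. In the real model the two inputs would presumably come from the DenseNet-121 image encoder and the Transformer spot encoder; here they are hand-written lists.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def similarity_logits(image_emb, spot_emb, temperature):
    """Cosine similarity between every image/spot pair, divided by temperature."""
    images = [l2_normalize(v) for v in image_emb]
    spots = [l2_normalize(v) for v in spot_emb]
    return [
        [sum(a * b for a, b in zip(img, spt)) / temperature for spt in spots]
        for img in images
    ]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[k]
    return total / len(logits)


def symmetric_contrastive_loss(image_emb, spot_emb, temperature=0.1):
    """Average of the image-to-spot and spot-to-image cross-entropy terms.

    temperature=0.1 is an illustrative assumption, not a value from the paper.
    """
    logits = similarity_logits(image_emb, spot_emb, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


# Toy batch: row k of each list is assumed to describe the same tissue spot.
image_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
spot_embeddings = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]

# Breaking the pairing should increase the loss.
shuffled_spots = [spot_embeddings[1], spot_embeddings[2], spot_embeddings[0]]

loss_aligned = symmetric_contrastive_loss(image_embeddings, spot_embeddings)
loss_shuffled = symmetric_contrastive_loss(image_embeddings, shuffled_spots)
print(loss_aligned, loss_shuffled)
```

Under these assumptions, correctly paired embeddings yield a lower loss than mismatched ones, which is the signal that would pull an image patch and its matching spot together in a shared embedding space during training.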