Abstract: Advances in spatial transcriptomics (ST) technologies have provided unprecedented opportunities to depict transcriptomic and histological landscapes in their spatial context. Multi-modal ST data provide abundant and comprehensive information about cellular status, function, and organization. However, existing algorithms for processing and analyzing ST data struggle to effectively fuse the multi-modal information these data contain. Here, we propose a graph contrastive learning-based cross-modality fusion model named stGCL for accurately and robustly integrating gene expression, spatial information, and histological profiles simultaneously. stGCL adopts a novel histology-based Vision Transformer (H-ViT) method to effectively encode histological features and combines a multi-modal graph attention auto-encoder (GATE) with contrastive learning to fuse cross-modality features. In addition, stGCL introduces a pioneering spatial coordinate correcting and registering strategy for tissue slice integration, which can reduce batch effects and identify cross-sectional domains precisely. Compared with state-of-the-art methods on spatial transcriptomics data across platforms and resolutions, stGCL achieves superior clustering performance and is more robust in unraveling spatial patterns of biological significance. Additionally, stGCL successfully reconstructed three-dimensional (3D) brain tissue structures by integrating vertical and horizontal slices, respectively. Application of stGCL to human bronchiolar adenoma (BA) data reveals intratumor spatial heterogeneity and identifies candidate gene biomarkers. In summary, stGCL enables the fusion of various spatial modality data and is a powerful tool for analytical tasks such as spatial domain identification and multi-slice integration.

### Competing Interest Statement

The authors have declared no competing interest.

ST-Align: A Multimodal Foundation Model for Image-Gene Alignment in Spatial Transcriptomics

STalign: Alignment of spatial transcriptomics data using diffeomorphic metric mapping

STAIG: Spatial Transcriptomics Analysis via Image-Aided Graph Contrastive Learning for Domain Exploration and Alignment-Free Integration

RankByGene: Gene-Guided Histopathology Representation Learning Through Cross-Modal Ranking Consistency

Alignment and Integration of Spatial Transcriptomics Data

stGCL: A versatile cross-modality fusion method based on multi-modal graph contrastive learning for spatial transcriptomics

STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics

Efficient integration of multiple spatial transcriptomics data for 3D domain detection, matching, and alignment with stMSA

StereoMM: A Graph Fusion Model for Integrating Spatial Transcriptomic Data and Pathological Images

Multi-modal Spatial Clustering for Spatial Transcriptomics Utilizing High-resolution Histology Images

Statistical batch-aware embedded integration, dimension reduction and alignment for spatial transcriptomics

spatiAlign: an unsupervised contrastive learning model for data integration of spatially resolved transcriptomics

Enhancing Spatial Transcriptomics Analysis by Integrating Image-Aware Deep Learning Methods

TIST: Transcriptome and Histopathological Image Integrative Analysis for Spatial Transcriptomics

SPACEL: deep learning-based characterization of spatial transcriptome architectures

Integrating spatial and single-cell transcriptomics data using deep generative models with SpatialScope

Spatial Transcriptomics-Aided Localization for Single-Cell Transcriptomics with STALocator

SPACEL: Characterizing Spatial Transcriptome Architectures by Deep-Learning

ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics

stFormer: a foundation model for spatial transcriptomics

Multimodal contrastive learning for spatial gene expression prediction using histology images