STAIG: Spatial Transcriptomics Analysis via Image-Aided Graph Contrastive Learning for Domain Exploration and Alignment-Free Integration

Yitao Yang, Yang Cui, Xin Zeng, Yubo Zhang, Martin Loza, Sung-Joon Park, Kenta Nakai
DOI: https://doi.org/10.1101/2023.12.18.572279
2024-01-02
Abstract: Spatial transcriptomics is an essential application for investigating cellular structures and interactions and requires multimodal information to precisely study spatial domains. Here, we propose STAIG, a novel deep-learning model that integrates gene expression, spatial coordinates, and histological images using graph-contrastive learning coupled with high-performance feature extraction. STAIG can integrate tissue slices without prealignment and remove batch effects. Moreover, it was designed to accept data acquired from various platforms, with or without histological images. By performing extensive benchmarks, we demonstrated the capability of STAIG to recognize spatial regions with high precision and uncover new insights into tumor microenvironments, highlighting its promising potential in deciphering spatial biological intricacies.
Bioinformatics
What problem does this paper attempt to address?
The main problem this paper addresses is how to identify spatial domains more accurately, and integrate them across tissue sections, when analyzing spatial transcriptomics (ST) data, while overcoming the limitations of existing methods. Specifically, the STAIG model aims to:

1. **Integrate multi-modal information**: Combine gene expression, spatial coordinates, and histological images (such as H&E-stained images) to study spatial structure more comprehensively. Existing methods usually rely only on gene expression data or incorporate spatial information only partially, so they cannot fully reflect the actual spatial patterns.
2. **Multi-slice integration without pre-alignment**: STAIG can integrate data from multiple tissue sections, including sections from different platforms, without manual alignment, while reducing batch effects. Traditional multi-slice integration methods usually require pre-alignment of slice coordinates, which adds complexity and uncertainty in practice.
3. **Improve the quality of feature extraction**: Extract high-quality features from H&E-stained images with a self-supervised learning model (such as Bootstrap Your Own Latent, BYOL), avoiding feature distortion caused by inconsistent staining. This enables STAIG to better capture subtle differences in tissue structure.
4. **Improve the precision of spatial domain identification**: Using graph contrastive learning and a graph neural network (GNN), STAIG generates more informative embeddings and therefore identifies spatial domains more accurately. In the paper's benchmarks, STAIG achieves higher Adjusted Rand Index (ARI) and Silhouette Coefficient (SC) scores across a variety of ST datasets, especially in the analysis of brain regions and the tumor microenvironment.
5. **Reveal new biological insights**: Beyond improving the accuracy of spatial domain identification, STAIG can reveal regions that were previously difficult to identify, such as areas enriched with cancer-associated fibroblasts (CAFs) in the tumor microenvironment and the tumor-adjacent tissue junction in zebrafish melanoma. These findings support a deeper understanding of complex biological systems and disease mechanisms.

In summary, STAIG strengthens the analysis of spatial transcriptomics data by integrating multi-modal data with advanced deep-learning techniques, providing new tools and perspectives for biomedical research.
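The Adjusted Rand Index (ARI) mentioned above scores how well predicted spatial domains agree with reference annotations, corrected for chance agreement. The following is a minimal, standard-library sketch of the textbook ARI formula, not code from the STAIG authors; the function name `adjusted_rand_index` and the toy labels are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Textbook ARI between two labelings of the same spots (hypothetical helper)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same spots")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one spot: the labelings trivially agree

    # Contingency counts: spots sharing a (true domain, predicted cluster) pair.
    joint = Counter(zip(labels_true, labels_pred))
    true_sizes = Counter(labels_true)
    pred_sizes = Counter(labels_pred)

    # Number of spot pairs grouped together in both, in truth, and in prediction.
    together_both = sum(comb(c, 2) for c in joint.values())
    together_true = sum(comb(c, 2) for c in true_sizes.values())
    together_pred = sum(comb(c, 2) for c in pred_sizes.values())

    expected = together_true * together_pred / total_pairs
    maximum = 0.5 * (together_true + together_pred)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (together_both - expected) / (maximum - expected)


# Toy example (made-up labels): cluster IDs differ, but the partition is identical.
annotated = ["L1", "L1", "L2", "L2"]
predicted = [1, 1, 0, 0]
print(adjusted_rand_index(annotated, predicted))  # 1.0
```

Because ARI compares partitions rather than label names, a clustering whose numeric IDs are permuted relative to the annotation still scores 1.0, while a clustering no better than random scores close to 0.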