Graspot: A graph attention network for spatial transcriptomics data integration with optimal transport

Zizhan Gao, Kai Cao, Lin Wan
DOI: https://doi.org/10.1101/2024.02.01.578505
2024-02-05
Abstract: Spatial transcriptomics (ST) technologies enable the measurement of mRNA expression while simultaneously capturing spot locations. By integrating ST data, the 3D structure of a tissue can be reconstructed, yielding a comprehensive understanding of the tissue’s intricacies. Nevertheless, a computational challenge persists: how to remove batch effects while preserving genuine biological structure variations across ST data. To address this, we introduce Graspot, a graph attention network designed for spatial transcriptomics data integration with unbalanced optimal transport. Graspot adeptly harnesses both gene expression and spatial information to align common structures across multiple ST datasets. It embeds multiple ST datasets into a unified latent space, facilitating the partial alignment of spots from different slices. Demonstrating superior performance compared to existing methods on four real spatial transcriptomics datasets, Graspot excels in ST data integration, including tasks that require partial alignment. In particular, Graspot unveils subtle tumor microenvironment structures of breast cancer, and accurately aligns the spatio-temporal transcriptomics data to reconstruct human heart developmental processes. The code for Graspot is available at .
Bioinformatics
What problem does this paper attempt to address?
The main problem this paper attempts to solve is how to remove batch effects when integrating spatial transcriptomics (ST) data while retaining the true biological structural variation between ST datasets. Specifically, the paper introduces Graspot, a method that combines a Graph Attention Network (GAT) with Unbalanced Optimal Transport (UOT). It uses gene expression and spatial information to align multiple ST datasets and embed them into a unified low-dimensional space, and the paper reports superior performance on both partial and global alignment tasks.

### Main Research Questions

1. **Removing Batch Effects**: When integrating spatial transcriptomics data from different batches, how can batch effects caused by technical differences or changes in experimental conditions be removed effectively?
2. **Retaining Biological Structural Changes**: While removing batch effects, how can the true biological structural differences between samples be retained so that the biological meaning of the data is not destroyed?
3. **Partial Alignment and Global Alignment**: How can partially overlapping datasets be aligned effectively, and how can multiple datasets be aligned globally to construct a unified low-dimensional space?

### Solutions

Graspot addresses these problems with the following components:

- **Graph Attention Network (GAT)**: Captures gene expression and spatial location information and generates a low-dimensional embedding for each dataset.
- **Unbalanced Optimal Transport (UOT)**: Aligns the datasets in the embedding space. Because the marginal constraints are relaxed, it permits partial alignment and can handle imbalance between datasets.
- **Iterative Optimization**: Alternately optimizes the GAT and UOT modules to produce an integrated embedding and a probabilistic alignment.
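The UOT step described above can be illustrated with a minimal sketch. This is not Graspot's implementation: it is a generic entropic unbalanced Sinkhorn iteration with KL-relaxed marginals, written in NumPy. The function names, the toy embeddings, and the values of `eps` and `rho` are all assumptions chosen for illustration. Slice B contains noisy copies of only the first four spots of slice A, so the last two spots of slice A have no true partner.

```python
import numpy as np


def unbalanced_sinkhorn(cost, a, b, eps=0.05, rho=1.0, n_iter=500):
    """Entropic unbalanced OT with KL-relaxed marginals (illustrative sketch).

    cost : (n, m) pairwise cost matrix between spots of two slices
    a, b : source and target mass vectors
    eps  : entropic regularization strength (assumed value)
    rho  : marginal relaxation strength; smaller means looser marginals
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = rho / (rho + eps)        # exponent that softens the marginal constraints
    for _ in range(n_iter):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]   # transport plan


def pairwise_sq_dists(x, y):
    """Squared Euclidean distances between rows of x and rows of y."""
    diff = x[:, None, :] - y[None, :, :]
    return (diff ** 2).sum(axis=-1)


# Toy "embeddings" standing in for GAT outputs (hypothetical data).
rng = np.random.default_rng(0)
emb_a = np.eye(6)                                       # 6 well-separated spots
emb_b = emb_a[:4] + 0.01 * rng.normal(size=(4, 6))      # noisy copies of spots 0..3

cost = pairwise_sq_dists(emb_a, emb_b)
cost = cost / cost.max()                                # scale costs to [0, 1]

a = np.full(6, 1 / 6)
b = np.full(4, 1 / 4)
plan = unbalanced_sinkhorn(cost, a, b)

matches = plan.argmax(axis=0)            # best slice-A partner for each slice-B spot
unmatched_mass = plan[4:].sum()          # mass sent from the two partnerless spots
matched_mass = plan[:4].sum()
print(matches, matched_mass, unmatched_mass)
```

Because the marginals are penalized rather than enforced, the two partnerless spots in slice A are free to transport less than their nominal mass, which is the property that makes partial alignment possible. In a full pipeline the embeddings would come from the GAT encoder and be updated alternately with the plan. That loop is omitted here.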
### Experimental Verification

The paper evaluates Graspot through the following experiments:

- **Global Alignment**: On spatial transcriptomics data from the human dorsolateral prefrontal cortex (DLPFC), Graspot achieves the highest alignment accuracy in global alignment across multiple slice pairs.
- **Partial Alignment**: On partially overlapping slice pairs, Graspot also performs well. In particular, it outperforms existing methods on unbalanced datasets.
- **Multi-slice Integration**: Graspot integrates multiple spatial transcriptomics slices into a unified low-dimensional space that shows clear clustering consistent with the manually annotated layer structure.

### Conclusion

By combining a Graph Attention Network with Unbalanced Optimal Transport, Graspot addresses the key problems in spatial transcriptomics data integration: removing batch effects while retaining biological structural variation. Its results on multiple real datasets show superior performance on both global and partial alignment tasks, making it a useful tool for studying complex biological processes.