Accurate Identification of Spatial Domain by Incorporating Global Spatial Proximity and Local Expression Proximity

Yuanyuan Yu, Yao He, Zhi Xie
DOI: https://doi.org/10.3390/biom14060674
IF: 6.064
2024-06-10
Biomolecules
Abstract: Accurate identification of spatial domains is essential in the analysis of spatial transcriptomics data in order to elucidate tissue microenvironments and biological functions. However, existing methods only perform domain segmentation based on local or global spatial relationships between spots, resulting in an underutilization of spatial information. To this end, we propose SECE, a deep learning-based method that captures both local and global relationships among spots and aggregates their information using expression similarity and spatial similarity. We benchmarked SECE against eight state-of-the-art methods on six real spatial transcriptomics datasets spanning four different platforms. SECE consistently outperformed other methods in spatial domain identification accuracy. Moreover, SECE produced spatial embeddings that exhibited clearer patterns in low-dimensional visualizations and facilitated more accurate trajectory inference.
biochemistry & molecular biology
What problem does this paper attempt to address?
This paper addresses the question: **How can spatial domains be identified accurately in spatial transcriptomics data?** Existing methods segment spatial domains using only local or only global spatial relationships between spots, so they leave part of the spatial information unused and are limited in capturing complex spatial structures and patterns. To overcome this, the authors propose a new deep-learning method, **SECE (Spatial Embedding with Cell-type-related Expression)**, which combines global spatial proximity with local expression proximity to make fuller use of spatial information. SECE works in four steps:

1. **Extract gene expression features**: an autoencoder (AE) module compresses the gene expression matrix into low-dimensional features, called cell-type-related embeddings (CE).
2. **Combine global and local spatial proximity**: spatial embeddings (SE) are learned from both kinds of proximity. Global proximity is quantified by the physical distance between spots, while local proximity is determined by the similarity of their gene expression.
3. **Balance the two with a graph attention network (GAT)**: the GAT module weighs local expression similarity against global spatial similarity to produce more accurate spatial embeddings.
4. **Clustering and downstream analysis**: clustering the SE assigns each spot to a spatial domain, and the SE are further used for downstream analyses such as visualization and trajectory inference.

With this design, SECE identifies spatial domains more accurately than existing methods; in the authors' benchmark it outperformed eight state-of-the-art methods on six real spatial transcriptomics datasets from four platforms.
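The four steps above can be illustrated with a small NumPy sketch on toy data. This is not the authors' SECE implementation, only a minimal illustration under several assumptions: truncated SVD (PCA) stands in for the trained autoencoder, a Gaussian kernel on physical distance models global spatial proximity, a softmax over each spot's `k` most expression-similar spots (cosine similarity) models local expression proximity, a fixed mixing weight `alpha` replaces the learned GAT, and plain k-means does the clustering. All function names and the parameters `k`, `sigma`, and `alpha` are hypothetical.

```python
import numpy as np


def cell_type_embedding(expr, n_components=10):
    """Stand-in for SECE's autoencoder (assumption): PCA of log-scaled counts via SVD."""
    x = np.log1p(expr.astype(float))
    x = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    k = min(n_components, s.size)
    return u[:, :k] * s[:k]  # spot-by-component scores, i.e. a toy "CE"


def global_spatial_weights(coords, sigma=1.0):
    """Global proximity (assumed form): Gaussian kernel on distance between all spot pairs."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * sigma**2))
    return w / w.sum(axis=1, keepdims=True)  # row-normalize so each row sums to 1


def local_expression_attention(ce, k=6):
    """Local proximity (assumed form): softmax over each spot's k most similar spots."""
    z = ce / (np.linalg.norm(ce, axis=1, keepdims=True) + 1e-12)
    sim = z @ z.T  # cosine similarity; each spot is its own most similar neighbour
    idx = np.argsort(-sim, axis=1)[:, :k]  # indices of the k highest similarities per row
    scores = np.take_along_axis(sim, idx, axis=1)
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    att = np.zeros_like(sim)
    np.put_along_axis(att, idx, scores / scores.sum(axis=1, keepdims=True), axis=1)
    return att  # row-stochastic, exactly k non-zeros per row


def spatial_embedding(ce, coords, k=6, sigma=1.0, alpha=0.5):
    """Toy SE: a fixed, hypothetical alpha blends local and global weights, then averages CE."""
    w_local = local_expression_attention(ce, k)
    w_global = global_spatial_weights(coords, sigma)
    return (alpha * w_local + (1.0 - alpha) * w_global) @ ce


def kmeans(x, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one domain label per spot."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return labels


# Toy data: a 10 x 10 grid of spots split into two domains with different gene programs.
rng = np.random.default_rng(0)
coords = np.array([[i, j] for i in range(10) for j in range(10)], dtype=float)
domain = (coords[:, 0] >= 5).astype(int)
rates = np.ones((100, 30))
rates[domain == 0, :15] = 8.0  # genes 0-14 high in domain 0
rates[domain == 1, 15:] = 8.0  # genes 15-29 high in domain 1
expr = rng.poisson(rates)

ce = cell_type_embedding(expr, n_components=5)
se = spatial_embedding(ce, coords, k=6, sigma=1.5, alpha=0.5)
labels = kmeans(se, n_clusters=2)
agreement = max(np.mean(labels == domain), np.mean(labels != domain))  # label-swap invariant
print(f"agreement with true domains: {agreement:.2f}")
```

Because both weight matrices are row-stochastic, every toy SE vector is a weighted average of CE vectors drawn from expression-similar spots and physically nearby spots; in SECE this balance is handled by the GAT module rather than a fixed `alpha`. The dense n × n matrices are acceptable for a 100-spot grid, but a real dataset would likely call for sparse neighbour graphs.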