PROST: quantitative identification of spatially variable genes and domain detection in spatial transcriptomics

Yuchen Liang, Guowei Shi, Runlin Cai, Yuchen Yuan, Ziying Xie, Long Yu, Yingjian Huang, Qian Shi, Lizhe Wang, Jun Li, Zhonghui Tang
DOI: https://doi.org/10.1038/s41467-024-44835-w
IF: 16.6
2024-01-18
Nature Communications
Abstract: Computational methods have been proposed to leverage spatially resolved transcriptomic data, pinpointing genes with spatial expression patterns and delineating tissue domains. However, existing approaches fall short in uniformly quantifying spatially variable genes (SVGs). Moreover, from a methodological viewpoint, while SVGs are naturally associated with depicting spatial domains, they are technically dissociated in most methods. Here, we present a framework (PROST) for the quantitative recognition of spatial transcriptomic patterns, consisting of (i) quantitatively characterizing spatial variations in gene expression patterns through the PROST Index; and (ii) unsupervised clustering of spatial domains via a self-attention mechanism. We demonstrate that PROST performs superior SVG identification and domain segmentation with various spatial resolutions, from multicellular to cellular levels. Importantly, PROST Index can be applied to prioritize spatial expression variations, facilitating the exploration of biological insights. Together, our study provides a flexible and robust framework for analyzing diverse spatial transcriptomic data.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses the shortcomings of existing methods in quantifying spatially variable genes (SVGs) and detecting spatial domains. Specifically:

1. **Quantifying Spatially Variable Genes (SVGs)**:
   - Existing methods lack a unified standard for quantifying SVGs, which makes results from different methods hard to compare.
   - Statistical-model methods can identify genes with spatial expression patterns, but they usually rely on statistical significance, which makes the spatial heterogeneity and homogeneity of gene expression hard to interpret.
   - Deep-learning methods can identify SVGs, but they lack interpretability.
2. **Detecting Spatial Domains**:
   - Because the data are high-dimensional and sparse, traditional non-spatial methods (such as k-means and the Louvain algorithm) cannot effectively identify spatially coherent biological domains with consistent gene expression.
   - Some clustering methods designed specifically for spatial transcriptomic data (such as stLearn and SpaGCN) perform well at multicellular resolution, but challenges remain at cellular resolution, especially in complex tissues, including dynamic transcriptional heterogeneity and measurement noise.

To solve these problems, the paper proposes a framework named PROST (Pattern Recognition Of Spatial Transcriptomics), which includes two modules: PROST Index (PI) and PROST Neural Network (PNN).

- **PROST Index (PI)**:
  - Quantifies the spatial pattern of gene expression through a new, assumption-free index (the PI score).
  - The PI score consists of two components, Significance and Separability, which measure the spatial homogeneity and heterogeneity of gene expression, respectively.
- **PROST Neural Network (PNN)**:
  - Uses neighborhood-based graphs and a self-attention mechanism to integrate spatial and transcriptional information and achieve unsupervised tissue segmentation.
  - Adaptively learns spatial dependencies by optimizing the neural network parameters and denoises the low-dimensional embeddings, thereby improving the accuracy of tissue segmentation.

The paper verifies the superior performance of PROST in SVG identification and spatial domain detection through experiments on multiple spatial transcriptomic datasets, with the advantage most pronounced at cellular resolution.
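
To make the idea of combining a neighborhood-based graph with self-attention more concrete, the following is a minimal NumPy sketch. It is not the PROST implementation: the function names `knn_adjacency` and `neighbor_attention` are hypothetical, the attention uses the raw expression vectors as queries, keys, and values with no learned parameters, and the k-nearest-neighbor graph is only one assumed way to define spatial neighborhoods. It illustrates only the general principle that each spot's representation becomes a weighted average over its spatial neighbors, with weights driven by transcriptional similarity.

```python
import numpy as np

def knn_adjacency(coords, k):
    """Boolean adjacency linking each spot to itself and its k nearest spots.

    Hypothetical helper for illustration; not part of PROST.
    """
    n = coords.shape[0]
    # Pairwise squared Euclidean distances between spot coordinates.
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    # The k+1 closest spots per row (the spot itself is at distance 0).
    nearest = np.argsort(dist2, axis=1)[:, : k + 1]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.arange(n)[:, None], nearest] = True
    # Guarantee a self-loop even if several spots share the same coordinates.
    adj[np.arange(n), np.arange(n)] = True
    return adj

def neighbor_attention(x, adj):
    """Scaled dot-product self-attention restricted to graph neighbors.

    Assumption: queries, keys, and values are all the input x itself,
    with no learned projection matrices.
    """
    d = x.shape[1]
    scores = (x @ x.T) / np.sqrt(d)
    # Mask out non-neighbors so they receive zero attention weight.
    scores = np.where(adj, scores, -np.inf)
    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights

# Toy data: 6 spots with 2-D coordinates and 4 expression features each.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(6, 2))
expr = rng.normal(size=(6, 4))

adj = knn_adjacency(coords, k=2)
smoothed, weights = neighbor_attention(expr, adj)
```

Here `smoothed` has the same shape as `expr`, and each row of `weights` sums to 1 with nonzero entries only where `adj` is true. In a trained model the attention scores would come from learned projections, and the resulting embeddings would be passed to a clustering step to assign spatial domains.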