STANCE: a unified statistical model to detect cell-type-specific spatially variable genes in spatial transcriptomics

Haohao Su, Yuesong Wu, Bin Chen, Yuehua Cui
DOI: https://doi.org/10.1101/2024.09.22.614385
2024-09-24
Abstract: A significant challenge in analyzing spatial transcriptomics data is the effective and efficient detection of spatially variable genes (SVGs), whose expression exhibits non-random spatial patterns in tissues. Many SVGs show spatial variation in expression that is highly correlated with cell type categories or compositions, leading to the concept of cell type-specific spatially variable genes (ctSVGs). Existing statistical methods for detecting ctSVGs treat cell type-specific spatial effects as fixed effects when modeling, resulting in a critical issue: the testing results are not invariant to the rotation of spatial coordinates. Additionally, an SVG may display random spatial patterns within a cell type, and a ctSVG may exhibit random spatial patterns from a general perspective, indicating that an SVG does not necessarily have to be a ctSVG, and vice versa. This poses challenges in real analysis when detecting SVGs or ctSVGs. To address these problems, we propose STANCE, a unified statistical model developed to detect both SVGs and ctSVGs in spatial transcriptomics. By integrating gene expression, spatial location, and cell type composition through a linear mixed-effect model, STANCE enables the identification of both SVGs and ctSVGs in an initial stage, followed by a second-stage test dedicated to ctSVG detection. Its design ensures robustness in complex scenarios, and the results are spatial rotation invariant. We demonstrated the performance of STANCE through comprehensive simulations and analyses of three public datasets. The downstream analyses based on ctSVGs detected by STANCE suggest promising future applications of the model in spatial transcriptomics and various areas of genome biology. A software implementation of STANCE is available at https://github.com/Cui-STT-Lab/STANCE.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses two problems with current methods for detecting cell-type-specific spatially variable genes (ctSVGs) in spatial transcriptomics data:

1. **Rotation invariance problem**: Existing statistical methods model cell-type-specific spatial effects as fixed effects, so their test results are not invariant to rotation of the spatial coordinates. The same tissue section analyzed at a different angle can therefore yield different results, which can inflate false-positive or false-negative rates. This matters in practice because tissue sections are placed at arbitrary orientations during sample preparation, and each orientation produces a different set of spatial coordinates.
2. **Complex relationship between SVGs and ctSVGs**: Spatially variable genes (SVGs) and ctSVGs overlap but are not the same set. An SVG may show a random spatial pattern within a given cell type, and a ctSVG may show a random spatial pattern when the tissue is viewed as a whole. An SVG is therefore not necessarily a ctSVG, and vice versa, which complicates detecting either kind in real analyses.

To solve these problems, the paper proposes **STANCE** (Spatial Transcriptomics ANalysis of genes with Cell-type-specific Expression), a unified statistical model that detects both SVGs and ctSVGs in spatial transcriptomics data. STANCE integrates gene expression, spatial location, and cell type composition through a linear mixed-effects model. A first-stage test identifies genes that are SVGs or ctSVGs, and a second-stage test is dedicated to ctSVG detection. The design is intended to be robust in complex scenarios, and its results are invariant to rotation of the spatial coordinates.
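The rotation-invariance issue can be illustrated with a small simulation. The sketch below is not STANCE's actual test or any existing method's implementation; it only contrasts two simplified statistics under stated assumptions. The function names (`kernel_score`, `x_axis_t_stat`), the Gaussian kernel with unit bandwidth, and the simulated gradient are all hypothetical choices made for illustration. A statistic built from a distance-based kernel depends only on pairwise distances between spots, so it is unchanged by rotation, whereas a fixed-effect test on a single coordinate axis changes when the tissue is rotated.

```python
import numpy as np

rng = np.random.default_rng(0)

def rotate(coords, theta):
    """Rotate 2-D coordinates counter-clockwise by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    return coords @ rotation.T

def gaussian_kernel(coords, bandwidth=1.0):
    """Gaussian kernel on pairwise Euclidean distances (illustrative choice)."""
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def kernel_score(coords, expr):
    """Quadratic form y' K y on centered expression; depends only on distances."""
    centered = expr - expr.mean()
    return float(centered @ gaussian_kernel(coords) @ centered)

def x_axis_t_stat(coords, expr):
    """t-statistic for the x-coordinate slope in the OLS fit expr ~ 1 + x."""
    design = np.column_stack([np.ones(len(expr)), coords[:, 0]])
    beta, *_ = np.linalg.lstsq(design, expr, rcond=None)
    resid = expr - design @ beta
    sigma2 = resid @ resid / (len(expr) - 2)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return float(beta[1] / np.sqrt(cov[1, 1]))

# Simulated spots with an expression gradient along the original x axis.
coords = rng.uniform(-1, 1, size=(200, 2))
expr = 2.0 * coords[:, 0] + rng.normal(scale=0.5, size=200)

# The same tissue placed at a 90-degree rotation.
rotated = rotate(coords, np.pi / 2)

score_orig = kernel_score(coords, expr)
score_rot = kernel_score(rotated, expr)
t_orig = x_axis_t_stat(coords, expr)
t_rot = x_axis_t_stat(rotated, expr)

print(f"kernel score: original={score_orig:.4f}, rotated={score_rot:.4f}")
print(f"x-axis t-stat: original={t_orig:.2f}, rotated={t_rot:.2f}")
```

In this toy setup the kernel score agrees before and after rotation up to floating-point error, while the single-axis t-statistic is large for the original orientation and much smaller after the gradient has been rotated onto the other axis.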