Randomized Spatial PCA (RASP): a computationally efficient method for dimensionality reduction of high-resolution spatial transcriptomics data

Ian K. Gingerich, Brittany A. Goods, H. Robert Frost
DOI: https://doi.org/10.1101/2024.12.20.629785
2024-12-22
Abstract: Spatial transcriptomics (ST) provides critical insights into the spatial organization of gene expression in tissues, enabling researchers to study the relationship between cellular environments and biological function. Identifying spatial domains within tissues is essential for understanding tissue architecture and the mechanisms underlying biological processes such as development and disease progression. Here, we present Randomized Spatial PCA (RASP), a novel spatially aware dimensionality reduction method for ST data. RASP is designed to be orders of magnitude faster than existing techniques, scale to ST data with hundreds of thousands of locations, support the flexible integration of non-transcriptomic covariates, and enable the reconstruction of de-noised and spatially smoothed expression values for individual genes. To achieve these goals, RASP uses a randomized two-stage principal component analysis (PCA) framework that leverages sparse matrix operations and configurable spatial smoothing. We compared the performance of RASP against five alternative methods (BASS, GraphST, SEDR, spatialPCA, and STAGATE) on four publicly available ST datasets generated using diverse techniques and resolutions (10x Visium, Stereo-Seq, MERFISH, and 10x Xenium) on human and mouse tissues. Our results demonstrate that RASP achieves tissue domain detection performance comparable or superior to existing methods, with an improvement in computational speed of several orders of magnitude. This efficiency facilitates the analysis of the increasingly high-resolution, subcellular ST datasets now being generated.
Bioinformatics
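The abstract describes a randomized two-stage PCA with configurable spatial smoothing but gives no algorithmic detail, so the following is only an illustrative sketch of that general idea and not the authors' implementation. It assumes a dense NumPy matrix in place of the sparse operations mentioned above, a generic randomized range finder for the PCA step, and an inverse-distance k-nearest-neighbor kernel for the smoothing step. The function names (`randomized_pca`, `knn_smoothing_weights`, `two_stage_spatial_pca`) and parameters (`n_neighbors`, `beta`, `k_final`) are hypothetical.

```python
import numpy as np


def randomized_pca(X, k, oversample=10, n_iter=2, seed=0):
    """Approximate the top-k PCA scores and loadings with a randomized range finder."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                          # center each column (gene)
    width = min(k + oversample, min(Xc.shape))       # sketch width, capped by matrix size
    omega = rng.standard_normal((Xc.shape[1], width))
    Q, _ = np.linalg.qr(Xc @ omega)                  # orthonormal basis for the sampled range
    for _ in range(n_iter):                          # power iterations sharpen the basis
        Q, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Q)
    B = Q.T @ Xc                                     # small projected matrix
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    scores = (Q @ U[:, :k]) * s[:k]                  # locations x k
    loadings = Vt[:k].T                              # genes x k
    return scores, loadings


def knn_smoothing_weights(coords, n_neighbors=6, beta=2.0):
    """Row-normalized inverse-distance weights over each location's nearest neighbors.

    Assumption: this kernel is an illustrative choice. The dense distance matrix is
    only practical for small inputs; a sparse matrix would be needed at scale.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, : n_neighbors + 1]   # each location plus its neighbors
    rows = np.arange(coords.shape[0])[:, None]
    W = np.zeros_like(d)
    W[rows, idx] = 1.0 / (1.0 + d[rows, idx]) ** beta
    return W / W.sum(axis=1, keepdims=True)


def two_stage_spatial_pca(X, coords, k=10, k_final=5, n_neighbors=6, beta=2.0):
    """Stage 1: randomized PCA. Smooth the scores spatially. Stage 2: PCA again."""
    scores, _ = randomized_pca(X, k)
    W = knn_smoothing_weights(coords, n_neighbors=n_neighbors, beta=beta)
    smoothed = W @ scores                            # spatially smoothed components
    final, _ = randomized_pca(smoothed, k_final)
    return final, smoothed
```

Called with an expression matrix `X` (locations by genes) and a matching array of 2-D coordinates `coords`, `two_stage_spatial_pca` returns reduced components that could be passed to any clustering routine for domain detection. Smoothing the low-dimensional scores rather than the full expression matrix keeps the cost of the spatial step small in this sketch.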