Sainsc: a computational tool for segmentation-free analysis of in-situ capture
Niklas Müller-Bötticher,Sebastian Tiesmeyer,Roland Eils,Naveed Ishaque
DOI: https://doi.org/10.1101/2024.08.02.603879
2024-08-05
Abstract: Spatially resolved transcriptomics has become the method of choice to characterise the complexity of biomedical tissue samples. Until recently, scientists have been restricted to profiling methods with high spatial resolution but for a limited set of genes or methods that can profile transcriptome-wide but at low spatial resolution. Through recent developments, there are now methods which offer subcellular spatial resolution and full transcriptome coverage. However, utilizing the high spatial and gene resolution of these new methods remains elusive due to several factors including low detection efficiency, high computational cost and difficulties in delineating cell borders. Here we present Sainsc (Segmentation-free analysis of in-situ capture data), which combines a cell-segmentation free approach with efficient data processing of transcriptome-wide nanometer resolution spatial data. Sainsc can generate cell-type maps with accurate cell-type assignment at a subcellular level, together with corresponding maps of the assignment scores that facilitate the interpretation in the local confidence of cell-type assignment. We demonstrate its utility and accuracy across different tissues and profiling methods. Compared to other methods, Sainsc requires lower computational resources and has scalable performance, enabling interactive data exploration. Sainsc is compatible with common data analysis frameworks and is available as open-source software in multiple programming languages.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses three key challenges in processing and analyzing high-resolution spatial transcriptomics data:
1. **Low detection efficiency**: High-resolution spatial transcriptomics methods can provide gene expression information at the subcellular level, but their detection efficiency is low, which results in sparse data.
2. **High computational cost**: Processing and analyzing these high-resolution data requires substantial computational resources, which limits wide adoption.
3. **Difficulty delineating cell borders**: At subcellular resolution, accurately delineating cell borders is hard, which makes it difficult to assign transcripts to the correct cell.
To address these challenges, the authors developed a computational tool named **Sainsc**. Its main features and advantages include:
- **Segmentation-free method**: Sainsc does not segment cells. Instead, it models gene expression with Kernel Density Estimation (KDE), which reduces data sparsity and makes the data suitable for classification tasks.
- **Efficient data processing**: Sainsc combines Python and Rust, using Rust's high-performance data structures and multi-threading support to achieve efficient computation.
- **Compatibility with multiple data formats**: Sainsc reads common file formats (such as GEM files) and can output community-standard data structures (such as AnnData and SpatialData), which ensures interoperability with other spatial and single-cell analysis tools.
- **Interactive data exploration**: Sainsc provides convenience functions and plotting capabilities that make data exploration easier.
With these methods, Sainsc can generate cell-type maps with accurate cell-type assignment at a subcellular level, together with corresponding maps of the assignment scores that indicate the local confidence of each assignment. The authors demonstrate its utility and accuracy across different tissues and profiling methods. Compared with other methods, Sainsc requires fewer computational resources and scales well, which makes it suitable for analyzing large datasets.
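
The segmentation-free idea can be illustrated with a minimal NumPy sketch. This is not Sainsc's API or implementation: the function names (`gaussian_kernel_1d`, `smooth_counts`, `assign_cell_types`), the Gaussian kernel, the cosine-similarity comparison against per-cell-type signatures, and the use of the best similarity as a stand-in for the assignment score are all assumptions made for illustration. The sketch smooths per-gene transcript counts on a pixel grid with a kernel, then labels each pixel with the most similar cell-type signature.

```python
import numpy as np


def gaussian_kernel_1d(bandwidth: float, truncate: float = 3.0) -> np.ndarray:
    """Normalized 1D Gaussian kernel, truncated at `truncate` bandwidths."""
    radius = int(np.ceil(truncate * bandwidth))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / bandwidth) ** 2)
    return kernel / kernel.sum()


def smooth_counts(counts: np.ndarray, bandwidth: float) -> np.ndarray:
    """Kernel-smooth counts of shape (genes, rows, cols) along both spatial axes.

    Assumes each spatial axis is at least as long as the kernel.
    """
    kernel = gaussian_kernel_1d(bandwidth)
    smoothed = counts.astype(float)
    for axis in (1, 2):  # a 2D Gaussian is separable into two 1D passes
        smoothed = np.apply_along_axis(
            np.convolve, axis, smoothed, kernel, mode="same"
        )
    return smoothed


def assign_cell_types(smoothed: np.ndarray, signatures: np.ndarray):
    """Label each pixel with the cell type whose signature is most similar.

    smoothed:   (genes, rows, cols) kernel-smoothed expression
    signatures: (cell_types, genes) reference expression per cell type
    Returns (labels, score): labels are -1 where no expression was observed;
    score is the best cosine similarity (a simplified, hypothetical stand-in
    for an assignment score).
    """
    n_genes, n_rows, n_cols = smoothed.shape
    pixels = smoothed.reshape(n_genes, -1).T  # (pixels, genes)
    pixel_norm = np.linalg.norm(pixels, axis=1)
    safe_norm = np.where(pixel_norm > 0, pixel_norm, 1.0)  # avoid division by zero
    unit_signatures = signatures / np.linalg.norm(signatures, axis=1, keepdims=True)
    cosine = (pixels / safe_norm[:, None]) @ unit_signatures.T  # (pixels, cell_types)
    labels = cosine.argmax(axis=1)
    labels[pixel_norm == 0] = -1  # background: nothing to classify
    score = cosine.max(axis=1)
    return labels.reshape(n_rows, n_cols), score.reshape(n_rows, n_cols)


# Toy data: two genes on a 12 x 12 grid, each detected at a single pixel.
counts = np.zeros((2, 12, 12))
counts[0, 6, 2] = 3  # gene 0 detected on the left
counts[1, 6, 9] = 2  # gene 1 detected on the right
signatures = np.array([[1.0, 0.0],   # cell type 0 expresses gene 0
                       [0.0, 1.0]])  # cell type 1 expresses gene 1

smoothed = smooth_counts(counts, bandwidth=1.0)
labels, score = assign_cell_types(smoothed, signatures)
```

In this toy example, smoothing spreads each sparse detection over neighbouring pixels, so pixels near a transcript receive a label even though they contain no counts themselves, while pixels far from any transcript stay unlabelled. No cell boundary is drawn at any point.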