SpatialcoGCN: deconvolution and spatial information–aware simulation of spatial transcriptomics data via deep graph co-embedding

Wang Yin, You Wan, Yuan Zhou
DOI: https://doi.org/10.1093/bib/bbae130
IF: 9.5
2024-03-27
Briefings in Bioinformatics
Abstract: Spatial transcriptomics (ST) data have emerged as a pivotal approach to comprehending the function and interplay of cells within intricate tissues. Nonetheless, analyses of ST data are restricted by the low spatial resolution and limited number of ribonucleic acid transcripts that can be detected with several popular ST techniques. In this study, we propose that both of the above issues can be significantly improved by introducing a deep graph co-embedding framework. First, we establish a self-supervised, co-graph convolution network–based deep learning model termed SpatialcoGCN, which leverages single-cell data to deconvolve the cell mixtures in spatial data. Evaluations of SpatialcoGCN on a series of simulated ST data and real ST datasets from human ductal carcinoma in situ, developing human heart and mouse brain suggest that SpatialcoGCN could outperform other state-of-the-art cell type deconvolution methods in estimating per-spot cell composition. Moreover, with competitive accuracy, SpatialcoGCN could also recover the spatial distribution of transcripts that are not detected by raw ST data. With a similar co-embedding framework, we further established a spatial information–aware ST data simulation method, SpatialcoGCN-Sim. SpatialcoGCN-Sim could generate simulated ST data with high similarity to real datasets. Together, our approaches provide efficient tools for studying the spatial organization of heterogeneous cells within complex tissues.
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
### What problems does this paper attempt to solve?

This paper mainly addresses two key limitations in the analysis of spatial transcriptomics (ST) data:

1. **Low spatial resolution**: Most current ST techniques have low spatial resolution, so each sampling spot may cover several cells rather than a single cell. This makes it difficult to determine accurately which cell types are present in a tissue and how they are distributed.
2. **Limited number of detected RNA transcripts**: Some ST techniques, especially those based on in situ hybridization and fluorescence microscopy, can detect only a limited number of RNA transcripts, which restricts a comprehensive view of gene expression patterns.

To address these problems, the authors propose **SpatialcoGCN**, a method built on a deep graph co-embedding framework. It improves ST data analysis in two ways:

- **Cell mixture deconvolution**: It uses single-cell RNA sequencing (scRNA-seq) data to deconvolve the cell mixtures in ST data and estimate the cell composition of each spot.
- **Recovery of undetected transcripts**: It recovers, with competitive accuracy, the spatial distribution of transcripts that the raw ST data do not detect.

In addition, the authors developed **SpatialcoGCN-Sim**, a spatial information–aware ST data simulation method built on a similar co-embedding framework. It generates simulated ST data that closely resemble real datasets, better reproducing actual spatial gene expression patterns and thus supporting more realistic model training and evaluation.

### Overview of specific methods

1. **SpatialcoGCN**:
   - Input: scRNA-seq data and ST data from the same tissue type.
   - Method: Project the scRNA-seq data and the ST data into a shared low-dimensional embedding space with a variational autoencoder (VAE), then use a graph convolutional network (GCN) to learn a mapping matrix between cells and spots, from which the cell composition of each spot and the spatial distribution of undetected transcripts are estimated.
2. **SpatialcoGCN-Sim**:
   - Objective: Generate simulated ST data that closely resemble real data.
   - Method: Take the spatial coordinates of the measurement spots from reference ST data; for each cell, identify its nearest-neighbor spots in the low-dimensional embedding space with the k-nearest neighbors (KNN) algorithm and predict the cell's two-dimensional coordinates; finally, overlay a hexagonal grid to blur the spatial expression pattern and simulate low-resolution spots.

Through these methods, the authors aim to provide effective tools for studying the spatial organization of heterogeneous cells in complex tissues and, in turn, for better understanding the molecular mechanisms underlying physiological and complex disease processes.
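The SpatialcoGCN method combines a shared embedding with a GCN. Below is a minimal pure-Python sketch of the graph side. It assumes the VAE has already placed cells and spots in one embedding space. The cross-modal k-nearest-neighbor co-graph, the helper names (`build_cograph`, `normalize_adjacency`, `gcn_layer`), the value of `k` and the identity weight matrix are hypothetical illustrations, not the authors' architecture. Only the normalization `D^-1/2 (A + I) D^-1/2` is the standard GCN form.

```python
import math


def build_cograph(cell_emb, spot_emb, k):
    """Adjacency over cells + spots (cells first): each node is linked to its
    k nearest nodes of the *other* modality in the shared embedding space."""
    nodes = cell_emb + spot_emb
    n_cells, n = len(cell_emb), len(cell_emb) + len(spot_emb)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = range(n_cells, n) if i < n_cells else range(n_cells)
        nearest = sorted(others, key=lambda j: math.dist(nodes[i], nodes[j]))[:k]
        for j in nearest:
            adj[i][j] = adj[j][i] = 1.0  # undirected, unweighted edge
    return adj


def normalize_adjacency(adj):
    """Standard GCN propagation matrix: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]  # >= 1 thanks to the self-loops
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


# Hypothetical toy embeddings: 3 cells and 2 spots in a 2-D shared space.
cell_emb = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
spot_emb = [[0.04, 0.0], [5.0, 5.1]]
adj = build_cograph(cell_emb, spot_emb, k=1)
a_hat = normalize_adjacency(adj)
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity stands in for a learned weight matrix
hidden = gcn_layer(a_hat, cell_emb + spot_emb, weights)
```

Linking cells only to spots (never cell to cell) is a choice made for this sketch. It forces every propagation step to mix information across the two modalities, which is the purpose of a co-embedding. Because neighbors are collected from both sides and then symmetrized, a node can end up with more than `k` edges.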
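Both SpatialcoGCN outputs, per-spot cell composition and recovered transcripts, can be read off a cell-to-spot mapping matrix. The sketch below assumes a cells-by-spots matrix whose rows are softmax-normalized, so that each row is one cell's location distribution over spots. This is a common convention in mapping-based deconvolution, but it is an assumption that SpatialcoGCN normalizes its matrix the same way. The function names and toy data are hypothetical.

```python
import math


def softmax_rows(scores):
    """Turn raw cell-by-spot scores into a mapping matrix whose rows sum to 1,
    i.e. each cell's probability of being located at each spot."""
    mapping = []
    for row in scores:
        top = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        mapping.append([e / total for e in exps])
    return mapping


def cell_type_composition(mapping, cell_types):
    """Per-spot cell-type proportions: sum mapping weights by type, then
    normalize each spot so its proportions add up to 1."""
    types = sorted(set(cell_types))
    composition = []
    for s in range(len(mapping[0])):
        weight = {t: 0.0 for t in types}
        for i, t in enumerate(cell_types):
            weight[t] += mapping[i][s]
        total = sum(weight.values())  # > 0 because softmax weights are positive
        composition.append({t: w / total for t, w in weight.items()})
    return composition


def impute_spot_expression(mapping, sc_expr):
    """Spot-by-gene prediction = mapping^T @ sc_expr, which also yields values
    for genes absent from the ST gene panel."""
    n_spots, n_genes = len(mapping[0]), len(sc_expr[0])
    predicted = [[0.0] * n_genes for _ in range(n_spots)]
    for i, cell in enumerate(sc_expr):
        for s in range(n_spots):
            for g in range(n_genes):
                predicted[s][g] += mapping[i][s] * cell[g]
    return predicted


# Hypothetical toy data: 3 cells x 2 spots scores, 3 cells x 2 genes expression.
scores = [[4.0, 0.0], [3.0, 0.0], [0.0, 5.0]]
cell_types = ["T cell", "T cell", "tumor"]
sc_expr = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]

mapping = softmax_rows(scores)
composition = cell_type_composition(mapping, cell_types)
predicted = impute_spot_expression(mapping, sc_expr)
```

Every row of the mapping sums to 1, so the total expression of each gene is preserved when it is redistributed over spots. Genes measured only in the scRNA-seq reference still receive spot-level values, which is how undetected transcripts can be recovered.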
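The SpatialcoGCN-Sim steps, KNN coordinate prediction followed by hexagonal blurring, can be sketched as follows. All of the following are hypothetical assumptions:

- Cells and reference spots already share an embedding.
- A cell's position is the plain mean of its `k` nearest spots' coordinates.
- "Blurring" means summing the expression of all cells that fall into the same hexagonal bin.

The paper's exact weighting, grid spacing and spot geometry may differ, and the function names are illustrative only.

```python
import math


def predict_coordinates(cell_emb, spot_emb, spot_xy, k):
    """Place each cell at the mean (x, y) of its k nearest reference spots,
    where 'nearest' is measured in the shared embedding space."""
    coords = []
    for c in cell_emb:
        nearest = sorted(range(len(spot_emb)), key=lambda j: math.dist(c, spot_emb[j]))[:k]
        x = sum(spot_xy[j][0] for j in nearest) / len(nearest)
        y = sum(spot_xy[j][1] for j in nearest) / len(nearest)
        coords.append((x, y))
    return coords


def hex_centers(width, height, spacing):
    """Centers of a hexagonal lattice (every other row shifted by half a step)
    covering [0, width] x [0, height]."""
    row_step = spacing * math.sqrt(3) / 2
    centers = []
    r = 0
    while r * row_step <= height:
        offset = spacing / 2 if r % 2 else 0.0
        c = 0
        while offset + c * spacing <= width:
            centers.append((offset + c * spacing, r * row_step))
            c += 1
        r += 1
    return centers


def simulate_spots(cell_xy, cell_expr, centers):
    """Sum single-cell expression into the nearest lattice center; on this
    lattice, nearest-center assignment is equivalent to hexagonal binning."""
    n_genes = len(cell_expr[0])
    spots = {}
    for xy, expr in zip(cell_xy, cell_expr):
        j = min(range(len(centers)), key=lambda idx: math.dist(xy, centers[idx]))
        acc = spots.setdefault(centers[j], [0.0] * n_genes)
        for g in range(n_genes):
            acc[g] += expr[g]
    return spots  # {(x, y) of spot center: summed expression}, empty spots omitted


# Hypothetical toy data: 4 reference spots on a 10 x 10 section, 3 cells.
spot_emb = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spot_xy = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
cell_emb = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.0]]
cell_expr = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]

cell_xy = predict_coordinates(cell_emb, spot_emb, spot_xy, k=2)
centers = hex_centers(10.0, 10.0, spacing=5.0)
simulated = simulate_spots(cell_xy, cell_expr, centers)
```

In this toy run, the first and third cells land at the same predicted position and are merged into one simulated spot. That merging is the loss of resolution the simulation is meant to reproduce. A larger `spacing` mixes more cells per spot.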