Mapping lineage-resolved scRNA-seq data with spatial transcriptomics using TemSOMap

Xinhai Pan, Alejandro Danies-Lopez, Xiuwei Zhang
DOI: https://doi.org/10.1101/2024.10.31.621331
2024-11-03
Abstract: Spatial transcriptomics (ST) has become a powerful technique that advances the study of cell spatial organization and cell-cell interactions. While ST can preserve location information of cells or spots, limitations of such technologies include lower number of genes, and lower resolution compared to scRNA-seq datasets. These limitations can be alleviated by integrating scRNA-seq data with the ST data. By mapping the single cells onto the spatial data, we can infer the spatial coordinates of the cells from the scRNA-seq dataset. We consider leveraging temporal information in this challenging task of spatial location inference. During tissue formation, cells divided from the same ancestor are likely to be located close to each other in the tissue, thus the cell clonal or lineage information can improve cell location inference. CRISPR/Cas9-based lineage tracing technologies have enabled paired sequencing of cells' gene expression and lineage barcodes. The lineage barcodes can be used to reconstruct the cell lineage tree, which represents cells' clonal relationships. In order to incorporate this information, we developed TemSOMap (Temporal dynamics guided Spatial Omics Mapping), which infers the spatial coordinates of cells by mapping a paired gene expression and lineage barcode dataset onto a spatial transcriptomics dataset. TemSOMap utilizes a machine learning framework to infer a cell-to-spot mapping matrix by minimizing a loss function based on expression and lineage. We show that TemSOMap more accurately infers the spatial location of single cells compared to state-of-the-art baseline methods under various scenarios, using both simulated and real datasets. The resulting lineage-resolved ST data can help us better understand the spatio-temporal dynamics of cells in a tissue. TemSOMap is publicly available at https://github.com/ZhangLabGT/TemSOMap.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses how to infer the spatial positions of single cells by integrating single-cell RNA sequencing (scRNA-seq) data with spatial transcriptomics (ST) data. The authors developed TemSOMap (Temporal dynamics guided Spatial Omics Mapping), which combines gene expression with cell lineage information to improve the accuracy of predicted single-cell positions. The method is especially suited to low-resolution, spot-based ST data, such as data generated by 10x Visium.

### Main problems of the paper

1. **Prediction of single-cell spatial positions**:
   - ST preserves the positions of cells or spots, but it usually measures fewer genes, at lower resolution, than scRNA-seq. Mapping scRNA-seq data onto ST data makes it possible to infer spatial coordinates for the cells in the scRNA-seq dataset.
   - Methods that rely on gene expression alone struggle with this task because batch effects between ST and scRNA-seq data are large, the search space of candidate positions is large, and the two modalities share few features.
2. **Use of temporal information**:
   - During tissue formation, cells divide over multiple generations, so tissue formation is a time-dependent process. Cells derived from the same ancestor tend to stay close to each other in space unless they migrate away.
   - The authors use paired gene-expression and lineage-barcode data from CRISPR/Cas9 lineage tracing to reconstruct a cell lineage tree that represents the clonal relationships among cells. This lineage information is then used to improve the prediction of single-cell spatial positions.
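
The lineage argument above can be illustrated with a toy example. The sketch below is not TemSOMap's procedure: the paper reconstructs a lineage tree from the barcodes, whereas this example assumes a much simpler stand-in, the fraction of barcode sites at which two cells differ. The function name `barcode_distance` and the toy barcodes are hypothetical.

```python
import numpy as np

def barcode_distance(barcodes: np.ndarray) -> np.ndarray:
    """Pairwise fraction of barcode sites at which two cells differ.

    barcodes: (n_cells, n_sites) integer array; each entry encodes the
    mutation state of one CRISPR/Cas9 target site (0 = unedited here,
    which is an assumption of this toy encoding).
    """
    # Broadcasting compares every cell against every other cell, site by site.
    differs = barcodes[:, None, :] != barcodes[None, :, :]
    return differs.mean(axis=2)

# Toy data: cells A and B share most edits; cell C belongs to another clone.
barcodes = np.array([
    [1, 0, 2, 0],  # cell A
    [1, 0, 2, 3],  # cell B
    [0, 4, 0, 0],  # cell C
])

D = barcode_distance(barcodes)
print(D)
```

A small distance (A versus B) is read here as "recently diverged", which is the situation in which the two cells are expected to lie close together in the tissue.
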
### Solutions

- **TemSOMap method**:
  - **Input**: an scRNA-seq data matrix with paired single-cell lineage barcodes, and an ST data matrix with gene expression and spatial coordinates for each spot.
  - **Output**: a cell-to-spot mapping matrix \( M \), which gives the probability that each cell maps to each spot.
  - **Objective function**: a sum of several loss terms: an expression-similarity loss, a lineage loss, a clone loss, a position-aware entropy loss, and a variance loss.
  - **Optimization**: the mapping matrix \( M \) is found by minimizing the total loss.

### Main contributions

- **Improved prediction accuracy**: On both simulated and real datasets, TemSOMap infers single-cell locations more accurately than state-of-the-art baseline methods under various scenarios.
- **Spatio-temporal atlases**: Applying TemSOMap yields lineage, spatial, and gene-expression information for each single cell, which supports analysis of the spatio-temporal dynamics of cells.

### Conclusion

By combining lineage-tracing data with gene expression, TemSOMap improves single-cell spatial position prediction over the baseline methods and provides a tool for studying the spatio-temporal distribution of cells in tissues.
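
To make the objective described under Solutions more concrete, here is a minimal sketch of a mapping matrix and two of the five loss terms. The exact formulas are not given in this summary, so both terms are assumptions: the expression term is a negative per-gene cosine similarity, and the lineage term penalizes spatial distance between cells that are close in lineage. All names (`softmax_rows`, `expression_loss`, `lineage_loss`, `total_loss`, `lam`) and the toy data are hypothetical, and the clone, entropy, and variance terms are omitted.

```python
import numpy as np

def softmax_rows(logits):
    """Turn free parameters into a mapping matrix M whose rows sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def expression_loss(M, sc_expr, st_expr):
    """Assumed form: negative mean cosine similarity, computed per gene."""
    pred = M.T @ sc_expr  # (n_spots, n_genes): expression projected onto spots
    num = (pred * st_expr).sum(axis=0)
    den = np.linalg.norm(pred, axis=0) * np.linalg.norm(st_expr, axis=0) + 1e-12
    return -(num / den).mean()

def lineage_loss(M, spot_xy, lineage_dist):
    """Assumed form: lineage-close cells should have nearby expected positions."""
    cell_xy = M @ spot_xy  # (n_cells, 2): expected coordinates of each cell
    spatial = np.linalg.norm(cell_xy[:, None, :] - cell_xy[None, :, :], axis=2)
    closeness = 1.0 - lineage_dist  # assumes lineage_dist is scaled to [0, 1]
    return (closeness * spatial).mean()

def total_loss(M, sc_expr, st_expr, spot_xy, lineage_dist, lam=0.1):
    return expression_loss(M, sc_expr, st_expr) + lam * lineage_loss(M, spot_xy, lineage_dist)

# Toy data: 3 cells, 2 genes, 2 spots. Cells 0 and 1 are close in lineage.
sc_expr = np.array([[5.0, 0.0], [4.0, 1.0], [0.0, 6.0]])
st_expr = np.array([[9.0, 1.0], [0.0, 6.0]])
spot_xy = np.array([[0.0, 0.0], [10.0, 0.0]])
lineage_dist = np.array([[0.0, 0.25, 0.75],
                         [0.25, 0.0, 1.0],
                         [0.75, 1.0, 0.0]])

# Two candidate mappings: one keeps the related cells together, one splits them.
M_good = softmax_rows(np.array([[8.0, 0.0], [8.0, 0.0], [0.0, 8.0]]))
M_bad = softmax_rows(np.array([[8.0, 0.0], [0.0, 8.0], [8.0, 0.0]]))

print(total_loss(M_good, sc_expr, st_expr, spot_xy, lineage_dist))
print(total_loss(M_bad, sc_expr, st_expr, spot_xy, lineage_dist))
```

In a real setting the logits would be optimized by gradient descent rather than compared by hand; the comparison here only shows that a mapping that co-locates lineage-related cells scores lower under these assumed terms.
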