Abstract: Spatial transcriptomics (ST) has become a powerful tool for exploring the spatial organization of gene expression in tissues. Imaging-based methods, though offering superior spatial resolutions at the single-cell level, are limited in either the number of imaged genes or the sensitivity of gene detection. Existing approaches for enhancing ST rely on the similarity between ST cells and reference single-cell RNA sequencing (scRNA-seq) cells. In contrast, we introduce stDiff, which leverages relationships between gene expression abundance in scRNA-seq data to enhance ST. stDiff employs a conditional diffusion model, capturing gene expression abundance relationships in scRNA-seq data through two Markov processes: one introducing noise to transcriptomics data and the other denoising to recover them. The missing portion of ST is predicted by incorporating the original ST data into the denoising process. In our comprehensive performance evaluation across 16 datasets, utilizing multiple clustering and similarity metrics, stDiff stands out for its exceptional ability to preserve topological structures among cells, positioning itself as a robust solution for cell population identification. Moreover, stDiff’s enhancement outcomes closely mirror the actual ST data within the batch space. Across diverse spatial expression patterns, our model accurately reconstructs them, delineating distinct spatial boundaries. This highlights stDiff’s capability to unify the observed and predicted segments of ST data for subsequent analysis. We anticipate that stDiff, with its innovative approach, will contribute to advancing ST imputation methodologies.

What problem does this paper attempt to address?

The problem this paper addresses is how to use single-cell RNA sequencing (scRNA-seq) data to enrich spatial transcriptomics (ST) data, filling in the gene expression that ST fails to measure. Specifically, the paper proposes **stDiff**, which uses a diffusion model to learn relationships among gene expression levels from scRNA-seq data and applies them to the imputation of ST data.

### Core of the problem:

1. **Limitations of ST data**: ST techniques retain the spatial position of cells in tissue, but they are limited in either the sensitivity of gene detection or the number of detectable genes. Imaging-based methods reach single-cell resolution but can usually detect only hundreds of pre-selected genes. Sequencing-based methods detect gene expression across the whole transcriptome, but their capture spots are larger than a single cell, so their spatial resolution is coarser, and their capture rate is limited.
2. **Deficiencies of existing methods**: Current methods for enhancing ST data mainly rely on the similarity between scRNA-seq cells and ST cells, and fill in the unmeasured genes by matching expression patterns of shared genes. These methods face the following challenges:
   - The sparsity of scRNA-seq and ST data makes accurate alignment difficult.
   - Batch effects make it even harder to establish accurate alignment through shared genes.
   - Using scRNA-seq as the reference for imputation easily introduces batch bias, so the predicted gene expression lies in a different batch space from the actual ST data, which complicates downstream analysis.
3. **Objective**: The paper aims to develop a new method, **stDiff**, which learns gene expression relationships in scRNA-seq data with a diffusion model and uses them to impute the missing gene expression in ST data, while avoiding batch bias and keeping the predictions as consistent as possible with the real ST data.

---

### Key points of the solution:

- **Application of the diffusion model**: stDiff adopts a conditional diffusion model, which captures gene expression relationships in scRNA-seq data through two Markov processes (forward diffusion and reverse denoising).
  - Forward diffusion process: gradually adds random noise to the original transcriptomics data.
  - Reverse denoising process: gradually recovers the original data through the learned conditional denoising distribution.
- **Avoiding the influence of batch effect**: stDiff improves robustness by perturbing the scRNA-seq data, placing less weight on absolute gene expression values and more on the relationships among gene expression levels.
- **Completion strategy**: stDiff does not rely on the similarity between scRNA-seq and ST cells. Instead, it imputes the data by learning gene expression relationships (regulatory rules) from scRNA-seq data and combining them with the information in the ST data itself. This is analogous to treating each scRNA-seq cell as a complete image and each ST cell as a masked version of that image, where the task is to fill in the masked part.

---

### Summary:

The paper addresses how to learn gene expression relationships from scRNA-seq data with a diffusion model and apply them to the imputation of ST data, in order to overcome the limitations of existing methods in similarity calculation, batch effect handling, and prediction accuracy. With this approach, stDiff can predict the missing gene expression more accurately while preserving the topological structure among cells in the ST data, providing high-quality data for subsequent biological analysis.
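
The forward noising process and the masked completion idea described above can be illustrated with a minimal NumPy sketch. It follows the standard DDPM formulation and is not taken from the stDiff code: the linear noise schedule, the function names (`make_schedule`, `forward_noise`, `impute`), the `denoiser` callable, and the choice to condition by clamping observed genes to their measured values at every step are all assumptions made for illustration. In a real model, `denoiser` would be a network trained on scRNA-seq data to predict the added noise.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed choice, common in DDPM-style models)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)
    return betas, alpha_bars

def forward_noise(x0, t, alpha_bars, rng):
    """Forward process in closed form: sample x_t given the clean data x0."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

def impute(x_obs, observed_mask, denoiser, betas, alpha_bars, rng):
    """Reverse process with observed genes held at their measured values.

    x_obs:         cells x genes matrix; entries where the mask is False are ignored
    observed_mask: boolean matrix, True where the gene was measured in ST
    denoiser:      hypothetical callable (x, t, mask) -> predicted noise
    """
    x = rng.standard_normal(x_obs.shape)  # start from pure noise
    for t in range(len(betas) - 1, -1, -1):
        # Conditioning step: overwrite the observed part with real ST values.
        x = np.where(observed_mask, x_obs, x)
        eps_hat = denoiser(x, t, observed_mask)
        alpha_t = 1.0 - betas[t]
        # Standard DDPM posterior mean computed from the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alpha_t)
        # No noise is added at the final step (t == 0).
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    # Observed genes are returned unchanged; only masked genes are predicted.
    return np.where(observed_mask, x_obs, x)
```

To try it, pass any function with the signature `(x, t, mask)` as `denoiser`; a stand-in such as `lambda x, t, m: np.zeros_like(x)` exercises the loop but produces no meaningful predictions. Clamping the observed entries mirrors the image-inpainting analogy: the measured genes act as the visible pixels, and the reverse process fills in the rest.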

stDiff: a diffusion model for imputing spatial transcriptomics through single-cell transcriptomics
