Abstract: Spatially resolved transcriptomics (SRT) data provide critical insights into gene expression patterns within tissue contexts, necessitating effective methods for identifying spatial domains. Traditional clustering techniques often overlook spatial information, leading to disjointed domains. Current computational approaches integrate spatial information but still face challenges in recognizing domain boundaries, in scalability, and in their need for independent clustering steps. We introduce stDyer, an end-to-end deep learning framework designed for spatial domain clustering in SRT data. stDyer combines a Gaussian Mixture Variational AutoEncoder (GMVAE) with graph attention networks (GATs) to simultaneously learn deep representations and perform clustering for units. A unique feature of stDyer is its use of dynamic graphs, which adaptively link units based on Gaussian Mixture assignments in the latent space, thereby improving spatial domain clustering and producing smoother domain boundaries. Additionally, stDyer's mini-batch neighbor sampling strategy facilitates scalability to large datasets and enables multi-GPU training. Benchmarked against state-of-the-art tools across various SRT technologies, stDyer demonstrates superior performance in spatial domain clustering, multi-slice analysis, and large-scale dataset handling.

What problem does this paper attempt to address?

This paper addresses shortcomings of existing spatial domain clustering methods for Spatially Resolved Transcriptomics (SRT) data. Traditional clustering techniques often overlook spatial information, so the spatial domains they identify lack spatial continuity. Existing computational methods do incorporate spatial information, but they still struggle with identifying domain boundaries, with scalability, and with their reliance on independent clustering steps.

### Summary of Main Problems:

1. **Limitations of Traditional Clustering Methods**:
   - Traditional clustering methods (such as the K-means and Leiden algorithms) rely solely on gene expression data and ignore spatial information, so the identified spatial domains lack spatial continuity.
2. **Challenges of Existing Methods**:
   - **Identification of Domain Boundaries**: Existing methods have difficulty identifying domain boundaries.
   - **Scalability**: Most methods struggle to handle large-scale datasets.
   - **Independent Clustering Steps**: Many existing models require a separate clustering step after representation learning, which may lead to sub-optimal performance.
3. **Application in Multi-slice and Large-scale Datasets**:
   - Existing tools face scalability challenges when applied to multi-slice and large-scale datasets.

### stDyer's Solutions:

To address these problems, the authors introduce stDyer, an end-to-end deep learning framework for spatial domain clustering of SRT data. The main features of stDyer include:

- **Gaussian Mixture Variational AutoEncoder (GMVAE)**: Combined with Graph Attention Networks (GATs), it simultaneously learns deep representations and performs clustering.
- **Dynamic Graphs**: Adaptively link units according to Gaussian mixture assignments in the latent space, thereby improving spatial domain clustering and generating smoother domain boundaries.
- **Mini-batch Neighbor Sampling Strategy**: Enhances the ability to handle large-scale datasets and supports multi-GPU training.

With these improvements, stDyer is reported to outperform state-of-the-art tools across multiple SRT technologies, in particular in spatial domain clustering, multi-slice analysis, and large-scale dataset processing.

### Example of Formula:

In stDyer, the GMVAE assumes that units are embedded in a latent space that follows a Gaussian Mixture Model (GMM). Its probability distribution can be expressed as:

\[ p(\mathbf{z}|\theta)=\sum_{k=1}^{K}\pi_k\,\mathcal{N}(\mathbf{z}|\mu_k,\Sigma_k) \]

where:

- \(\mathbf{z}\) is the latent variable,
- \(\pi_k\) is the weight of the \(k\)-th Gaussian component,
- \(\mathcal{N}(\mathbf{z}|\mu_k,\Sigma_k)\) is a Gaussian distribution with mean \(\mu_k\) and covariance matrix \(\Sigma_k\).

The parameters \(\mu_k\) and \(\Sigma_k\) of the GMM are estimated by maximizing the log-likelihood of the marginal distribution of all units.

In conclusion, stDyer improves spatial domain clustering by combining deep learning with dynamic graph structures, addressing the boundary-identification, scalability, and separate-clustering-step limitations of existing methods.
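As a rough illustration of the mixture density above, the following minimal sketch evaluates \(p(\mathbf{z}|\theta)\) for a toy two-component model and derives a hard component assignment from the posterior over components. It is not stDyer's implementation: the function names, the toy parameters, and the restriction to diagonal covariance matrices are all assumptions made here to keep the example dependency-free.

```python
import math


def gaussian_pdf_diag(z, mean, var):
    """Density N(z | mean, diag(var)); diagonal covariance is an assumption."""
    log_p = 0.0
    for z_d, m_d, v_d in zip(z, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * v_d) + (z_d - m_d) ** 2 / v_d)
    return math.exp(log_p)


def mixture_density(z, weights, means, variances):
    """p(z | theta) = sum_k pi_k * N(z | mu_k, Sigma_k)."""
    return sum(
        w * gaussian_pdf_diag(z, m, v)
        for w, m, v in zip(weights, means, variances)
    )


def responsibilities(z, weights, means, variances):
    """Posterior probability of each component given z (Bayes' rule)."""
    joint = [
        w * gaussian_pdf_diag(z, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    total = sum(joint)
    return [j / total for j in joint]


def assign_component(z, weights, means, variances):
    """Index of the most probable component for z (hard assignment)."""
    r = responsibilities(z, weights, means, variances)
    return max(range(len(r)), key=r.__getitem__)


# Hypothetical toy parameters: two well-separated components in a 2-D latent space.
weights = [0.5, 0.5]                    # pi_k, must sum to 1
means = [[0.0, 0.0], [4.0, 4.0]]        # mu_k
variances = [[1.0, 1.0], [1.0, 1.0]]    # diagonal entries of Sigma_k

z = [0.2, -0.1]
print(mixture_density(z, weights, means, variances))
print(responsibilities(z, weights, means, variances))
print(assign_component(z, weights, means, variances))
```

In this sketch, units with the same hard assignment would be candidates for linking in a dynamic graph. The actual linking rule used by stDyer is not specified in the summary above.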

stDyer enables spatial domain clustering with dynamic graph embedding

Unraveling spatial domain characterization in spatially resolved transcriptomics with robust graph contrastive clustering

Graph deep learning enabled spatial domains identification for spatial transcriptomics

Accurately deciphering spatial domains for spatially resolved transcriptomics with stCluster

Structure-preserving visualization for single-cell RNA-Seq profiles using deep manifold transformation with batch-correction

DGSIST: Clustering spatial transcriptome data based on deep graph structure Infomax

stAA: adversarial graph autoencoder for spatial clustering task of spatially resolved transcriptomics

Accurate Spatial Heterogeneity Dissection and Gene Regulation Interpretation for Spatial Transcriptomics using Dual Graph Contrastive Learning

Assembling spatial clustering framework for heterogeneous spatial transcriptomics data with GRAPHDeep

Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder

STGIC: A graph and image convolution-based method for spatial transcriptomic clustering

Generative Self-Supervised Graphs Enhance Integration, Imputation and Domains Identification of Spatial Transcriptomics

DeepST: identifying spatial domains in spatial transcriptomics by deep learning

Graph attention automatic encoder based on contrastive learning for domain recognition of spatial transcriptomics

Spatial Domain Identifying: Graph Attention Network with Two Different Decoders

Deciphering spatial domains from spatially resolved transcriptomics through spatially regularized deep graph networks

Graph domain adaptation–based framework for gene expression enhancement and cell type identification in large-scale spatially resolved transcriptomics

Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST

Spatial domains identification in spatial transcriptomics by domain knowledge-aware and subspace-enhanced graph contrastive learning

Statistical batch-aware embedded integration, dimension reduction and alignment for spatial transcriptomics

ST-SCSR: identifying spatial domains in spatial transcriptomics data via structure correlation and self-representation