Abstract: Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool to gain biological insights at the cellular level. However, due to technical limitations of the existing sequencing technologies, low gene expression values are often omitted, leading to inaccurate gene counts. Existing methods, including advanced deep learning techniques, struggle to reliably impute gene expressions due to a lack of mechanisms that explicitly consider the underlying biological knowledge of the system. In reality, it has long been recognized that gene–gene interactions may serve as reflective indicators of underlying biological processes, presenting discriminative signatures of the cells. A genomic data analysis framework that is capable of leveraging the underlying gene–gene interactions is thus highly desirable and could allow for more reliable identification of distinctive patterns of the genomic data through extraction and integration of intricate biological characteristics of the genomic data. Here we tackle the problem in two steps to exploit the gene–gene interactions of the system. We first reposition the genes into a 2D grid such that their spatial configuration reflects their interactive relationships. To alleviate the need for labeled ground truth gene expression datasets, a self-supervised 2D convolutional neural network is employed to extract the contextual features of the interactions from the spatially configured genes and impute the omitted values. Extensive experiments with both simulated and experimental scRNA-seq datasets are carried out to demonstrate the superior performance of the proposed strategy against the existing imputation methods.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the inaccuracy of gene expression values in single-cell RNA sequencing (scRNA-seq) data. Due to the limitations of existing sequencing technologies, low gene expression values are often missed, resulting in inaccurate gene counts. Existing methods, including advanced deep-learning techniques, have difficulty reliably inferring gene expression because they lack a mechanism that explicitly considers the underlying biological knowledge of the system. Specifically, the paper proposes a new self-supervised deep-learning framework to improve the accuracy of gene expression recovery by leveraging gene-gene interactions.

### Main problems

1. **Inaccuracy of gene expression data**: Due to technical limitations, low gene expression values are often missed in scRNA-seq data, resulting in inaccurate gene counts.
2. **Deficiencies of existing methods**: Existing gene expression recovery methods are limited in accuracy and computational efficiency and have difficulty fully capturing the long-range relationships between genes.

### Solutions

The paper proposes a framework named TCER (Transform-and-Conquer Expression Recovery) to solve the above problems through the following steps:

1. **Mapping of gene-gene interactions**:
   - Rearrange genes into a 2D grid so that their spatial configuration reflects their interaction relationships. Specifically, genes with strong interactions are placed closer together in the resulting GenoMap.
   - Use the Gromov-Wasserstein divergence minimization method to obtain the optimal projection matrix \(T\) to reconstruct the gene data into a 2D grid.
2. **Self-supervised deep-learning model**:
   - Design a deep neural network with an encoder-decoder structure named ER-Net for recovering gene expression values.
   - Introduce three cascaded Deformable Fusion Attention (DFA) modules in ER-Net to extract local and global gene-gene interaction features.
   - Use a dual-attention mechanism (channel attention and pixel attention) to adaptively allocate important feature information and improve the performance of the network.

### Experimental results

The paper conducted extensive experiments on simulated and actual scRNA-seq datasets, demonstrating the superior performance of the TCER method in gene expression recovery, cell clustering, and trajectory analysis. Compared with existing methods, TCER shows significant advantages in multiple metrics, especially in Pearson correlation coefficient and UMAP visualization results.

### Formulas

- **Gene-gene interaction intensity matrix \(C\)**:

  \[ C_{ij} = \begin{cases} - \frac{(\Omega^{-1})_{ij}}{\sqrt{(\Omega^{-1})_{ii} (\Omega^{-1})_{jj}}} & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} \]

  where \(\Omega\) is the covariance matrix, and \(\Omega_{ij}\) represents the covariance of the expression values of the \(i\)-th gene and the \(j\)-th gene in all cells.

- **Gromov-Wasserstein divergence**:

  \[ GW(C, \bar{C}, u, v) = \min_T E_{C,\bar{C}}(T) \]

  where

  \[ E_{C,\bar{C}}(T) = \sum_{i,j,k,l} L(C_{ik}, \bar{C}_{jl}) T_{ij} T_{kl} \]

  \[ L(a, b) = KL(a | b) = a \log \left( \frac{a}{b} \right) - a + b \]

- **Standard convolution operation**: \[ F_{\text{std}}^{\text{out}}(p_x, p_y) =
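
As a rough illustration of the first two formulas (the interaction matrix \(C\) and the energy \(E_{C,\bar{C}}(T)\)), the following NumPy sketch may help. It is not the paper's implementation: the function names `interaction_matrix`, `kl_loss`, and `gw_energy` are hypothetical, and the `ridge` term is an assumption added only to keep the covariance invertible. The sketch evaluates the energy for a given \(T\) by brute force and does not perform the minimization over \(T\). The KL-type loss is only defined for strictly positive arguments; how negative entries of \(C\) are handled is not stated in the summary, so the energy function here simply assumes positive inputs.

```python
import numpy as np


def interaction_matrix(X, ridge=1e-6):
    """Interaction matrix C from a (cells x genes) expression array X.

    `ridge` is an assumption of this sketch (not part of the formula):
    it keeps the covariance matrix invertible.
    """
    omega = np.cov(X, rowvar=False)                  # gene-by-gene covariance (Omega)
    omega = omega + ridge * np.eye(omega.shape[0])
    precision = np.linalg.inv(omega)                 # Omega^{-1}
    d = np.sqrt(np.diag(precision))
    C = -precision / np.outer(d, d)                  # off-diagonal case of the formula
    np.fill_diagonal(C, 1.0)                         # C_ii = 1 by definition
    return C


def kl_loss(a, b):
    """Elementwise L(a, b) = a log(a / b) - a + b, for strictly positive a and b."""
    return a * np.log(a / b) - a + b


def gw_energy(C, C_bar, T):
    """E(T) = sum_{i,j,k,l} L(C_ik, C_bar_jl) T_ij T_kl, evaluated by brute force.

    Assumes all entries of C and C_bar are strictly positive.
    Memory grows as n^2 * m^2, so this is only suitable for tiny examples.
    """
    # Broadcast to a 4-D array indexed (i, j, k, l).
    L = kl_loss(C[:, None, :, None], C_bar[None, :, None, :])
    return np.einsum("ijkl,ij,kl->", L, T, T)
```

For example, calling `interaction_matrix` on a random array of 50 cells by 4 genes yields a symmetric 4 x 4 matrix with ones on the diagonal, and `gw_energy(A, A, np.eye(3) / 3)` evaluates to zero for any positive 3 x 3 matrix `A`, because the identity coupling matches every entry with itself.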

Self-supervised deep learning of gene–gene interactions for improved gene expression recovery

Deep learning of gene relationships from single cell time-course expression data

Gene Regulatory Network Inference Using Convolutional Neural Networks from scRNA-seq Data

Biologically Informed Deep Learning to Infer Gene Program Activity in Single Cells

A new bioinformatics tool to recover missing gene expression in single-cell RNA sequencing data

DeepIMAGER: Deeply Analyzing Gene Regulatory Networks from scRNA-seq Data

Leveraging data-driven self-consistency for high-fidelity gene expression recovery

scTSSR: gene expression recovery for single-cell RNA sequencing using two-side sparse self-representation

scGGAN: single-cell RNA-seq imputation by graph-based generative adversarial network

A Fusion Learning Model Based on Deep Learning for Single-Cell RNA Sequencing Data Clustering

A deep auto-encoder model for gene expression prediction

dynDeepDRIM: a dynamic deep learning model to infer direct regulatory interactions using single cell time-course gene expression data

dynDeepDRIM: a dynamic deep learning model to infer direct regulatory interactions using time-course single-cell gene expression data

CVGAE: A Self-Supervised Generative Method for Gene Regulatory Network Inference Using Single-Cell RNA Sequencing Data

GE-Impute: graph embedding-based imputation for single-cell RNA-seq data

A deep generative model for single-cell RNA sequencing with application to detecting differentially expressed genes

Gene Expression Prediction based on Deep Learning

DeepGRNCS: deep learning-based framework for jointly inferring gene regulatory networks across cell subpopulations

DeepGSEA: explainable deep gene set enrichment analysis for single-cell transcriptomic data

Effective gene expression prediction from sequence by integrating long-range interactions

A deep generative model for gene expression profiles from single-cell RNA sequencing