Abstract: The integration of single-cell RNA-sequencing (scRNA-seq) and single-cell ATAC-sequencing (scATAC-seq) data offers a unique opportunity to gain a comprehensive view of cellular identity and its defining features, and to infer gene-regulatory relationships. Technologies that simultaneously capture both the gene expression and chromatin accessibility of individual cells (paired data) have emerged, but their practical limitations (e.g., unavailability for previously collected samples and prohibitive cost) have led researchers to turn to the existing trove of single-modality data generated from independent biological samples (unpaired data). Various computational tools have been developed to integrate these unpaired single-modality datasets. However, the comparative performance of these tools has not been comprehensively evaluated, and a standard benchmark pipeline is still lacking. To address these challenges, we used pseudo-unpaired scRNA-seq and scATAC-seq data derived from publicly available paired single-cell multi-omics datasets to benchmark 14 publicly available integration methods. The primary goal of unpaired single-cell multi-omics integration is to narrow the gap between omics modalities while preserving cell type diversity. Our benchmarking pipeline therefore focused on two criteria in the joint latent space constructed by each integration tool: pair-wise cell distance and clustering performance. To assess the robustness of these computational approaches, we examined their stability across a variety of scenarios, including variations in cell number, cell types, and biological and technical batch effects. A number of the integration methods tested produced promising results. While the widely used Seurat package was recently reported to have the best performance (Lee et al., Genome Biology 2023), other computational tools such as scVI, Cobolt, scJoint, scglue and scBridge performed equally well or better in reducing omics differences and facilitating the identification of cell clusters. Notably, scglue and Cobolt demonstrated strong performance in aligning the same cell across modalities, and discrete clusters emerged in the joint latent space using scJoint and scBridge. These findings suggest that it may not be strictly necessary to use paired multi-omics data to guide integration in order to achieve favorable results. Our freely available benchmarking pipeline will empower researchers to identify the optimal data integration methods for their specific data, facilitate the benchmarking of new methods, and contribute to future method development in the field.

Citation Format: Jiani Chen, Wanzi Xiao, Eric Zhang, Xiang Chen. Benchmarking unpaired single-cell RNA and single-cell ATAC integration [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4943.

Benchmarking deep learning methods for biologically conserved single-cell integration

Abstract 4943: Benchmarking unpaired single-cell RNA and single-cell ATAC integration

Benchmarking algorithms for single-cell multi-omics prediction and integration

Benchmarking atlas-level data integration in single-cell genomics

Benchmarking strategies for cross-species integration of single-cell RNA sequencing data

Deep Batch Integration and Denoise of Single‐Cell RNA‐Seq Data

Benchmarking algorithms for joint integration of unpaired and paired single-cell RNA-seq and ATAC-seq data

Benchmarking integration of single-cell differential expression

Benchmarking multi-omics integration algorithms across single-cell RNA and ATAC data

Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution

Trade-off between conservation of biological variation and batch effect removal in deep generative modeling for single-cell transcriptomics

Integrating single-cell datasets with ambiguous batch information by incorporating molecular network features

Deep Learning in Single-Cell and Spatial Transcriptomics Data Analysis: Advances and Challenges from a Data Science Perspective

scRNA-seq mixology: towards better benchmarking of single cell RNA-seq analysis methods

A Comprehensive Benchmarking Study on Computational Tools for Cross-omics Label Transfer from Single-cell RNA to ATAC Data

Integration of single cell data by disentangled representation learning

Assessment of batch-correction methods for scRNA-seq data with a new test metric

A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples

Benchmarking computational methods for single-cell chromatin data analysis

A benchmark of batch-effect correction methods for single-cell RNA sequencing data

Identifying strengths and weaknesses of methods for computational network inference from single-cell RNA-seq data