Abstract: The integration of single-cell RNA-sequencing (scRNA-seq) and single-cell ATAC-sequencing (scATAC-seq) data offers a unique opportunity to gain a comprehensive view of cellular identity and its defining features, and to infer gene-regulatory relationships. Despite the emergence of technologies that simultaneously capture both the gene expression and chromatin accessibility of individual cells (paired data), the practical challenges of these approaches (e.g., their unavailability for previously collected samples and their prohibitive cost) have led researchers to turn to the existing trove of single-modality data generated from independent biological samples (unpaired data). Various computational tools have been developed to integrate these unpaired single-modality datasets. However, the comparative performance of these tools has not been comprehensively evaluated, and a standard benchmark pipeline is still lacking. To address these challenges, we used pseudo-unpaired scRNA-seq and scATAC-seq data derived from publicly available paired single-cell multi-omics datasets to benchmark 14 publicly available integration methods. The primary goal of unpaired single-cell multi-omics integration is to narrow the omics gap while preserving cell type diversity. We therefore focused our benchmarking pipeline on pair-wise cell distance and clustering performance in the joint latent space constructed by each integration tool. To assess the robustness of these computational approaches, we examined their stability across a variety of scenarios, including variations in cell number, cell types, and biological and technical batch effects. A number of the integration methods tested produced promising results.
While the widely used Seurat package was recently reported to have the best performance (Lee et al., Genome Biology 2023), other computational tools such as scVI, Cobolt, scJoint, scglue and scBridge performed equally well or better in reducing omics differences and facilitating the identification of cell clusters. Notably, scglue and Cobolt demonstrated strong performance in aligning the same cell from different modalities, and discrete clusters emerged in the joint latent space using scJoint and scBridge. These findings suggest that paired multi-omics data may not be strictly necessary to guide integration and achieve favorable results. Our freely available benchmarking pipeline will empower researchers to identify the optimal data integration methods for their specific data, facilitate the benchmarking of new methods, and contribute to future method development in the field.

Citation Format: Jiani Chen, Wanzi Xiao, Eric Zhang, Xiang Chen. Benchmarking unpaired single-cell RNA and single-cell ATAC integration [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4943.
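
The abstract does not specify how pair-wise cell distance was scored. As an illustration only, the sketch below computes one commonly used measure for pseudo-unpaired data, the fraction of samples closer than the true match (FOSCTTM), on a joint latent space. The function name `foscttm`, the use of Euclidean distance, and the assumption that row i of both embeddings is the same cell are choices made for this example, not details taken from the benchmark.

```python
import numpy as np


def foscttm(rna_emb, atac_emb):
    """Fraction of samples closer than the true match (0 is best, 1 is worst).

    Assumption for this sketch: row i of `rna_emb` and row i of `atac_emb`
    are the two modalities of the same cell, embedded in one joint space.
    """
    rna_emb = np.asarray(rna_emb, dtype=float)
    atac_emb = np.asarray(atac_emb, dtype=float)
    if rna_emb.ndim != 2 or rna_emb.shape != atac_emb.shape:
        raise ValueError("embeddings must be 2-D arrays of identical shape")
    n_cells = rna_emb.shape[0]
    if n_cells < 2:
        raise ValueError("at least two cells are required")

    # Euclidean distance between every RNA cell (rows) and ATAC cell (columns).
    diff = rna_emb[:, None, :] - atac_emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Distance from each cell to its own counterpart in the other modality.
    true_dist = np.diag(dist)

    # For each RNA cell: number of ATAC cells strictly closer than its match.
    closer_from_rna = (dist < true_dist[:, None]).sum(axis=1)
    # For each ATAC cell: number of RNA cells strictly closer than its match.
    closer_from_atac = (dist < true_dist[None, :]).sum(axis=0)

    # Average both directions and normalise by the n - 1 possible competitors.
    return float((closer_from_rna + closer_from_atac).mean() / (2 * (n_cells - 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rna = rng.normal(size=(200, 16))                     # synthetic latent coordinates
    atac = rna + rng.normal(scale=0.1, size=rna.shape)   # noisy copy of the same cells
    print("FOSCTTM:", foscttm(rna, atac))
```

A score near 0 means each cell lies close to its own counterpart from the other modality. Clustering performance would need a separate measure, which this sketch does not cover. The dense distance matrix also limits it to modest cell numbers.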

A Comprehensive Benchmarking Study on Computational Tools for Cross-omics Label Transfer from Single-cell RNA to ATAC Data

Scart: Recognizing Cell Clusters and Constructing Trajectory from Single-Cell Epigenomic Data

Abstract 4943: Benchmarking unpaired single-cell RNA and single-cell ATAC integration

Systematic benchmarking of single-cell ATAC-sequencing protocols

Benchmarking algorithms for gene set scoring of single-cell ATAC-seq data

Benchmarking algorithms for joint integration of unpaired and paired single-cell RNA-seq and ATAC-seq data

scNCL: transferring labels from scRNA-seq to scATAC-seq data with neighborhood contrastive regularization

Benchmarking computational methods for single-cell chromatin data analysis

Benchmarking multi-omics integration algorithms across single-cell RNA and ATAC data

Review and Evaluate the Bioinformatics Analysis Strategies of ATAC-seq and CUT&Tag Data

Translator: A Transfer Learning Approach to Facilitate Single-Cell ATAC-Seq Data Analysis from Reference Dataset

Fundamental and Practical Approaches for Single-Cell ATAC-seq Analysis

Enhancement and Imputation of Peak Signal Enables Accurate Cell-Type Classification in scATAC-seq

Evaluation of Classification in Single Cell Atac-Seq Data with Machine Learning Methods

SnapATAC: A Comprehensive Analysis Package for Single Cell ATAC-seq

Assessment of Machine Learning Methods for Classification in Single Cell ATAC-seq

Comprehensive Analysis of Single Cell ATAC-seq Data with SnapATAC.

Benchmarking algorithms for single-cell multi-omics prediction and integration

Going beyond cell clustering and feature aggregation: Is there single cell level information in single-cell ATAC-seq data?

SCAN-ATAC-Sim: a scalable and efficient method for simulating single-cell ATAC-seq data from bulk-tissue experiments

Incorporating network diffusion and peak location information for better single-cell ATAC-seq data analysis