Abstract: The integration of single-cell RNA-sequencing (scRNA-seq) and single-cell ATAC-sequencing (scATAC-seq) data offers a unique opportunity to gain a comprehensive view of cellular identity and its defining features, and to infer gene-regulatory relationships. Although technologies now exist that simultaneously capture both the gene expression and chromatin accessibility of individual cells (paired data), their practical limitations (e.g., unavailability for previously profiled samples and prohibitive cost) have led researchers to turn to the existing trove of single-modality data generated from independent biological samples (unpaired data). Various computational tools have been developed to integrate these unpaired single-modality datasets. However, the comparative performance of these tools has not been comprehensively evaluated, and a standard benchmarking pipeline is still lacking. To address these gaps, we used pseudo-unpaired scRNA-seq and scATAC-seq data derived from publicly available paired single-cell multi-omics datasets to benchmark 14 publicly available integration methods. The primary goal of unpaired single-cell multi-omics integration is to narrow the gap between omics layers while preserving cell type diversity. Our benchmarking pipeline therefore focused on pair-wise cell distance and clustering performance in the joint latent space constructed by each integration tool. To assess the robustness of these computational approaches, we examined their stability across a variety of scenarios, including variations in cell number, cell types, and biological and technical batch effects. Several of the integration methods tested produced promising results. While the widely used Seurat package was recently reported to have the best performance (Lee et al., Genome Biology 2023), other computational tools such as scVI, Cobolt, scJoint, scglue and scBridge performed equally well or better in reducing omics differences and facilitating the identification of cell clusters. Notably, scglue and Cobolt demonstrated strong performance in aligning the same cell across modalities, and discrete clusters emerged in the joint latent space produced by scJoint and scBridge. These findings suggest that paired multi-omics data may not be strictly necessary to guide integration and achieve favorable results. Our freely available benchmarking pipeline will empower researchers to identify the optimal data integration methods for their specific data, facilitate the benchmarking of new methods, and contribute to future method development in the field.

Citation Format: Jiani Chen, Wanzi Xiao, Eric Zhang, Xiang Chen. Benchmarking unpaired single-cell RNA and single-cell ATAC integration [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4943.
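The abstract evaluates pair-wise cell distance in the joint latent space, which is possible because pseudo-unpaired data retain the true RNA/ATAC pairing of each cell. The following is a minimal sketch of one way such a criterion could be scored, assuming a FOSCTTM-style measure (fraction of cells closer than the true match). The abstract does not name the metric the authors used, so the function `foscttm` and the toy embeddings are illustrative assumptions, not the authors' pipeline.

```python
import math


def foscttm(rna_embed, atac_embed):
    """Mean fraction of cells that lie closer than the true match.

    Assumption: row i of rna_embed and row i of atac_embed are the same
    cell, embedded from its two modalities into one joint latent space.
    Returns 0.0 for perfect pairing; values near 0.5 indicate pairing
    no better than chance.
    """
    n = len(rna_embed)
    if n != len(atac_embed) or n < 2:
        raise ValueError("need two equally sized embeddings with >= 2 cells")

    total = 0.0
    for i in range(n):
        # Distance between the two modality views of the same cell.
        d_true = math.dist(rna_embed[i], atac_embed[i])
        # ATAC cells that sit closer to RNA cell i than its true partner.
        closer_atac = sum(
            1 for j in range(n)
            if j != i and math.dist(rna_embed[i], atac_embed[j]) < d_true
        )
        # RNA cells that sit closer to ATAC cell i than its true partner.
        closer_rna = sum(
            1 for j in range(n)
            if j != i and math.dist(atac_embed[i], rna_embed[j]) < d_true
        )
        total += (closer_atac + closer_rna) / (2 * (n - 1))
    return total / n


# Toy latent coordinates (hypothetical values, for illustration only).
rna = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
atac_aligned = [(0.1, 0.0), (1.1, 0.0), (5.0, 5.1)]
print(foscttm(rna, atac_aligned))
```

A lower score means the two views of each cell land closer together. Clustering performance, the second criterion in the abstract, would need a separate measure computed against cell type labels.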

Semi-supervised integration of single-cell transcriptomics data

Assessment of batch-correction methods for scRNA-seq data with a new test metric

A benchmark of batch-effect correction methods for single-cell RNA sequencing data

Benchmarking atlas-level data integration in single-cell genomics

Integration for single-cell RNA sequencing data based on the shared cell type assignment

A Comprehensive Benchmarking Study on Computational Tools for Cross-omics Label Transfer from Single-cell RNA to ATAC Data

Comprehensive Integration of Single-Cell Data

Batch correction of single-cell sequencing data via an autoencoder architecture

Abstract 4943: Benchmarking unpaired single-cell RNA and single-cell ATAC integration

scATAnno: Automated Cell Type Annotation for single-cell ATAC Sequencing Data

Beaconet: A Reference‐Free Method for Integrating Multiple Batches of Single‐Cell Transcriptomic Data in Original Molecular Space

scICAN: Single-cell Chromatin Accessibility and Gene Expression Data Integration Via Cycle-consistent Adversarial Network

A Cell Cycle-aware Network for Data Integration and Label Transferring of Single-cell RNA-seq and ATAC-seq

scTab: Scaling Cross-Tissue Single-Cell Annotation Models

Joint cell type identification in spatial transcriptomics and single-cell RNA sequencing data

Integrating single-cell datasets with ambiguous batch information by incorporating molecular network features

Benchmarking algorithms for joint integration of unpaired and paired single-cell RNA-seq and ATAC-seq data

Integration and transfer learning of single-cell transcriptomes via cFIT

scATAcat: Cell-type annotation for scATAC-seq data

scNCL: transferring labels from scRNA-seq to scATAC-seq data with neighborhood contrastive regularization