Abstract: The integration of single-cell RNA-sequencing (scRNA-seq) and single-cell ATAC-sequencing (scATAC-seq) data offers a unique opportunity to obtain a comprehensive view of cellular identity and its defining features and to infer gene-regulatory relationships. Technologies now exist that simultaneously capture both the gene expression and the chromatin accessibility of individual cells (paired data). However, their practical limitations have led researchers to turn to the existing trove of single-modality data generated from independent biological samples (unpaired data). These limitations include their unavailability for previously collected samples and their prohibitive cost. Various computational tools have been developed to integrate these unpaired single-modality datasets. However, the comparative performance of these tools has not been comprehensively evaluated, and a standard benchmarking pipeline is still lacking. To address these gaps, we used pseudo-unpaired scRNA-seq and scATAC-seq data derived from publicly available paired single-cell multi-omics datasets to benchmark 14 publicly available integration methods. The primary goal of unpaired single-cell multi-omics integration is to narrow the gap between omics modalities while preserving cell-type diversity. We therefore focused on pairwise cell distances and clustering performance in the joint latent space that each integration tool constructs in our benchmarking pipeline. To assess the robustness of these computational approaches, we examined their stability across a variety of scenarios. These included variations in cell number, cell types, and biological and technical batch effects. Several of the tested integration methods produced promising results.
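
The abstract does not spell out its metrics, so what follows is only a minimal sketch under two stated assumptions. The first assumption is that "pseudo-unpaired" means the barcode link between the RNA and ATAC profiles of a paired dataset is hidden from each integration tool but kept for scoring. The second is that pairwise cell distance is scored with FOSCTTM (fraction of samples closer than the true match), a metric commonly used for this purpose. The helper names `make_pseudo_unpaired`, `restore_order` and `foscttm` are hypothetical. The toy vectors stand in for the joint latent embeddings a real tool would produce.

```python
# Hypothetical sketch: assumes the pairing is hidden by shuffling and that
# FOSCTTM is the pairwise-distance metric; this is not the authors' pipeline.
import math
import random
from typing import List, Sequence, Tuple

Embedding = List[Sequence[float]]  # one row (latent vector) per cell


def make_pseudo_unpaired(
    rna: Embedding, atac: Embedding, seed: int = 0
) -> Tuple[Embedding, Embedding, List[int]]:
    """Hide the cell pairing of a paired dataset by shuffling the ATAC rows.

    Shuffled row k came from original cell order[k]; `order` is withheld from
    the integration tool and used only when scoring.
    """
    order = list(range(len(atac)))
    random.Random(seed).shuffle(order)
    return rna, [atac[k] for k in order], order


def restore_order(shuffled: Embedding, order: List[int]) -> Embedding:
    """Put shuffled ATAC rows (e.g. their joint-space embeddings) back in cell order."""
    restored = list(shuffled)
    for k, original_index in enumerate(order):
        restored[original_index] = shuffled[k]
    return restored


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def foscttm(rna: Embedding, atac: Embedding) -> float:
    """Fraction Of Samples Closer Than the True Match, averaged over both directions.

    Row i of `rna` and row i of `atac` must be the same cell. 0.0 means every
    cell's true partner is its nearest cross-modal neighbour; a random
    alignment gives roughly 0.5.
    """
    n = len(rna)
    if n == 0 or n != len(atac):
        raise ValueError("both embeddings need the same, non-zero number of cells")
    dist = [[euclidean(r, a) for a in atac] for r in rna]
    # RNA -> ATAC: share of ATAC cells closer to RNA cell i than its true partner.
    rna_side = [sum(d < row[i] for d in row) / n for i, row in enumerate(dist)]
    # ATAC -> RNA: share of RNA cells closer to ATAC cell j than its true partner.
    atac_side = [sum(dist[i][j] < dist[j][j] for i in range(n)) / n for j in range(n)]
    return (sum(rna_side) + sum(atac_side)) / (2 * n)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy joint-space embeddings: 50 cells, 5 latent dimensions, perfectly aligned.
    cells = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(50)]
    rna, atac_shuffled, order = make_pseudo_unpaired(cells, cells, seed=1)
    # ... a real integration tool would embed rna and atac_shuffled here ...
    print(foscttm(rna, restore_order(atac_shuffled, order)))  # 0.0
    print(foscttm(rna, atac_shuffled))  # pairing ignored: compare with the line above
```

Lower scores indicate better alignment of the same cell across modalities. The sketch stays dependency-free for readability and builds the full n-by-n distance matrix in pure Python. For real datasets, a vectorized distance computation (for example with NumPy or SciPy) would be the practical choice.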
The widely used Seurat package was recently reported to perform best (Lee et al., Genome Biology 2023). However, other computational tools, such as scVI, Cobolt, scJoint, scglue and scBridge, performed equally well or better at reducing omics differences and facilitating the identification of cell clusters. Notably, scglue and Cobolt performed strongly in aligning the same cell across modalities, and discrete clusters emerged in the joint latent spaces produced by scJoint and scBridge. These findings suggest that paired multi-omics data may not be strictly necessary to guide integration and achieve favorable results. Our freely available benchmarking pipeline will empower researchers to identify the optimal data integration methods for their specific data. It will also facilitate the benchmarking of new methods and contribute to future method development in the field.

Citation Format: Jiani Chen, Wanzi Xiao, Eric Zhang, Xiang Chen. Benchmarking unpaired single-cell RNA and single-cell ATAC integration [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4943.

Abstract 4943: Benchmarking unpaired single-cell RNA and single-cell ATAC integration

Benchmarking strategies for cross-species integration of single-cell RNA sequencing data

Benchmarking multi-omics integration algorithms across single-cell RNA and ATAC data

Benchmarking algorithms for joint integration of unpaired and paired single-cell RNA-seq and ATAC-seq data

Benchmarking algorithms for single-cell multi-omics prediction and integration

Benchmarking computational methods for single-cell chromatin data analysis

Benchmarking Single-Cell RNA Sequencing Protocols for Cell Atlas Projects

Systematic benchmarking of single-cell ATAC-sequencing protocols

Benchmarking single-cell RNA-sequencing protocols for cell atlas projects

Multi-task benchmarking of single-cell multimodal omics integration methods

Comprehensive Integration of Single-Cell Data

Benchmarking integration of single-cell differential expression

Population-level Integration of Single-Cell Datasets Enables Multi-Scale Analysis Across Samples

A sandbox for prediction and integration of DNA, RNA, and proteins in single cells

Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments

A Comprehensive Benchmarking Study on Computational Tools for Cross-omics Label Transfer from Single-cell RNA to ATAC Data

Mapping cells to gene programs

Beaconet: A Reference‐Free Method for Integrating Multiple Batches of Single‐Cell Transcriptomic Data in Original Molecular Space

Statistical Single Cell Multi-Omics Integration

Query to reference single cell integration with transfer learning