Abstract: The integration of single-cell RNA-sequencing (scRNA-seq) and single-cell ATAC-sequencing (scATAC-seq) data offers a unique opportunity to gain a comprehensive view of cellular identity and its defining features, and to infer gene-regulatory relationships. Despite the emergence of technologies that simultaneously capture both the gene expression and chromatin accessibility of individual cells (paired data), the practical challenges of these approaches (e.g., unavailability for previously collected samples and prohibitive cost) have led researchers to turn to the existing trove of single-modality data generated from independent biological samples (unpaired data). Various computational tools have been developed to integrate these unpaired single-modality datasets. However, the comparative performance of these tools has not been comprehensively evaluated, and a standard benchmark pipeline is still lacking. To address these challenges, we used pseudo-unpaired scRNA-seq and scATAC-seq data derived from publicly available paired single-cell multi-omics datasets to benchmark 14 publicly available integration methods. The primary goal of unpaired single-cell multi-omics integration is to narrow the omics gap while preserving cell type diversity. We therefore focused our benchmarking pipeline on pair-wise cell distance and clustering performance in the joint latent space constructed by each integration tool. To assess the robustness of these computational approaches, we examined their stability across a variety of scenarios, including variations in cell number, cell types, and biological and technical batch effects. A number of the integration methods tested produced promising results.
While the widely used Seurat package was recently reported to have the best performance (Lee et al., Genome Biology 2023), other computational tools such as scVI, Cobolt, scJoint, scglue and scBridge performed equally well or better in reducing omics differences and facilitating the identification of cell clusters. Notably, scglue and Cobolt demonstrated strong performance in aligning the same cell across modalities, and discrete clusters emerged in the joint latent space when using scJoint and scBridge. These findings suggest that paired multi-omics data may not be strictly necessary to guide integration and achieve favorable results. Our freely available benchmarking pipeline will empower researchers to identify the optimal data integration methods for their specific data, facilitate the benchmarking of new methods, and contribute to future method development in the field.

Citation Format: Jiani Chen, Wanzi Xiao, Eric Zhang, Xiang Chen. Benchmarking unpaired single-cell RNA and single-cell ATAC integration [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4943.
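
The pair-wise cell distance criterion described in the abstract can be illustrated with a small sketch. The abstract does not name a specific metric, so the example below assumes one common choice, the "fraction of samples closer than the true match" (FOSCTTM); the function name `foscttm` and its arguments are hypothetical and are not taken from the authors' pipeline. It relies on the pseudo-unpaired setup, in which row i of each embedding is known to come from the same cell.

```python
import numpy as np


def foscttm(rna_emb, atac_emb):
    """Fraction of samples closer than the true match (lower is better).

    Assumption: row i of `rna_emb` and row i of `atac_emb` are the same
    cell, which holds for pseudo-unpaired data derived from paired assays.
    """
    rna_emb = np.asarray(rna_emb, dtype=float)
    atac_emb = np.asarray(atac_emb, dtype=float)
    if rna_emb.shape != atac_emb.shape:
        raise ValueError("embeddings must have identical shapes")
    n = rna_emb.shape[0]
    if n < 2:
        raise ValueError("need at least two cells")

    # Euclidean distance from every RNA cell (rows) to every ATAC cell (columns).
    diff = rna_emb[:, None, :] - atac_emb[None, :, :]
    dist = np.linalg.norm(diff, axis=2)

    # Distance from each cell to its true counterpart in the other modality.
    true_dist = np.diag(dist)

    # For each RNA cell: share of the other ATAC cells lying strictly closer
    # than its true match; then the same from the ATAC side.
    frac_rna = (dist < true_dist[:, None]).sum(axis=1) / (n - 1)
    frac_atac = (dist < true_dist[None, :]).sum(axis=0) / (n - 1)

    # Average both directions into one score in [0, 1].
    return float((frac_rna.mean() + frac_atac.mean()) / 2)
```

A score near 0 means each cell's counterpart in the other modality is its nearest cross-modality neighbor in the joint latent space. Because the sketch builds a dense distance matrix by broadcasting, it is only suitable for small numbers of cells.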
