scATAcat: Cell-type annotation for scATAC-seq data

Aybuge Altay, Martin Vingron
DOI: https://doi.org/10.1101/2024.01.24.577073
2024-01-29
Abstract: Cells whose accessibility landscape has been profiled with scATAC-seq cannot readily be annotated to a particular cell type. In fact, annotating cell-types in scATAC-seq data is a challenging task since, unlike in scRNA-seq data, we lack knowledge of “marker regions” which could be used for cell-type annotation. Current annotation methods typically translate accessibility to expression space and rely on gene expression patterns. We propose a novel approach, scATAcat, that leverages characterized bulk ATAC-seq data as prototypes to annotate scATAC-seq data. To mitigate the inherent sparsity of single-cell data, we aggregate cells that belong to the same cluster and create pseudobulk. To demonstrate the feasibility of our approach we collected a number of datasets with respective annotations to quantify the results and evaluate performance for scATAcat. scATAcat is available as a Python package at .
Bioinformatics
What problem does this paper attempt to address?
The paper addresses cell-type annotation in single-cell ATAC sequencing (scATAC-seq) data. scATAC-seq measures chromatin accessibility in individual cells, but the resulting profiles cannot be readily assigned to cell types: unlike single-cell RNA sequencing (scRNA-seq), there are no known "marker regions" to use for annotation. Current methods typically translate accessibility into expression space and rely on gene expression patterns, an indirect approach that may not be accurate enough in some cases. The paper proposes a new method, scATAcat, which uses characterized bulk ATAC-seq data as prototypes to annotate scATAC-seq data. To mitigate the inherent sparsity of single-cell data, the method aggregates cells belonging to the same cluster into pseudobulk profiles. Performance is evaluated on a collection of datasets with existing annotations. The paper also discusses the challenges and biases in annotating cell types in scATAC-seq data. In summary, scATAcat aims to improve annotation accuracy for scATAC-seq data by leveraging bulk ATAC-seq data.
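The two ideas described above (aggregating cells per cluster into pseudobulk, then comparing each pseudobulk to bulk prototypes) can be illustrated with a minimal NumPy sketch. This is not the scATAcat API and not necessarily the paper's procedure: the function names, the log-CPM normalization, the use of Pearson correlation as the similarity measure, and the toy data and cell-type labels are all assumptions made for illustration.

```python
import numpy as np


def pseudobulk(counts, cluster_labels):
    """Sum per-cell counts (cells x regions) within each cluster."""
    labels = np.asarray(cluster_labels)
    clusters = sorted(set(labels.tolist()))
    profiles = np.vstack([counts[labels == c].sum(axis=0) for c in clusters])
    return clusters, profiles


def log_cpm(profiles):
    """Scale each profile to counts per million, then apply log(1 + x)."""
    totals = np.maximum(profiles.sum(axis=1, keepdims=True), 1)
    return np.log1p(profiles / totals * 1e6)


def annotate(pseudobulk_profiles, prototype_profiles, prototype_names):
    """Label each pseudobulk with its most correlated prototype.

    Pearson correlation is an assumed similarity measure, chosen
    only to keep the sketch short.
    """
    pb = log_cpm(pseudobulk_profiles)
    pr = log_cpm(prototype_profiles)
    n = pb.shape[0]
    # corrcoef stacks both inputs row-wise; keep the pseudobulk-vs-prototype block.
    corr = np.corrcoef(pb, pr)[:n, n:]
    return [prototype_names[i] for i in corr.argmax(axis=1)]


# Toy data (hypothetical): 6 cells x 4 regions, two clusters.
counts = np.array([
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
])
labels = [0, 0, 0, 1, 1, 1]

# Hypothetical bulk prototypes over the same 4 regions.
prototypes = np.array([
    [50, 40, 5, 5],
    [4, 6, 45, 45],
])
names = ["B cell", "Monocyte"]

clusters, profiles = pseudobulk(counts, labels)
annotations = annotate(profiles, prototypes, names)
print(dict(zip(clusters, annotations)))
```

Summing within clusters is what counters sparsity here: each individual cell has only one or two non-zero regions, while the cluster-level profile carries enough signal to be compared against a bulk sample.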