Incorporating network diffusion and peak location information for better single-cell ATAC-seq data analysis

Jiating Yu, Jiacheng Leng, Zhichao Hou, Duanchen Sun, Ling-Yun Wu
DOI: https://doi.org/10.1093/bib/bbae093
IF: 9.5
2024-03-19
Briefings in Bioinformatics
Abstract: Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) data provided new insights into the understanding of epigenetic heterogeneity and transcriptional regulation. With the increasing abundance of dataset resources, there is an urgent need to extract more useful information through high-quality data analysis methods specifically designed for scATAC-seq. However, analyzing scATAC-seq data poses challenges due to its near binarization, high sparsity and ultra-high dimensionality properties. Here, we proposed a novel network diffusion-based computational method to comprehensively analyze scATAC-seq data, named Single-Cell ATAC-seq Analysis via Network Refinement with Peaks Location Information (SCARP). SCARP formulates the Network Refinement diffusion method under the graph theory framework to aggregate information from different network orders, effectively compensating for missing signals in the scATAC-seq data. By incorporating distance information between adjacent peaks on the genome, SCARP also contributes to depicting the co-accessibility of peaks. These two innovations empower SCARP to obtain lower-dimensional representations for both cells and peaks more effectively. We have demonstrated through sufficient experiments that SCARP facilitated superior analyses of scATAC-seq data. Specifically, SCARP exhibited outstanding cell clustering performance, enabling better elucidation of cell heterogeneity and the discovery of new biologically significant cell subpopulations. Additionally, SCARP was also instrumental in portraying co-accessibility relationships of accessible regions and providing new insight into transcriptional regulation. Consequently, SCARP identified genes that were involved in key Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways related to diseases and predicted reliable cis-regulatory interactions. To sum up, our studies suggested that SCARP is a promising tool to comprehensively analyze the scATAC-seq data.
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
This paper attempts to address the challenges in single-cell ATAC-seq data analysis, especially the analytical difficulties brought about by its near-binary, high-sparsity and ultra-high-dimensional characteristics. Specifically, the paper proposes a new network-diffusion-based method, SCARP (Single-Cell ATAC-seq Analysis via Network Refinement with Peaks Location Information), to analyze single-cell ATAC-seq data more effectively. By integrating genomic distance information and using the Network Refinement (NR) diffusion method, SCARP can compensate for the missing signals in the data and depict the co-accessibility of peaks, thereby obtaining low-dimensional representations of cells and peaks, improving cell-clustering performance, and better revealing cell heterogeneity and discovering new biologically significant cell subpopulations. In addition, SCARP can also characterize the co-accessibility relationships of accessible regions, providing new insights into transcriptional regulation. In summary, SCARP has demonstrated superior cell-clustering performance and robustness on multiple benchmark scATAC-seq datasets, proving its potential in comprehensively analyzing scATAC-seq data.
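The general idea described above (diffusing signal over a cell-peak network that also links genomically adjacent peaks, then reducing dimension) can be illustrated with a small sketch. This is not the SCARP implementation and does not reproduce the paper's Network Refinement formula: it swaps in a standard random walk with restart as the diffusion step, and the exponential distance kernel, the function name `diffuse_cell_peak_network`, and the parameters `alpha`, `decay_bp`, and `n_components` are all assumptions made for illustration.

```python
import numpy as np


def diffuse_cell_peak_network(counts, peak_positions, alpha=0.5,
                              decay_bp=10_000.0, n_components=2):
    """Illustrative diffusion on a cell-peak network (NOT the SCARP algorithm).

    counts          (n_cells, n_peaks) near-binary accessibility matrix
    peak_positions  (n_peaks,) peak midpoints in bp, assumed on one chromosome
    alpha           continuation probability of the random walk, 0 < alpha < 1
    decay_bp        assumed length scale of the peak-peak distance kernel
    """
    counts = np.asarray(counts, dtype=float)
    peak_positions = np.asarray(peak_positions, dtype=float)
    n_cells, n_peaks = counts.shape

    # Link each peak to its genomic neighbour, weighting the edge by an
    # exponential decay in distance (an assumed kernel, chosen for simplicity).
    order = np.argsort(peak_positions)
    peak_adj = np.zeros((n_peaks, n_peaks))
    for a, b in zip(order[:-1], order[1:]):
        w = np.exp(-abs(peak_positions[b] - peak_positions[a]) / decay_bp)
        peak_adj[a, b] = peak_adj[b, a] = w

    # Joint adjacency over cells and peaks: cell-peak edges come from the
    # data, peak-peak edges from genomic proximity; no cell-cell edges.
    n = n_cells + n_peaks
    adj = np.zeros((n, n))
    adj[:n_cells, n_cells:] = counts
    adj[n_cells:, :n_cells] = counts.T
    adj[n_cells:, n_cells:] = peak_adj

    # Row-normalise into a transition matrix (isolated nodes keep zero rows).
    degree = adj.sum(axis=1, keepdims=True)
    degree[degree == 0] = 1.0
    transition = adj / degree

    # Random walk with restart in closed form:
    #   S = (1 - alpha) * sum_k (alpha * P)^k = (1 - alpha) * (I - alpha * P)^-1
    # which aggregates paths of every length ("network orders").
    steady = (1.0 - alpha) * np.linalg.inv(np.eye(n) - alpha * transition)

    # The cell-by-peak block is a denser, smoothed version of the input.
    diffused = steady[:n_cells, n_cells:]

    # Low-dimensional cell representation via truncated SVD of the
    # column-centred diffused matrix.
    centered = diffused - diffused.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    embedding = u[:, :n_components] * s[:n_components]
    return diffused, embedding
```

On a toy input with two groups of cells and two clusters of nearby peaks, the diffused matrix gains non-zero entries where the raw matrix had zeros, which is the sense in which diffusion compensates for missing signal. The dense matrix inverse is only practical for small examples; real scATAC-seq matrices would need sparse or iterative solvers.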