Cactus: a user-friendly and reproducible ATAC-Seq and mRNA-Seq analysis pipeline for data preprocessing, differential analysis, and enrichment analysis

Jérôme Salignon, Lluís Millan-Ariño, Maxime Garcia, Christian G. Riedel
DOI: https://doi.org/10.1101/2023.05.11.540110
2024-05-06
Abstract: The ever-decreasing cost of Next-Generation Sequencing coupled with the emergence of efficient and reproducible analysis pipelines has rendered genomic methods more accessible. However, downstream analyses are basic or missing in most workflows, creating a significant barrier for non-bioinformaticians. To help close this gap, we developed Cactus, an end-to-end pipeline for analyzing ATAC-Seq and mRNA-Seq data, either separately or jointly. Its Nextflow-, container-, and virtual environment-based architecture ensures efficient and reproducible analyses. Cactus preprocesses raw reads, conducts differential analyses between conditions, and performs enrichment analyses in various databases, including DNA-binding motifs, ChIP-Seq binding sites, chromatin states, and ontologies. We demonstrate the utility of Cactus in a multi-modal and multi-species case study as well as by showcasing its unique capabilities as compared to other ATAC-Seq pipelines. In conclusion, Cactus can assist researchers in gaining comprehensive insights from chromatin accessibility and gene expression data in a quick, user-friendly, and reproducible manner.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses the lack of thorough downstream analysis in current ATAC-Seq and mRNA-Seq workflows. Many existing tools handle data preprocessing and preliminary analysis, but they rarely go further toward biological insight, particularly when it comes to combining results from different methods and running enrichment analyses. This makes it difficult for non-bioinformaticians to obtain a comprehensive picture of their data. To bridge this gap, the researchers developed Cactus, an end-to-end pipeline for analyzing ATAC-Seq and mRNA-Seq data, either separately or jointly. Cactus addresses these problems in the following ways:

1. **Efficient and Reproducible Analysis**: Cactus is built on Nextflow, containers, and virtual environments, which keeps analyses efficient and reproducible.
2. **Comprehensive Data Preprocessing**: Cactus preprocesses the raw reads, including steps such as quality control, trimming, compression, and alignment.
3. **Differential Analysis**: Cactus performs differential analysis between conditions, identifying differentially accessible regions (DARs) and differentially expressed genes (DEGs).
4. **Enrichment Analysis**: Cactus runs enrichment analyses against various databases, including DNA-binding motifs, ChIP-Seq binding sites, chromatin states, and ontologies, to provide comprehensive molecular insights.
5. **User-friendly**: Cactus is designed with user experience in mind, providing automated installation and test datasets so that non-bioinformaticians can use it easily.

Together, these features are meant to help researchers obtain comprehensive insights from chromatin accessibility and gene expression data in a quick, user-friendly, and reproducible manner.
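
To make the enrichment step more concrete, here is a minimal Python sketch of one common way to test whether a set of differentially expressed genes is over-represented in an annotation category: a one-sided hypergeometric test. This is an illustration only. The text above does not say which statistical test Cactus uses, and the function name, parameter names, and gene identifiers below are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(universe_size, annotated_in_universe,
                                selected_size, annotated_in_selected):
    """Return P(X >= annotated_in_selected) for a hypergeometric X.

    universe_size:         number of genes considered in total (N)
    annotated_in_universe: genes in the universe carrying the annotation (K)
    selected_size:         genes in the selected set, e.g. DEGs (n)
    annotated_in_selected: selected genes carrying the annotation (k)
    """
    N, K, n, k = (universe_size, annotated_in_universe,
                  selected_size, annotated_in_selected)
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)  # number of ways to draw n genes from the universe
    # Sum the probabilities of drawing k, k+1, ... annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return tail / total


# Hypothetical toy data: gene identifiers are made up for illustration.
universe = {f"gene{i}" for i in range(1, 11)}          # 10 genes in total
ontology_term = {"gene1", "gene2", "gene3", "gene4"}   # 4 annotated genes
degs = {"gene1", "gene2", "gene9"}                     # 3 selected genes

p_value = hypergeom_enrichment_pvalue(
    universe_size=len(universe),
    annotated_in_universe=len(ontology_term & universe),
    selected_size=len(degs),
    annotated_in_selected=len(degs & ontology_term),
)
print(f"enrichment p-value: {p_value:.4f}")
```

In practice such a test would be repeated for every annotation category and the resulting p-values corrected for multiple testing; the sketch leaves that out to stay short.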