Abstract

Background: Single-cell RNA sequencing (scRNA-seq) is a robust approach for cancer research, revealing insights into tumor heterogeneity, the tumor microenvironment, and treatment response. However, scRNA-seq results frequently face reproducibility challenges due to (i) high data complexity, compounded by human intervention, and (ii) insufficient standardization of methods, which leads to inconsistent findings.

Methods: Here, we introduce SCRATCH (Single-Cell RnA-seq Toolkit and Pipeline for Cancer researcH), a Nextflow-based pipeline designed to improve reproducibility through a layered and modular architecture. SCRATCH follows FAIR principles and the guidelines provided by the nf-core community.

Results/Discussion: The pipeline provides three execution modes: end-to-end, iterative, and custom. In the end-to-end mode, the pipeline processes data from raw input to downstream analyses automatically. This mode employs ranking- and aggregation-based approaches. The ranking approach leverages benchmark metrics to select the most suitable method at distinct steps (e.g., batch correction), thereby ensuring a consistent and data-driven selection. The aggregation approach, in contrast, combines multiple predictions to increase confidence, for example in copy number variation (CNV) inference and malignant cell identification. These strategies minimize human intervention, which makes this mode well suited to beginner users and enables rapid access to preliminary results and biological insights. Alternatively, the iterative mode allows intermediate users to define workflow breakpoints in a layered fashion. Users can pause, review results, and adjust decisions at specific stages (e.g., tumor microenvironment (TME) annotation), facilitating a "semi-supervised" approach for a more tailored analysis while retaining the SCRATCH framework. Finally, the custom mode enables targeted execution of modules for specific tasks (e.g., trajectory analysis and cell-cell communication), allowing experienced users to bypass the SCRATCH workflow and use it as an on-demand toolkit. This mode leverages pipeline parallelism for efficient processing, which suits ongoing single-cell projects. SCRATCH produces HTML reports to ensure traceability and reproducibility.

Conclusion: SCRATCH, an evolving project, comprises 5 subworkflows, 18 modules, and 25 tools. We envisage SCRATCH as an open-source tool and invite developers to leverage its modules for their pipelines. For more information, please visit https://break-through-cancer.github.io/btc-scrna-training.

Citation Format: Andre F. Fonseca, Guangchun Han, Marcel Ribeiro-Dantas, Enyu Dai, Diljot Grewal, Matthew Zatzman, Eliyahu Havasov, Andrew McPherson, Break Through Cancer, Data Science TeamLab, Michael N. Noble, Rameen Beroukhim, Rachel Karchin, Sohrab P. Shah, Linghua Wang. Scratch: A highly modular pipeline for single-cell cancer research [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 863.
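Illustrative sketch: the abstract does not specify how the ranking and aggregation approaches of the end-to-end mode are implemented, so the Python below is only one plausible reading, not SCRATCH's code. It assumes that ranking means picking the method with the best mean rank across benchmark metrics (higher scores being better) and that aggregation means a majority vote across tools with a minimum-agreement threshold. The function names, method and tool names, metric names, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter
from statistics import mean


def select_by_rank(benchmark):
    """Return the method with the best (lowest) mean rank across metrics.

    benchmark: {method_name: {metric_name: score}}, higher score = better.
    Assumption: every method reports every metric. Ties within a metric are
    broken by insertion order, which is good enough for a sketch.
    """
    metrics = sorted({m for scores in benchmark.values() for m in scores})
    mean_rank = {}
    for method in benchmark:
        ranks = []
        for metric in metrics:
            ordered = sorted(
                benchmark, key=lambda name: benchmark[name][metric], reverse=True
            )
            ranks.append(ordered.index(method) + 1)  # rank 1 = best
        mean_rank[method] = mean(ranks)
    # Iterate over sorted names so equal mean ranks resolve deterministically.
    return min(sorted(mean_rank), key=mean_rank.get)


def aggregate_calls(calls_per_tool, min_agreement=0.5):
    """Majority vote over per-cell labels predicted by several tools.

    calls_per_tool: {tool_name: {cell_id: label}}.
    Returns {cell_id: (label, support)}, where support is the fraction of
    voting tools that agree with the winning label. Cells whose support does
    not exceed min_agreement are labelled "uncertain".
    """
    cells = set()
    for calls in calls_per_tool.values():
        cells.update(calls)
    consensus = {}
    for cell in sorted(cells):
        votes = Counter(
            calls[cell] for calls in calls_per_tool.values() if cell in calls
        )
        label, count = votes.most_common(1)[0]
        support = count / sum(votes.values())
        consensus[cell] = (label if support > min_agreement else "uncertain", support)
    return consensus


# Hypothetical benchmark scores for three batch-correction methods.
benchmark = {
    "method_a": {"batch_mixing": 0.8, "bio_conservation": 0.6, "runtime_score": 0.7},
    "method_b": {"batch_mixing": 0.7, "bio_conservation": 0.9, "runtime_score": 0.8},
    "method_c": {"batch_mixing": 0.5, "bio_conservation": 0.5, "runtime_score": 0.9},
}
best_method = select_by_rank(benchmark)

# Hypothetical malignancy calls from three tools; tool_3 did not score cell_2.
calls = {
    "tool_1": {"cell_1": "malignant", "cell_2": "normal"},
    "tool_2": {"cell_1": "malignant", "cell_2": "malignant"},
    "tool_3": {"cell_1": "normal"},
}
consensus = aggregate_calls(calls)
```

In this toy input, `method_b` has the lowest mean rank, `cell_1` is called malignant with two of three votes, and `cell_2` is left as uncertain because its two votes disagree. A real pipeline would likely weight metrics and tools differently; the equal weighting here is a simplification.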
