Abstract:

Purpose: Single-cell RNA sequencing (scRNA-seq) produces vast amounts of individual cell profiling data. Accurately annotating cell types and their associated biomarkers in such datasets remains a significant challenge. Analysis of scRNA-seq datasets helps us understand diseases such as Alzheimer's disease, cancer, diabetes, coronavirus disease 2019 (COVID-19), and systemic lupus erythematosus (SLE). Recently, pipelines based on machine learning (ML) and deep neural network (DNN) methods have been applied to these problems using scRNA-seq datasets. These pipelines are a promising resource, capable of extracting meaningful and concise features from noisy, diverse, and high-dimensional data to improve annotation and subsequent analysis. However, existing tools require substantial computational resources to process datasets with large numbers of samples.

Methods: We have developed a platform, scaLR (Single Cell Analysis using Low Resource), that efficiently processes data in batches, reducing the resources required to process large datasets and run NN models. scaLR provides modules for data processing, feature extraction, training, evaluation, and downstream analysis. The data processing module performs sample-wise and standard-scaler normalization and splits the data. Its novel feature extraction algorithm first trains the model on a feature subset and stores the importance of every feature in that subset. At the end of this process, the top K features are selected by importance. The model is then trained on these top K features; its performance evaluation and the associated downstream analysis yield significant biomarkers for different cell types and diseases/traits.

Results: To showcase the capabilities of scaLR, we used several scRNA-seq datasets: peripheral blood mononuclear cells (PBMCs), Alzheimer's patients, and large datasets from human and mouse embryonic development. Our findings indicate that scaLR offers comparable prediction accuracy while requiring less model training time and fewer compute resources than existing Python-based pipelines and frameworks. Moreover, scaLR efficiently handles large datasets (>11.4 million cells) with minimal resource usage (29 GB RAM, a 12 GB GPU, and 8 CPUs) while maintaining high prediction accuracy and ranking the association of biomarkers with specific cell types and diseases.

Conclusion: We present scaLR, a Python-based platform engineered to use minimal computational resources while maintaining execution times comparable to those of existing frameworks. It is highly scalable, efficiently handling datasets containing millions of cells and providing their classification and important biomarkers.
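
The feature extraction step described in the Methods (train on a feature subset, store per-feature importance, then keep the top K) can be illustrated with a minimal sketch. This is not scaLR's implementation or API. It assumes several things the abstract does not state: that the procedure is repeated over consecutive feature subsets, that a plain softmax classifier stands in for the model, and that importance is the mean absolute weight of a feature. The names `train_softmax`, `select_top_k`, and `chunk_size` are hypothetical, the data are synthetic, and normalization is omitted for brevity.

```python
import numpy as np


def train_softmax(X, y, n_classes, epochs=200, lr=0.5):
    """Fit a softmax classifier by full-batch gradient descent (stand-in model)."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                      # one-hot labels
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n                           # gradient of mean cross-entropy
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b


def select_top_k(X, y, k, chunk_size):
    """Train on one feature chunk at a time, record importances, keep the top k."""
    n_classes = int(y.max()) + 1
    importance = np.zeros(X.shape[1])
    for start in range(0, X.shape[1], chunk_size):
        cols = slice(start, start + chunk_size)
        W, _ = train_softmax(X[:, cols], y, n_classes)
        # Assumed importance score: mean absolute weight across classes.
        importance[cols] = np.abs(W).mean(axis=1)
    top = np.argsort(importance)[::-1][:k]
    return top, importance


# Synthetic example: 400 "cells", 40 "genes", 3 of which carry the class signal.
rng = np.random.default_rng(0)
n_cells, n_genes = 400, 40
y = rng.integers(0, 2, size=n_cells)
X = rng.normal(size=(n_cells, n_genes))
informative = [3, 17, 31]
X[:, informative] += 2.5 * y[:, None]             # shift informative genes in class 1

top, importance = select_top_k(X, y, k=3, chunk_size=10)
print("selected features:", sorted(top.tolist()))
```

Because only `chunk_size` columns are used for training at any time, the size of each training problem is bounded by the chunk rather than by the full gene set, which is the property that matters for a low-resource setting. A final model would then be trained on `X[:, top]` only.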

scDLC: a deep learning framework to classify large sample single-cell RNA-seq data

LncLSTA: A Versatile Predictor Unveiling Subcellular Localization of lncRNAs Through Long-Short Term Attention

scDA: Single cell discriminant analysis for single-cell RNA sequencing data

Deep Learning for clustering single-cell RNA-seq Data

Clustering single-cell RNA-seq data with a model-based deep learning approach

HDMC: a novel deep learning-based framework for removing batch effects in single-cell RNA-seq data

DCA-CLA: A scRNA-seq Classification Framework Based on Deep Count Autoencoder

Batch alignment of single-cell transcriptomics data using deep metric learning

Massive single-cell RNA-seq analysis and imputation via deep learning

Deep learning-based advances and applications for single-cell RNA-sequencing data analysis

scDAC: deep adaptive clustering of single-cell transcriptomic data with coupled autoencoder and Dirichlet process mixture model

Mixed Effects Deep Learning for the interpretable analysis of single cell RNA sequencing data by quantifying and visualizing batch effects

Latent cellular analysis robustly reveals subtle diversity in large-scale single-cell RNA-seq data

scSDSC: Self-supervised Deep Subspace Clustering for scRNA-seq Data

Non-negative low-rank representation based on dictionary learning for single-cell RNA-sequencing data analysis

SDLDA: lncRNA-disease association prediction based on singular value decomposition and deep learning

scDCCA: deep contrastive clustering for single-cell RNA-seq data based on auto-encoder network

scCDCG: Efficient Deep Structural Clustering for single-cell RNA-seq via Deep Cut-informed Graph Embedding

scDFN: enhancing single-cell RNA-seq clustering with deep fusion networks

Self-supervised deep clustering of single-cell RNA-seq data to hierarchically detect rare cell populations

scaLR: a low-resource deep neural network-based platform for single cell analysis and biomarker discovery