Abstract: Purpose: Single-cell RNA sequencing (scRNA-seq) is producing vast amounts of individual cell profiling data. Accurately annotating cell types and their associated biomarkers in such datasets remains a significant challenge. Analysis of scRNA-seq datasets can help us understand diseases such as Alzheimer's disease, cancer, diabetes, coronavirus disease 2019 (COVID-19), and systemic lupus erythematosus (SLE). Recently, pipelines based on machine learning (ML) and deep neural network (DNN) methods have been applied to these problems using scRNA-seq datasets. These pipelines have emerged as a promising resource because they can extract meaningful and concise features from noisy, diverse, and high-dimensional data to improve annotation and subsequent analysis. However, existing tools require substantial computational resources to process datasets with large numbers of samples. Methods: We have developed a platform called scaLR (Single Cell Analysis using Low Resource) that processes data in batches, reducing the resources required to handle large datasets and run neural network (NN) models. scaLR comprises modules for data processing, feature extraction, training, evaluation, and downstream analysis. The data processing module performs sample-wise and standard-scaler normalization and splits the data. Its novel feature extraction algorithm first trains the model on a feature subset and stores the importance of every feature in that subset, repeating this for each feature subset. At the end of this process, the top K features are selected based on their importance. The model is then trained on the top K features; its performance evaluation and the associated downstream analysis yield significant biomarkers for different cell types and diseases/traits. Results: To showcase the capabilities of scaLR, we used several scRNA-seq datasets: Peripheral Blood Mononuclear Cells (PBMCs), Alzheimer's disease patients, and large datasets from human and mouse embryonic development. Our findings indicate that scaLR offers prediction accuracy comparable to that of existing Python-based pipelines and frameworks while requiring less model training time and fewer compute resources. Moreover, scaLR efficiently handles large datasets (>11.4 million cells) with minimal resource usage (29 GB RAM, 12 GB GPU, and 8 CPUs) while maintaining high prediction accuracy and ranking the association of biomarkers with specific cell types and diseases. Conclusion: We present scaLR, a Python-based platform engineered to use minimal computational resources while maintaining execution times comparable to those of existing frameworks. It is highly scalable and can efficiently handle datasets containing millions of cells, providing their classification and important biomarkers.
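
The feature extraction step described in the Methods can be illustrated with a small sketch. This is not the scaLR implementation: it assumes a plain softmax classifier written in NumPy in place of the neural network, takes the mean absolute weight of each feature as its importance score, and uses synthetic data. All function names, the chunk size, and the value of K are hypothetical and chosen only for illustration.

```python
import numpy as np


def train_linear_classifier(X, y, n_classes, epochs=200, lr=0.5):
    """Fit a softmax classifier with plain gradient descent (stand-in for an NN)."""
    n_samples, n_features = X.shape
    W = np.zeros((n_features, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot encoded labels
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n_samples  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def select_top_k_features(X, y, chunk_size, k):
    """Train on one feature subset at a time, store importances, keep the top K."""
    n_features = X.shape[1]
    n_classes = int(y.max()) + 1
    importance = np.zeros(n_features)
    for start in range(0, n_features, chunk_size):
        cols = slice(start, start + chunk_size)
        W, _ = train_linear_classifier(X[:, cols], y, n_classes)
        # Assumed importance score: mean absolute weight across classes.
        importance[cols] = np.abs(W).mean(axis=1)
    top_k = np.argsort(importance)[::-1][:k]
    return top_k, importance


def make_toy_data(seed=0, n_cells=300, n_genes=40, informative=(3, 17, 31)):
    """Synthetic two-class data in which only a few genes carry signal."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_cells)
    X = rng.normal(size=(n_cells, n_genes))
    for g in informative:
        X[:, g] += 3.0 * y  # shift the informative genes for class 1
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standard scaling per gene
    return X, y


X, y = make_toy_data()
top_k, importance = select_top_k_features(X, y, chunk_size=10, k=3)

# Final model trained only on the selected features.
final_W, final_b = train_linear_classifier(X[:, top_k], y, n_classes=2)
print("selected feature indices:", sorted(top_k.tolist()))
```

Because only one feature chunk is held by the model at a time, the memory needed for training scales with the chunk size rather than with the full number of genes, which is the motivation the abstract gives for this design.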

Sfaira Accelerates Data and Model Reuse in Single Cell Genomics

Identification of cell types, states and programs by learning gene set representations

scaLR: a low-resource deep neural network-based platform for single cell analysis and biomarker discovery

Sctab: Scaling Cross-Tissue Single-Cell Annotation Models

Building a FAIR data ecosystem for incorporating single-cell transcriptomics data into agricultural genome to phenome research

SingleCAnalyzer: Interactive Analysis of Single Cell RNA-Seq Data on the Cloud

CIARA: a Cluster-Independent Algorithm for Identifying Markers of Rare Cell Types from Single-Cell Sequencing Data

Single-Cell Data Integration and Cell Type Annotation through Contrastive Adversarial Open-set Domain Adaptation

FASTGenomics: An analytical ecosystem for single-cell RNA sequencing data

scARE: Attribution Regularization for Single Cell Representation Learning

A sandbox for prediction and integration of DNA, RNA, and proteins in single cells

ASAP 2020 update: an open, scalable and interactive web-based portal for (single-cell) omics analyses

CZ CELLxGENE Discover: a single-cell data platform for scalable exploration, analysis and modeling of aggregated data

A Fusion Learning Model Based on Deep Learning for Single-Cell RNA Sequencing Data Clustering

Integrative, high-resolution analysis of single cells across experimental conditions with PARAFAC2

SCALA: A complete solution for multimodal analysis of single-cell Next Generation Sequencing data

SiFT: uncovering hidden biological processes by probabilistic filtering of single-cell data

scRNA-Explorer: An End-user Online Tool for Single Cell RNA-seq Data Analysis Featuring Gene Correlation and Data Filtering

CELLama: Foundation Model for Single Cell and Spatial Transcriptomics by Cell Embedding Leveraging Language Model Abilities

Fast and lightweight cell atlas approximations across organs and organisms