Abstract:The major interest domains of single-cell RNA sequential analysis are identification of existing and novel types of cells, depiction of cells, cell fate prediction, classification of several types of tumor, and investigation of heterogeneity in different cells. Single-cell clustering plays an important role to solve the aforementioned questions of interest. Cluster identification in high dimensional single-cell sequencing data faces some challenges due to its nature. Dimensionality reduction models can solve the problem. Here, we introduce a potential cluster specified frequent biomarkers discovery framework using dimensionality reduction and hierarchical agglomerative clustering Louvain for single-cell RNA sequencing data analysis. First, we pre-filtered the features with fewer number of cells and the cells with fewer number of features. Then we created a Seurat object to store data and analysis together and used quality control metrics to discard low quality or dying cells. Afterwards we applied global-scaling normalization method “LogNormalize” for data normalization. Next, we computed cell-to-cell highly variable features from our dataset. Then, we applied a linear transformation and linear dimensionality reduction technique, Principal Component Analysis (PCA) to project high dimensional data to an optimal low-dimensional space. After identifying fifty “significant”principal components (PCs) based on strong enrichment of low p-value features, we implemented a graph-based clustering algorithm Louvain for the cell clustering of 10 top significant PCs. We applied our model to a single-cell RNA sequential dataset for a rare intestinal cell type in mice (NCBI accession ID:GSE62270, 23,630 features and 1872 samples (cells)). We obtained 10 cell clusters with a maximum modularity of 0.885 1. After detecting the cell clusters, we found 3871 cluster-specific biomarkers using an expression feature extraction statistical tool for single-cell sequencing data, Model-based Analysis of Single-cell Transcriptomics (MAST) with a log 2 FC threshold of 0.25 and a minimum feature detection of 25 % . From these cluster-specific biomarkers, we found 1892 most frequent markers, i.e., overlapping biomarkers. We performed degree hub gene network analysis using Cytoscape and reported the five highest degree genes ( Rps4x , Rps18 , Rpl13a , Rps12 and Rpl18a ). Subsequently, we performed KEGG pathway and Gene Ontology enrichment analysis of cluster markers using David 6.8 software tool. In summary, our proposed framework that integrated dimensionality reduction and agglomerative hierarchical clustering provides a robust approach to efficiently discover cluster-specific frequent biomarkers, i.e., overlapping biomarkers from single-cell RNA sequencing data.

Dimensionality Reduction and Louvain Agglomerative Hierarchical Clustering for Cluster-Specified Frequent Biomarker Discovery in Single-Cell Sequencing Data

Identifying Genetic Signatures from Single-Cell RNA Sequencing Data by Matrix Imputation and Reduced Set Gene Clustering

Multi-Objective Optimized Fuzzy Clustering for Detecting Cell Clusters from Single-Cell Expression Profiles

A Cell Marker-Based Clustering Strategy (cmcluster) for Precise Cell Type Identification of Scrna-Seq Data

Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study

Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm

Machine learning and statistical methods for clustering single-cell RNA-sequencing data

Joint learning dimension reduction and clustering of single-cell RNA-sequencing data

Scalable nonparametric clustering with unified marker gene selection for single-cell RNA-seq data

Discovering cell types using manifold learning and enhanced visualization of single-cell RNA-Seq data

Joint dimension reduction and clustering analysis of single-cell RNA-seq and spatial transcriptomics data

Self-supervised deep clustering of single-cell RNA-seq data to hierarchically detect rare cell populations

A Hybrid Deep Clustering Approach for Robust Cell Type Profiling Using Single-cell RNA-seq Data

Clustering single cell CITE-seq data with a canonical correlation based deep learning method

Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model

SSNMDI: a novel joint learning model of semi-supervised non-negative matrix factorization and data imputation for clustering of single-cell RNA-seq data

DUBStepR: correlation-based feature selection for clustering single-cell RNA sequencing data

LMD: Cluster-Independent Multiscale Marker Identification in Single-cell RNA-seq Data

Cluster decomposition-based anomaly detection for rare cell identification in single-cell expression data

Fast Dimension Reduction and Integrative Clustering of Multi-Omics Data Using Low-Rank Approximation: Application to Cancer Molecular Classification

Cluster Analysis of High-Dimensional scRNA Sequencing Data