Abstract: Background and Objective: Differential expression analysis is one of the most common tasks in transcriptomic studies based on next-generation sequencing technologies. Differentially expressed genes (DEGs) between two conditions are ideal candidate prognostic and diagnostic biomarkers for many pathologies. As a result, several algorithms, such as DESeq2 and edgeR, have been developed to identify DEGs. Despite their widespread use, there is no consensus on which model performs best for different types of data, and many existing methods suffer from high False Discovery Rates (FDR). Methods: We present a new algorithm, DeClUt, based on the intuition that the expression profile of a differentially expressed gene should form two reasonably compact and well-separated clusters. This, in turn, implies that the bipartition induced by the two conditions being compared should overlap with the clustering. The clustering algorithm underlying DeClUt was designed to be robust to the outliers typical of RNA-seq data. In particular, we used the average silhouette function to enforce membership assignment of samples to the most appropriate condition. Results: DeClUt was tested on real RNA-seq datasets and benchmarked against four of the most widely used methods (edgeR, DESeq2, NOISeq, and SAMseq). In these experiments, DeClUt showed higher self-consistency of results than its competitors, as well as a significantly lower False Positive Rate (FPR). Moreover, when tested on a real prostate cancer RNA-seq dataset, DeClUt highlighted 8 DE genes, linked to neoplastic processes according to the DisGeNET database, that none of the other methods identified. Conclusions: Our work presents a novel algorithm that builds upon basic concepts of data clustering and exhibits greater consistency and a significantly lower False Positive Rate than state-of-the-art methods. Additionally, DeClUt is able to highlight relevant differentially expressed genes that other tools do not identify, contributing to more effective differential expression analyses in various biological applications.
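
The intuition stated in the Methods can be illustrated with a minimal sketch. This is not the DeClUt implementation: it only scores how well the bipartition given by the two conditions matches the structure of a single gene's expression values, using the standard average silhouette width. The function name `average_silhouette`, the use of absolute differences as the distance, and the toy expression values are all assumptions made for illustration; the outlier handling and membership reassignment described in the abstract are not reproduced here.

```python
from statistics import mean


def average_silhouette(values, labels):
    """Average silhouette width of 1-D values under a fixed two-group labelling.

    values: expression values of one gene, one per sample (hypothetical input).
    labels: condition label of each sample (two distinct labels expected).
    """
    scores = []
    for i, (x, g) in enumerate(zip(values, labels)):
        # Distances to the other samples of the same condition.
        same = [abs(x - y)
                for j, (y, h) in enumerate(zip(values, labels))
                if h == g and j != i]
        # Distances to the samples of the other condition.
        other = [abs(x - y) for y, h in zip(values, labels) if h != g]
        if not same or not other:
            scores.append(0.0)  # usual convention for singleton groups
            continue
        a, b = mean(same), mean(other)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Toy data (made up): three samples per condition.
conditions = [0, 0, 0, 1, 1, 1]
separated_gene = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]  # conditions form two compact clusters
mixed_gene = [1.0, 5.0, 0.9, 1.2, 5.3, 4.8]      # clusters do not follow the conditions

print(average_silhouette(separated_gene, conditions))
print(average_silhouette(mixed_gene, conditions))
```

A score close to 1 means that the two conditions form compact, well-separated clusters for that gene; a score near or below 0 means that the condition labels do not match the structure of the data. How such a score would be thresholded or combined with a robust clustering step is left open in this sketch.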

[Mebendazole in the therapy of human hydatidosis. Evaluation of the results obtained in 9 patients with pulmonary localization].

Nonparametric clustering of RNA-sequencing data

moCluster: Identifying Joint Patterns Across Multiple Omics Data Sets

Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm

COPS: A novel platform for multi-omic disease subtype discovery via robust multi-objective evaluation of clustering algorithms

Robust identification of temporal biomarkers in longitudinal omics studies

MONET: Multi-omic module discovery by omic selection

Overloading And unpacKing (OAK) - droplet-based combinatorial indexing for ultra-high throughput single-cell multiomic profiling

Microalgal species growing on piggery wastewater as a valuable candidate for nutrient removal and biodiesel production.

DeClUt: Decluttering differentially expressed genes through clustering of their expression profiles

De novo clustering of extensive long-read transcriptome datasets with isONclust3

Accurate, Fast and Lightweight Clustering of de novo Transcriptomes using Fragment Equivalence Classes

Agalma: an automated phylogenomics workflow

OmicsSuite: a customized and pipelined suite for analysis and visualization of multi-omics big data

OrthologAL: A Shiny application for quality-aware humanization of non-human pre-clinical high-dimensional gene expression data

An R package suite for microarray meta-analysis in quality control, differentially expressed gene analysis and pathway enrichment detection

Clustering of Transcriptomic Data for the Identification of Cancer Subtypes

SCALA: A complete solution for multimodal analysis of single-cell Next Generation Sequencing data

Patterns, Profiles, and Parsimony: Dissecting Transcriptional Signatures From Minimal Single-Cell RNA-Seq Output With SALSA

ExpOmics: a comprehensive web platform empowering biologists with robust multi-omics data analysis capabilities