Exploring High-throughput Biomolecular Data with Multiobjective Robust Continuous Clustering

Yunhe Wang, Ka-Chun Wong, Xiangtao Li
DOI: https://doi.org/10.1016/j.ins.2021.11.030
IF: 8.1
2022-01-01
Information Sciences
Abstract: Clustering of cell types from a large number of high-dimensional heterogeneous cells is a vital step in analyzing single-cell RNA-seq data. Although several computational methods have been proposed to analyze such data, most of them are limited by high noise levels, high dimensionality, and poor generalization. To address these challenges, a multiobjective robust continuous clustering algorithm (MORCC) is presented to discriminate the different cell types in a single-cell RNA-seq dataset. First, a dimensionality reduction method is applied to map the high-dimensional heterogeneous cells into a low-dimensional space while preserving the features of the original space. Then, to overcome the instability of trial-and-error connectivity weights in robust continuous clustering, MORCC applies evolutionary operators to optimize the connectivity weights dynamically and selects suitable parameters using two cluster validity indices. To demonstrate the effectiveness of MORCC, we compare it with several state-of-the-art methods on six single-cell RNA-seq datasets, revealing its superior clustering ability from several perspectives. In addition, we carry out a parameter analysis, a case study, and visualization and biological interpretability analyses to validate MORCC’s cell identification capability on single-cell RNA-seq data.
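The abstract does not name the two cluster validity indices or say how candidate parameter settings are compared. As a minimal sketch of what multiobjective selection over two indices generally involves, and not the authors' implementation, the following assumes both indices are "higher is better" and filters hypothetical candidates down to the non-dominated (Pareto-optimal) ones. The function names, the candidate layout, and all numbers are assumptions made for illustration.

```python
from typing import List, Tuple

# Each candidate: (connectivity_weight, index_a, index_b).
# All names and values are hypothetical; the abstract does not specify
# the indices, and both are assumed here to be "higher is better".
Candidate = Tuple[float, float, float]


def dominates(p: Candidate, q: Candidate) -> bool:
    """True if p is at least as good as q on both indices and strictly better on one."""
    return p[1] >= q[1] and p[2] >= q[2] and (p[1] > q[1] or p[2] > q[2])


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [p for p in candidates
            if not any(dominates(q, p) for q in candidates)]


candidates = [
    (0.1, 0.60, 0.30),
    (0.5, 0.55, 0.45),
    (1.0, 0.40, 0.50),
    (2.0, 0.35, 0.40),  # dominated by the 0.5 and 1.0 settings
]
front = pareto_front(candidates)
```

In an evolutionary loop, a filter of this kind would typically be applied each generation to decide which connectivity weights survive; how MORCC actually does this is not stated in the abstract.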