Mcadet: A feature selection method for fine-resolution single-cell RNA-seq data based on multiple correspondence analysis and community detection
Saishi Cui, Sina Nassiri, Issa Zakeri
DOI: https://doi.org/10.1371/journal.pcbi.1012560
2024-10-29
PLoS Computational Biology
Abstract: Single-cell RNA sequencing (scRNA-seq) data analysis faces numerous challenges, including high sparsity, a high-dimensional feature space, and biological noise. These challenges hinder downstream analysis and call for feature selection methods that identify informative genes and reduce data dimensionality. However, the gene sets chosen by existing methods for selecting highly variable genes (HVGs) overlap only to a limited extent, and their clustering performance is inconsistent across benchmark datasets. Moreover, these methods often struggle to select HVGs accurately from fine-resolution scRNA-seq datasets and from minority cell types, which are more difficult to distinguish, raising concerns about the reliability of their results. To overcome these limitations, we propose Mcadet, a novel feature selection framework for scRNA-seq data. Mcadet integrates Multiple Correspondence Analysis (MCA), graph-based community detection, and a novel statistical testing approach. To assess the effectiveness of Mcadet, we conducted extensive evaluations on both simulated and real-world data, using unbiased metrics for comparison. Our results demonstrate the superior performance of Mcadet in selecting HVGs from fine-resolution scRNA-seq datasets and from datasets containing minority cell populations. Overall, we demonstrate that Mcadet enhances the reliability of the selected HVGs, although the impact of HVG selection on various downstream analyses varies and needs further investigation.

Author summary: scRNA-seq brings both great opportunities and challenges for transcriptomic analysis. While scRNA-seq enables the characterization of cell heterogeneity at unprecedented resolution, analytical issues such as sparsity, noise, and bias can severely compromise interpretation if not addressed properly. Effective feature selection is therefore critical for extracting meaningful biological signals. We propose Mcadet, a novel framework for feature selection in scRNA-seq data.
Mcadet aims to accurately identify informative genes in fine-resolution datasets and in datasets with minority cell types, where existing methods falter. Through comparative analysis on diverse simulated and real-world datasets, we show that Mcadet outperforms existing methods in selecting highly informative genes across many evaluation metrics and scenarios, because it effectively isolates intrinsic variation and allows multi-resolution community detection. By improving feature selection for scRNA-seq data, Mcadet makes scRNA-seq data easier to work with for researchers from various backgrounds, giving them more reliable and accessible insights into single-cell transcriptomics.
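The abstract names Multiple Correspondence Analysis (MCA) as one of Mcadet's building blocks but does not describe how it is applied. As a point of reference only, here is a minimal sketch of textbook MCA. It performs correspondence analysis of the indicator (one-hot) table through the standardized residual matrix S = D_r^{-1/2}(P - r c^T) D_c^{-1/2}. Its principal inertias are the eigenvalues of S^T S, and its row principal coordinates are F = D_r^{-1/2} S V. A Jacobi eigendecomposition is used so that only the standard library is needed. This is not Mcadet's implementation. In particular, how Mcadet turns expression values into categories is not stated here, so the input is a hypothetical categorical table. The names `mca` and `jacobi_eigh` are illustrative.

```python
import math


def jacobi_eigh(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues/eigenvectors of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # choose the rotation angle that zeroes a[p][q]
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def mca(rows, n_components=2):
    """Textbook MCA: correspondence analysis of the indicator table of `rows`."""
    n, q = len(rows), len(rows[0])
    col = {}  # (variable index, category) -> indicator column
    for row in rows:
        for j, value in enumerate(row):
            col.setdefault((j, value), len(col))
    J = len(col)
    z = [[0.0] * J for _ in range(n)]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            z[i][col[(j, value)]] = 1.0
    total = n * q
    r = 1.0 / n  # every row holds q ones, so all row masses are equal
    c = [sum(z[i][k] for i in range(n)) / total for k in range(J)]
    # standardized residuals S = D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    s = [[(z[i][k] / total - r * c[k]) / math.sqrt(r * c[k]) for k in range(J)]
         for i in range(n)]
    sts = [[sum(s[i][a] * s[i][b] for i in range(n)) for b in range(J)] for a in range(J)]
    evals, evecs = jacobi_eigh(sts)
    order = sorted(range(J), key=lambda k: -evals[k])[:n_components]
    inertias = [max(evals[k], 0.0) for k in order]  # clamp round-off negatives
    # row principal coordinates F = D_r^{-1/2} S V
    coords = [[sum(s[i][a] * evecs[a][k] for a in range(J)) / math.sqrt(r) for k in order]
              for i in range(n)]
    total_inertia = sum(x * x for row in s for x in row)
    return inertias, coords, total_inertia


if __name__ == "__main__":
    toy = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("a", "x")]
    inertias, coords, total = mca(toy)
    print("principal inertias:", inertias)
    print("total inertia:", total)
```

For indicator-matrix MCA with Q variables and J categories in total, the total inertia equals (J - Q)/Q. On the toy table (Q = 2, J = 4), it should therefore come out at approximately 1, which is a convenient sanity check.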
biochemical research methods, mathematical & computational biology