Optimal Gene Filtering for Single-Cell Data (OGFSC): a Gene Filtering Algorithm for Single-Cell RNA-seq Data

Jie Hao, Wei Cao, Jian Huang, Xin Zou, Ze-Guang Han
DOI: https://doi.org/10.1093/bioinformatics/bty1016
IF: 5.8
2019-01-01
Bioinformatics
Abstract:
Motivation: Single-cell transcriptomic data are commonly accompanied by extremely high technical noise due to the low RNA concentrations from individual cells. Precise identification of differentially expressed genes and cell populations is heavily dependent on the effective reduction of technical noise, e.g. by gene filtering. However, there is still no well-established standard among current approaches to gene filtering. Investigators usually filter out genes based on a single fixed threshold, which commonly leads to both over- and under-stringent errors.
Results: In this study, we propose a novel algorithm, termed Optimal Gene Filtering for Single-Cell data (OGFSC), to construct a thresholding curve based on gene expression levels and the corresponding variances. We validated our method on multiple single-cell RNA-seq datasets, including simulated and published experimental datasets. The results show that the known signal and known noise are reliably discriminated in the simulated datasets. In addition, the results on seven experimental datasets demonstrate that cells of the same annotated type are more sharply clustered using our method. Interestingly, when we re-analyze the dataset from an aging study recently published in Science, we find a list of regulated genes that differs from the one reported in the original study, because different filtering methods were used. However, the knowledge based on our findings better matches the progression of immunosenescence. In summary, we here provide an alternative opportunity to probe into the true level of technical noise in single-cell transcriptomic data.
Availability and implementation: https://github.com/XZouProjects/OGFSC.git
Supplementary information: Supplementary data are available at Bioinformatics online.
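The contrast the abstract draws between a single fixed threshold and an expression-dependent thresholding curve can be illustrated with a minimal sketch. It is not the published OGFSC procedure: the function names, the binning scheme, and the per-bin variance quantile are all assumptions chosen only to show the idea of a cutoff that changes with expression level. The authors' actual implementation is in the repository linked above.

```python
import numpy as np


def fixed_threshold_filter(expr, min_mean=1.0):
    """Keep genes whose mean expression exceeds one fixed cutoff.

    expr: 2-D array, genes in rows and cells in columns.
    """
    return expr.mean(axis=1) > min_mean


def curve_threshold_filter(expr, n_bins=10, quantile=0.5):
    """Keep genes whose variance exceeds an expression-dependent cutoff.

    Hypothetical illustration, NOT the OGFSC algorithm: genes are binned by
    mean expression, and within each bin the cutoff is a quantile of the
    variances observed in that bin. The cutoff therefore traces a stepwise
    curve over expression level instead of a single value.
    """
    mean = expr.mean(axis=1)
    var = expr.var(axis=1)

    # Bin edges at equally spaced quantiles of mean expression.
    edges = np.quantile(mean, np.linspace(0.0, 1.0, n_bins + 1))
    bin_idx = np.searchsorted(edges, mean, side="right") - 1
    bin_idx = np.clip(bin_idx, 0, n_bins - 1)

    keep = np.zeros(expr.shape[0], dtype=bool)
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():
            cutoff = np.quantile(var[in_bin], quantile)
            keep[in_bin] = var[in_bin] > cutoff
    return keep


# Simulated counts (an assumption for demonstration): 200 genes, 50 cells.
rng = np.random.default_rng(0)
rates = rng.gamma(2.0, 2.0, size=(200, 1))
expr = rng.poisson(lam=rates, size=(200, 50)).astype(float)

kept_fixed = fixed_threshold_filter(expr, min_mean=1.0)
kept_curve = curve_threshold_filter(expr, n_bins=10, quantile=0.5)
```

With the fixed cutoff, a low-expression gene is dropped no matter how variable it is; with the binned cutoff, each gene is compared only against genes of similar expression level.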