Low Quality Cells Should Be Removed from Single-Cell RNA-Seq Data Analysis

Geng Chen, Meng Ren, Chengkai Lv, Tieliu Shi
DOI: https://doi.org/10.2139/ssrn.3307902
2018-01-01
Abstract: Single-cell RNA-seq (scRNA-seq) technologies are increasingly popular for transcriptomic profiling at single-cell resolution. However, scRNA-seq data are noisy and require appropriate quality control to remove low-quality cells. A recent study analyzed the transcriptomes of 3589 single cells from human glioblastoma sequenced with Smart-seq2, but we found that its scRNA-seq data contained a substantial proportion of low-quality samples. These cells had inadequate profiles in terms of sequencing depth, mapping ratio, or number of detectable genes. Surprisingly, 74 (2.06%) samples had < 500,000 reads, and 31 samples had even < 10,000 reads. Moreover, 126 (3.51%) samples had a mapping ratio below 60%, indicating that they contained many unmapped reads resulting from low quality and/or RNA degradation. We also found that 507 (14.13%) and 834 (23.24%) samples had fewer than 2000 detectable genes at detection thresholds of 0.1 and 1 FPKM, respectively. Using the criteria of < 500,000 reads, < 60% mapping ratio, or < 2000 detectable genes at 1 FPKM, a total of 932 (25.97%) samples were classified as low-quality cells. Our results indicate that these low-quality samples could influence the identification of cell subpopulations, which may lead to inaccurate results and misinterpretation of the data. Collectively, our findings highlight that appropriate quality control should be conducted to remove low-quality cells from scRNA-seq studies in order to obtain convincing results and reliable conclusions.
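The three criteria stated in the abstract can be sketched as a simple per-cell filter. This is an illustrative sketch only, not the authors' pipeline: the `CellQC` record, the function names, and the input layout are hypothetical, and treating a gene as "detectable" when its FPKM is greater than or equal to the threshold is an assumption, since the abstract does not say whether the comparison is strict.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Cutoffs taken from the abstract.
MIN_READS = 500_000          # minimum sequencing depth (total reads)
MIN_MAPPING_RATIO = 0.60     # minimum fraction of mapped reads
MIN_DETECTED_GENES = 2000    # minimum number of detectable genes
FPKM_THRESHOLD = 1.0         # expression level at which a gene counts as detected


@dataclass
class CellQC:
    """Hypothetical per-cell record; the field names are illustrative."""
    cell_id: str
    total_reads: int
    mapping_ratio: float      # fraction in [0, 1]
    fpkm: Sequence[float]     # one FPKM value per gene


def count_detected_genes(fpkm: Sequence[float],
                         threshold: float = FPKM_THRESHOLD) -> int:
    # Assumption: "detectable" means FPKM >= threshold.
    return sum(1 for value in fpkm if value >= threshold)


def is_low_quality(cell: CellQC) -> bool:
    # A cell is flagged if it fails ANY one of the three criteria.
    return (
        cell.total_reads < MIN_READS
        or cell.mapping_ratio < MIN_MAPPING_RATIO
        or count_detected_genes(cell.fpkm) < MIN_DETECTED_GENES
    )


def split_cells(cells: Sequence[CellQC]) -> Tuple[List[CellQC], List[CellQC]]:
    """Return (kept, removed) according to the criteria above."""
    kept = [c for c in cells if not is_low_quality(c)]
    removed = [c for c in cells if is_low_quality(c)]
    return kept, removed
```

Because the criteria are combined with a logical OR, the number of flagged cells (932) is smaller than the sum of the per-criterion counts: a cell that fails several criteria is counted only once.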