Neglecting the impact of normalization in semi-synthetic RNA-seq data simulations generates artificial false positives

Boris P. Hejblum, Kalidou Ba, Rodolphe Thiébaut, Denis Agniel
DOI: https://doi.org/10.1186/s13059-024-03231-9
IF: 17.906
2024-11-02
Genome Biology
Abstract: A recent study reported exaggerated false positives by popular differential expression methods when analyzing large population samples. We reproduce the differential expression analysis simulation results and identify a caveat in the data generation process. Data not truly generated under the null hypothesis led to incorrect comparisons of benchmark methods. We provide corrected simulation results that demonstrate the good performance of dearseq and argue against the superiority of the Wilcoxon rank-sum test as suggested in the previous study.
genetics & heredity, biotechnology & applied microbiology