CEDA: integrating gene expression data with CRISPR-pooled screen data identifies essential genes with higher expression

Yue Zhao, Lianbo Yu, Xue Wu, Haoran Li, Kevin R Coombes, Kin Fai Au, Lijun Cheng, Lang Li
DOI: https://doi.org/10.1093/bioinformatics/btac668
IF: 5.8
2022-10-17
Bioinformatics
Abstract:
Motivation: Clustered regularly interspaced short palindromic repeats (CRISPR)-based genetic perturbation screens are a powerful tool to probe gene function. However, experimental noise, especially for lowly expressed genes, needs to be accounted for to maintain proper control of the false positive rate.
Methods: We develop a statistical method, named CRISPR screen with Expression Data Analysis (CEDA), to integrate gene expression profiles and CRISPR screen data for identifying essential genes. CEDA stratifies genes based on expression level and adopts a three-component mixture model for the log-fold change of single-guide RNAs (sgRNAs). An empirical Bayesian prior and the expectation–maximization algorithm are used for parameter estimation and false discovery rate inference.
Results: Taking advantage of gene expression data, CEDA identifies essential genes with higher expression. Compared to existing methods, CEDA shows comparable reliability but higher sensitivity in detecting essential genes with moderate sgRNA fold change. Therefore, using the same CRISPR data, CEDA generates an additional hit gene list.
Supplementary information: Supplementary data are available at Bioinformatics online.
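The three-component mixture idea mentioned in the Methods can be illustrated with a generic sketch. The code below is not CEDA's implementation: it assumes Gaussian components, omits the expression-level stratification and the empirical Bayesian prior, and uses a hypothetical function name (`fit_three_component_mixture`). It only shows how an expectation–maximization loop could fit "depleted", "null" and "enriched" components to sgRNA log-fold changes and return per-sgRNA posterior probabilities.

```python
import numpy as np


def fit_three_component_mixture(lfc, n_iter=200, tol=1e-8):
    """Fit a 1-D three-component Gaussian mixture by EM (illustrative only).

    Returns mixing weights, means, standard deviations (sorted by mean) and
    the responsibilities computed in the last E-step.
    """
    x = np.asarray(lfc, dtype=float)
    # Assumption: start the means at low, middle and high quantiles.
    mu = np.quantile(x, [0.1, 0.5, 0.9])
    sigma = np.full(3, x.std() + 1e-6)
    pi = np.full(3, 1.0 / 3.0)
    prev_ll = -np.inf

    for _ in range(n_iter):
        # E-step: log of (weight * Gaussian density) for every point and component.
        log_dens = (
            -0.5 * ((x[:, None] - mu) / sigma) ** 2
            - np.log(sigma)
            - 0.5 * np.log(2.0 * np.pi)
            + np.log(pi)
        )
        log_norm = np.logaddexp.reduce(log_dens, axis=1)
        resp = np.exp(log_dens - log_norm[:, None])

        # M-step: update weights, means and standard deviations.
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6

        # Stop when the log-likelihood no longer improves.
        ll = log_norm.sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

    order = np.argsort(mu)  # component 0 = most negative mean
    return pi[order], mu[order], sigma[order], resp[:, order]


if __name__ == "__main__":
    # Simulated log-fold changes, not data from the paper.
    rng = np.random.default_rng(0)
    lfc = np.concatenate([
        rng.normal(-2.0, 0.3, 500),
        rng.normal(0.0, 0.3, 500),
        rng.normal(2.0, 0.3, 500),
    ])
    pi, mu, sigma, resp = fit_three_component_mixture(lfc)
    print("weights:", pi)
    print("means:", mu)
```

In this sketch, the middle column of `resp` is the posterior probability that an sgRNA belongs to the null component, which is the kind of quantity a local false discovery rate procedure could be built on. How CEDA actually derives its false discovery rates is described in the paper itself.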
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology