A benchmark of computational methods for correcting biases of established and unknown origin in CRISPR-Cas9 screening data

Alessandro Vinceti, Raffaele M Iannuzzi, Isabella Boyle, Lucia Trastulla, Catarina D Campbell, Francisca Vazquez, Joshua Dempster, Francesco Iorio
DOI: https://doi.org/10.1101/2024.01.30.577980
2024-06-13
Abstract: CRISPR-Cas9 screens are powerful tools for investigating biology with unprecedented precision and scale. One of their principal applications involves probing large panels of immortalised human cancer cell lines for viability reduction upon systematic genome-wide genetic knock-out, to identify novel cancer dependencies and therapeutic targets. However, biases in CRISPR-Cas9 screening data can confound interpretation and compromise overall data quality. The mode of action of the Cas9 enzyme, which induces DNA double-strand breaks at a locus targeted by a specifically designed single-guide RNA (sgRNA), is influenced by structural features of the target site, including copy number amplifications (CN bias). More worryingly, proximal targeted loci tend to generate similar gene-independent responses to CRISPR-Cas9 targeting (proximity bias), possibly due to Cas9-induced whole chromosome-arm truncations, or to other unknown genomic structural features and differences in chromatin accessibility. Several computational methods have been proposed to correct these biases in silico, each based on different modelling assumptions. We have benchmarked seven of the latest methods, rigorously evaluating, for the first time, their effectiveness in reducing both CN and proximity bias in the two largest publicly available cell-line-based CRISPR-Cas9 screens to date. We have also evaluated the ability of each method to preserve data quality and heterogeneity by assessing the extent to which the processed data allow accurate detection of true positive essential genes, established oncogenetic addictions, and known/novel biomarkers of cancer dependency. Our analysis sheds light on the ability of each method to correct biases arising from structural properties and other possible unknown factors associated with CRISPR-Cas9 screen data under different scenarios. In particular, it shows that CRISPRcleanR outperforms the other tested methods in correcting both CN and proximity biases, while Chronos yields a final dataset better able to recapitulate known sets of essential and non-essential genes. Overall, our investigation provides guidance for selecting the most appropriate bias-correction method, based on each method's strengths and weaknesses and on the experimental setting.
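As a rough illustration of what "recapitulating known sets of essential and non-essential genes" can mean in practice, the sketch below scores a corrected dataset by how well its gene-level values separate two reference gene sets. The abstract does not name the metric used in the benchmark; the AUROC-style statistic here is an assumption chosen for illustration. The function name `essential_gene_auroc`, the placeholder gene names and the toy scores are all hypothetical.

```python
from typing import Dict, Iterable


def essential_gene_auroc(
    scores: Dict[str, float],
    essential: Iterable[str],
    non_essential: Iterable[str],
) -> float:
    """Probability that a randomly chosen reference essential gene has a
    lower (more depleted) score than a randomly chosen reference
    non-essential gene. Ties count as 0.5. Genes missing from `scores`
    are ignored.

    Assumption: lower scores mean stronger viability reduction, as with
    depletion log fold-changes.
    """
    pos = [scores[g] for g in essential if g in scores]
    neg = [scores[g] for g in non_essential if g in scores]
    if not pos or not neg:
        raise ValueError("need at least one scored gene in each reference set")

    wins = 0.0
    # Exhaustive pairwise comparison: fine for a sketch, quadratic in set size.
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy values for placeholder genes, not real screening data.
toy_scores = {
    "ESS_A": -2.1, "ESS_B": -1.8, "ESS_C": -0.2,
    "NON_A": 0.1, "NON_B": -0.3, "NON_C": 0.05,
}
toy_auroc = essential_gene_auroc(
    toy_scores,
    essential=["ESS_A", "ESS_B", "ESS_C"],
    non_essential=["NON_A", "NON_B", "NON_C"],
)
print(round(toy_auroc, 3))
```

A value near 1 would indicate that the reference essential genes are consistently more depleted than the reference non-essential genes after correction, while a value near 0.5 would indicate no separation. Comparing this value across correction methods applied to the same screen is one plausible way to check that bias correction has not erased true signal.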
Bioinformatics