CRSCNV: A Cross-Model-Based Statistical Approach to Detect Copy Number Variations in Sequence Data

Guojun Liu, Xiguo Yuan, Junying Zhang, Haiyong Zhao, Junping Li, Junbo Duan
DOI: https://doi.org/10.1109/access.2019.2962156
IF: 3.9
2020-01-01
IEEE Access
Abstract: Copy number variation (CNV) is an important type of mutation in the human genome and is significantly associated with cancer and other diseases. Accurate detection of CNVs in tumor genomes is crucial for biologists aiming to understand tumorigenesis. One of the key steps in this task is to establish a reasonable model for meaningfully assessing each genome region. Although a great number of computational approaches have been developed in the past few years, none of them is versatile enough to detect CNVs in all scenarios associated with complex genomes. In this paper, we propose a new statistical approach, called CRSCNV, to detect CNVs in individual samples based on next-generation sequencing data. The approach adopts a cross-model-based statistical strategy to test the significance of genome bins: the genome to be analyzed is divided into N parts, and the bins in each part are tested against a statistical model established from the remaining (N - 1) parts. The advantage of such a cross model is that it makes the P-value assessment more meaningful. We tested the performance of CRSCNV on a large number of simulation datasets and compared it with state-of-the-art methods. The results demonstrated that CRSCNV achieved the best trade-off between recall and precision. We further validated CRSCNV using several real sequencing samples, where it detected a number of previously reported CNVs and some additional CNVs with potential biological importance. Thus, CRSCNV is a reliable approach for CNV detection, even in scenarios of extremely low purity.
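The cross-model idea described in the abstract (test the bins of one part against a model built from the other N - 1 parts) can be illustrated with a minimal sketch. The abstract does not specify the statistical model or how bins are assigned to parts, so everything concrete here is an assumption for illustration: the function name `cross_model_pvalues`, the random assignment of bins to parts, and the use of a normal model on per-bin read depth with a two-sided test. It is not the authors' implementation.

```python
import math
import random
from statistics import mean, stdev


def cross_model_pvalues(read_depths, n_parts=5, seed=0):
    """Return one two-sided p-value per bin.

    Each bin is tested against a model fitted only on the bins that
    belong to the other (n_parts - 1) parts, so a bin never contributes
    to the model that is used to judge it.
    """
    if n_parts < 2 or len(read_depths) < 2 * n_parts:
        raise ValueError("need at least two parts and a few bins per part")

    # Assumption: bins are assigned to parts at random.
    indices = list(range(len(read_depths)))
    random.Random(seed).shuffle(indices)
    parts = [indices[k::n_parts] for k in range(n_parts)]

    p_values = [None] * len(read_depths)
    for k, held_out in enumerate(parts):
        # Fit the model on the remaining (n_parts - 1) parts.
        training = [
            read_depths[i]
            for j, part in enumerate(parts)
            if j != k
            for i in part
        ]
        # Assumption: read depth is modeled as normally distributed.
        mu, sigma = mean(training), stdev(training)
        if sigma == 0:
            raise ValueError("training bins have zero variance")
        for i in held_out:
            z = (read_depths[i] - mu) / sigma
            # Two-sided tail probability of a standard normal.
            p_values[i] = math.erfc(abs(z) / math.sqrt(2))
    return p_values


# Hypothetical usage: 200 bins near depth 100 with a short amplified segment.
rng = random.Random(1)
depths = [rng.gauss(100, 5) for _ in range(200)]
depths[50:55] = [160.0] * 5
p_values = cross_model_pvalues(depths, n_parts=5)
candidates = [i for i, p in enumerate(p_values) if p < 0.001]
```

Holding a bin out of the model that scores it is the point of the design: an aberrant bin cannot inflate the variance estimate used in its own test, although other aberrant bins in the training parts still can.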