Abstract: Copy number variations (CNVs) are gains and losses of DNA sequence in a genome. High-throughput platforms such as microarrays and next-generation sequencing (NGS) have been applied to genome-wide detection of copy number losses. Although progress has been made with both approaches, the accuracy and consistency of CNV calls from the two platforms remain in dispute. In this study, we perform an in-depth analysis of copy number losses in 254 human DNA samples for which both SNP microarray data and NGS data are publicly available, from the HapMap Project and the 1000 Genomes Project respectively. We show that the copy number losses reported by the HapMap Project and the 1000 Genomes Project overlap by less than 30%, even though both projects employed state-of-the-art calling methods and required their reported calls to have cross-platform experimental support (e.g. PCR, microarray, and high-throughput sequencing). In contrast, when copy number losses are called directly from the HapMap microarray data with an accurate algorithm, CNVhac, almost all of them show lower read mapping depth in the NGS data, and 88% of them are supported by breakpoint-containing sequences in the NGS data. Our results suggest that microarrays are capable of calling CNVs, and that the requirement for additional cross-platform support, which may be unnecessary, can introduce false negatives. The inconsistency between the CNV reports of the HapMap Project and the 1000 Genomes Project might result from inadequate information contained in the microarray data, inconsistent detection criteria, or the filtering effect of cross-platform support. Statistical tests on the CNVs called by CNVhac show that microarray data can yield reliable CNV reports and that the majority of CNV candidates can be confirmed by raw sequences. Therefore, the CNV candidates produced by a good caller can be highly reliable without cross-platform support, so additional experimental information should be used when needed rather than as a mandatory requirement.
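To make the notion of overlap between two CNV call sets concrete, the following is a minimal sketch of how the fraction of calls from one platform that are matched by another could be computed. The abstract does not state which overlap criterion was used; the 50% reciprocal-overlap rule, the function names, and the toy intervals below are all assumptions chosen for illustration, not the authors' procedure or data.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open coordinates; a hypothetical representation
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval, min_frac: float = 0.5) -> bool:
    """Return True if a and b overlap by at least min_frac of BOTH lengths.

    The 0.5 default is an assumed, commonly used convention.
    """
    if a[0] != b[0]:
        return False  # different chromosomes never overlap
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return False  # disjoint or merely adjacent
    return shared >= min_frac * (a[2] - a[1]) and shared >= min_frac * (b[2] - b[1])


def fraction_supported(calls: List[Interval], reference: List[Interval],
                       min_frac: float = 0.5) -> float:
    """Fraction of `calls` matched by at least one interval in `reference`."""
    if not calls:
        return 0.0
    hits = sum(
        any(reciprocal_overlap(c, r, min_frac) for r in reference)
        for c in calls
    )
    return hits / len(calls)


# Toy, made-up intervals (not taken from HapMap or 1000 Genomes data)
array_calls = [("chr1", 100, 200), ("chr1", 500, 900), ("chr2", 50, 150)]
ngs_calls = [("chr1", 120, 210), ("chr1", 850, 1200), ("chr3", 50, 150)]

print(fraction_supported(array_calls, ngs_calls))
```

In this toy example only the first array call has a reciprocal match, so one of three calls counts as supported. A stricter or looser `min_frac` changes the reported overlap, which illustrates how inconsistent detection criteria alone can shift agreement between platforms.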

Modeling Read Counts for CNV Detection in Exome Sequencing Data

ExomeHMM: A Hidden Markov Model for Detecting Copy Number Variation Using Whole-Exome Sequencing Data

Accuracy of CNV Detection from GWAS Data

Improving Detection Of Copy-Number Variation By Simultaneous Bias Correction And Read-Depth Segmentation

SeqCNV: a Novel Method for Identification of Copy Number Variations in Targeted Next-Generation Sequencing Data

nbCNV: a multi-constrained optimization model for discovering copy number variants in single-cell sequencing data

ERDS-pe: A Paired Hidden Markov Model for Copy Number Variant Detection from Whole-Exome Sequencing Data

PEcnv: accurate and efficient detection of copy number variations of various lengths

Comprehensive assessment of long-read sequencing platforms and calling algorithms for detection of copy number variation

A novel signal processing approach for the detection of copy number variations in the human genome

Evaluation of tools for identifying large copy number variations from ultra-low-coverage whole-genome sequencing data

On the core segmentation algorithms of copy number variation detection tools

CNV-P: a machine-learning framework for predicting high confident copy number variations

Ximmer: A System for Improving Accuracy and Consistency of CNV Calling from Exome Data

DL-CNV: A Deep Learning Method for Identifying Copy Number Variations Based on Next Generation Target Sequencing

A Remark on Copy Number Variation Detection Methods

CNVbd: A Method for Copy Number Variation Detection and Boundary Search

Identification of copy number variants in whole-genome data using Reference Coverage Profiles

GATK-gCNV enables the discovery of rare copy number variants from exome sequencing data

Copy Number Variation Detection In Whole-Genome Sequencing Data Using The Bayesian Information Criterion

Allele-specific Copy-Number Discovery from Whole-Genome and Whole-Exome Sequencing