Abstract: Copy number variations (CNVs) are gains and losses of DNA sequence in a genome. High-throughput platforms such as microarrays and next-generation sequencing (NGS) technologies have been applied to detect genome-wide copy number losses. Although both approaches have advanced, the accuracy and consistency of CNV calling from the two platforms remain in dispute. In this study, we perform an in-depth analysis of copy number losses in 254 human DNA samples for which both SNP microarray data and NGS data are publicly available, from the HapMap Project and the 1000 Genomes Project, respectively. We show that the copy number losses reported by the HapMap Project and the 1000 Genomes Project overlap by less than 30%. Both projects employed state-of-the-art calling methods and required their reported CNVs to have cross-platform experimental support (e.g., PCR, microarray, and high-throughput sequencing). In contrast, when copy number losses are called directly from the HapMap microarray data by an accurate algorithm, CNVhac, almost all of them show lower read-mapping depth in the NGS data, and 88% of them are supported by sequences containing breakpoints in the NGS data. Our results support the ability of microarrays to call CNVs. They also suggest that the unnecessary requirement of additional cross-platform support may introduce false negatives. The inconsistency between the CNV reports of the HapMap Project and the 1000 Genomes Project might result from inadequate information contained in the microarray data, inconsistent detection criteria, or the filtering effect of requiring cross-platform support. Statistical tests on the CNVs called by CNVhac show that microarray data can provide reliable CNV reports and that the majority of CNV candidates can be confirmed by raw sequence reads.
Therefore, CNV candidates reported by a good caller can be highly reliable without cross-platform support. Additional experimental validation should be applied when needed rather than as a mandatory requirement.
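The abstract does not state which matching criterion produced the "< 30% overlap" figure, nor exactly how "lower read mapping depth" was measured. The following Python sketch is purely illustrative. It assumes a 50% reciprocal-overlap rule for deciding whether two calls are the same loss. This is a common convention, but it is not necessarily the one used in this study. The sketch also assumes a per-base depth array for comparing depth inside a candidate loss with its flanking regions. All names (`Interval`, `reciprocal_overlap`, `overlap_fraction`, `depth_ratio`) and the `min_frac`/`flank` parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Interval:
    """A genomic interval, 0-based half-open [start, end) (an assumed convention)."""
    chrom: str
    start: int
    end: int


def reciprocal_overlap(a: Interval, b: Interval, min_frac: float = 0.5) -> bool:
    """True if a and b share at least min_frac of EACH interval's length."""
    if a.chrom != b.chrom:
        return False
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return False
    return (shared >= min_frac * (a.end - a.start)
            and shared >= min_frac * (b.end - b.start))


def overlap_fraction(calls_a: Sequence[Interval], calls_b: Sequence[Interval],
                     min_frac: float = 0.5) -> float:
    """Fraction of calls in calls_a matched by at least one call in calls_b.

    The measure is asymmetric: overlap_fraction(a, b) and
    overlap_fraction(b, a) can differ. A quadratic scan is used for
    clarity, not speed.
    """
    if not calls_a:
        return 0.0
    matched = sum(
        1 for a in calls_a
        if any(reciprocal_overlap(a, b, min_frac) for b in calls_b)
    )
    return matched / len(calls_a)


def depth_ratio(depth: Sequence[float], start: int, end: int, flank: int) -> float:
    """Mean per-base depth inside [start, end) divided by mean flanking depth.

    In an idealized diploid region, a one-copy loss gives a ratio near 0.5
    and a two-copy loss a ratio near 0.
    """
    inside = list(depth[start:end])
    flanks = list(depth[max(0, start - flank):start]) + list(depth[end:end + flank])
    if not inside or not flanks:
        raise ValueError("empty region or flanks")
    flank_mean = sum(flanks) / len(flanks)
    if flank_mean == 0:
        raise ValueError("flanking depth is zero")
    return (sum(inside) / len(inside)) / flank_mean


if __name__ == "__main__":
    # Toy call sets, not real HapMap or 1000 Genomes data.
    set_a = [Interval("chr1", 1000, 2000), Interval("chr1", 5000, 6000),
             Interval("chr2", 100, 400)]
    set_b = [Interval("chr1", 1100, 2100), Interval("chr2", 1000, 3000)]
    print(overlap_fraction(set_a, set_b))  # 1 of 3 calls matched
    depth = [30] * 100 + [15] * 50 + [30] * 100  # toy depth with a one-copy dip
    print(depth_ratio(depth, 100, 150, flank=50))  # 0.5
```

The rule requires a match to cover at least half of both intervals. Under that rule, a short call nested inside a much longer one is not counted as shared. A looser "any overlap" rule would report higher concordance. This is one reason the choice of detection criteria matters when comparing call sets from different projects.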

CRSCNV: A Cross-Model-Based Statistical Approach to Detect Copy Number Variations in Sequence Data

SM-RCNV: a Statistical Method to Detect Recurrent Copy Number Variations in Sequenced Samples

Accuracy of CNV Detection from GWAS Data

CNV-TV: A Robust Method to Discover Copy Number Variation from Short Sequencing Reads

CNV-P: a machine-learning framework for predicting high confident copy number variations

A Cluster-Based Approach for the Discovery of Copy Number Variations from Next-Generation Sequencing Data

PEcnv: accurate and efficient detection of copy number variations of various lengths

nbCNV: a multi-constrained optimization model for discovering copy number variants in single-cell sequencing data

SeqCNV: a Novel Method for Identification of Copy Number Variations in Targeted Next-Generation Sequencing Data

Comparative Studies of Copy Number Variation Detection Methods for Next-Generation Sequencing Technologies

CNVbd: A Method for Copy Number Variation Detection and Boundary Search

DL-CNV: A Deep Learning Method for Identifying Copy Number Variations Based on Next Generation Target Sequencing

A Sparse Model Based Detection of Copy Number Variations from Exome Sequencing Data

Combinatorial Detection Algorithm for Copy Number Variations Using High-throughput Sequencing Reads

PSCC: Sensitive and Reliable Population-Scale Copy Number Variation Detection Method Based on Low Coverage Sequencing

An Accurate and Powerful Method for Copy Number Variation Detection

Copy Number Variation Detection in Whole-Genome Sequencing Data Using the Bayesian Information Criterion

RKDOSCNV: A Local Kernel Density-Based Approach to the Detection of Copy Number Variations by Using Next-Generation Sequencing Data

Robust Regression Analysis of Copy Number Variation Data based on a Univariate Score

SCCNV: A Software Tool for Identifying Copy Number Variation From Single-Cell Whole-Genome Sequencing

A Remark on Copy Number Variation Detection Methods