DeepCNV: a deep learning approach for authenticating copy number variations

Joseph T Glessner, Xiurui Hou, Cheng Zhong, Jie Zhang, Munir Khan, Fabian Brand, Peter Krawitz, Patrick M A Sleiman, Hakon Hakonarson, Zhi Wei

DOI: https://doi.org/10.1093/bib/bbaa381

IF: 9.5

2021-01-12

Briefings in Bioinformatics

Abstract: Copy number variations (CNVs) are an important class of variations contributing to the pathogenesis of many disease phenotypes. Detecting CNVs from genomic data remains difficult, and most currently applied methods suffer from an unacceptably high false positive rate. A common practice is to have human experts manually review original CNV calls for filtering false positives before further downstream analysis or experimental validation. Here, we propose DeepCNV, a deep learning-based tool, intended to replace human experts when validating CNV calls, focusing on the calls made by one of the most accurate CNV callers, PennCNV. The sophistication of the deep neural network algorithm is enriched with over 10 000 expert-scored samples that are split into training and testing sets. Variant confidence, especially for CNVs, is a main roadblock impeding the progress of linking CNVs with the disease. We show that DeepCNV adds to the confidence of the CNV calls with an optimal area under the receiver operating characteristic curve of 0.909, exceeding other machine learning methods. The superiority of DeepCNV was also benchmarked and confirmed using an experimental wet-lab validation dataset. We conclude that the improvement obtained by DeepCNV results in significantly fewer false positive results and failures to replicate the CNV association results.
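The abstract reports performance as the area under the receiver operating characteristic curve (AUC). As a minimal sketch (not the authors' evaluation code), the AUC can be computed directly from classifier scores and expert labels as the probability that a randomly chosen true CNV call is scored higher than a randomly chosen false positive, with ties counted as one half. The function name `roc_auc` and the example numbers are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison (the Mann-Whitney formulation).

    labels: 1 = expert-confirmed CNV, 0 = false positive.
    scores: classifier output, higher = more likely a true CNV.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of (positive, negative) pairs that are ranked correctly.
    return wins / (len(pos) * len(neg))
```

For the toy labels `[1, 1, 0, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.3, 0.4, 0.2]`, eight of the nine positive/negative pairs are ranked correctly, giving an AUC of 8/9 (about 0.889). The pairwise loop is quadratic in the number of calls; for large evaluation sets a rank-based (sorting) implementation gives the same value more efficiently.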

Biochemical Research Methods, Mathematical & Computational Biology

What problem does this paper attempt to address?

The problem this paper addresses is **how to reduce the high false-positive rate of copy number variation (CNV) detection**.

### Background and Problem Description

1. **Importance of CNVs**:
   - CNVs are an important class of genomic structural variation and are closely linked to the pathogenesis of several complex diseases (such as schizophrenia and osteoporosis).
   - CNV detection has become routine in genetic and cancer research.
2. **Limitations of Current Methods**:
   - The most widely used CNV callers (such as PennCNV and QuantiSNP) have a relatively high false-positive rate.
   - To filter out false positives, human experts usually review the raw CNV calls manually, which is time-consuming and highly subjective.
3. **Specific Problems**:
   - How can manual review be replaced by an automated method that effectively reduces the false-positive rate of CNV detection?
   - How can the accuracy and confidence of CNV calls be improved to better support disease-related research?

### Proposed Solution

The paper proposes **DeepCNV**, a deep-learning-based tool, to address these problems. Its main goals are to:

- Automatically validate the CNV calls produced by tools such as PennCNV.
- Reduce the false-positive rate while maintaining high sensitivity.
- Replace the laborious manual review step and improve efficiency.

### Core Innovation Points of DeepCNV

1. **Combination of Image Data and Metadata**:
   - The LRR (Log R Ratio) and BAF (B Allele Frequency) scatter plots output by PennCNV serve as the image input.
   - The quality-control statistics generated by PennCNV (such as CNV length and number of SNPs) serve as the metadata input.
2. **Deep Neural Network Architecture**:
   - DeepCNV uses a hybrid deep neural network with two branches:
     - **CNN Branch**: processes the image data and extracts features from the CNV scatter plots.
     - **DNN Branch**: processes the metadata and captures how the statistics influence the final decision.
   - The outputs of the two branches are concatenated and passed to a fully connected layer, and a sigmoid activation produces the final classification probability.
3. **Large-Scale Training Data**:
   - More than 10,000 expert-annotated samples are split into training and testing sets, which supports the model's ability to generalize.

### Experimental Verification

1. **Human-Annotated Dataset**:
   - On an independent human-annotated dataset, DeepCNV reaches an AUC of 0.909, exceeding the other machine-learning methods compared.
   - The largest improvement is seen for small CNVs (<5 kb).
2. **WGS Dataset**:
   - On a whole-genome sequencing (WGS) dataset, DeepCNV also shows superior performance, further supporting its applicability.
3. **Grad-CAM Visualization**:
   - Grad-CAM is used to visualize the image regions the CNN attends to, explaining the basis of the model's predictions.

### Summary

DeepCNV substantially reduces the false-positive rate of CNV calls, improves their accuracy and confidence, and provides a more reliable tool for disease-related genomic research.
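The two-branch design summarized above can be illustrated with a toy forward pass. This is a minimal, pure-Python sketch under stated assumptions, not DeepCNV's implementation: the weights are random and untrained, and the layer sizes, the single convolutional layer of 3x3 filters, the global average pooling, the single-channel grid standing in for the LRR/BAF scatter plots, and all names (`ToyHybridNet`, `conv2d_valid`, `dense`) are hypothetical. It only shows how image features and metadata features are computed separately, concatenated, and mapped to a probability by a fully connected layer and a sigmoid.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation, the operation deep-learning libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("image must be at least as large as the kernel")
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def dense(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


class ToyHybridNet:
    """Hypothetical two-branch classifier: CNN branch for the plot, DNN branch for metadata."""

    def __init__(self, n_filters: int, n_meta: int, meta_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def w() -> float:
            # Small random (untrained) weight.
            return rng.uniform(-0.5, 0.5)

        # CNN branch: n_filters kernels of size 3x3.
        self.kernels = [[[w() for _ in range(3)] for _ in range(3)] for _ in range(n_filters)]
        # DNN branch: one hidden dense layer over the metadata vector.
        self.meta_w = [[w() for _ in range(n_meta)] for _ in range(meta_hidden)]
        self.meta_b = [0.0] * meta_hidden
        # Head: one fully connected unit over the concatenated features.
        self.head_w = [[w() for _ in range(n_filters + meta_hidden)]]
        self.head_b = [0.0]

    def forward(self, image: Matrix, metadata: List[float]) -> float:
        """Return the (untrained) probability that the CNV call is genuine."""
        # CNN branch: convolve, apply ReLU, then average-pool each feature map to one value.
        img_feats = []
        for kernel in self.kernels:
            fmap = conv2d_valid(image, kernel)
            vals = [relu(v) for row in fmap for v in row]
            img_feats.append(sum(vals) / len(vals))
        # DNN branch: dense layer + ReLU over the metadata (e.g. CNV length, SNP count).
        meta_feats = [relu(v) for v in dense(metadata, self.meta_w, self.meta_b)]
        # Concatenate both branches, apply the fully connected head, squash with a sigmoid.
        logit = dense(img_feats + meta_feats, self.head_w, self.head_b)[0]
        return sigmoid(logit)


if __name__ == "__main__":
    rng = random.Random(1)
    # Stand-in for a rasterized LRR/BAF scatter plot: an 8x8 grid of grey levels in [0, 1].
    image = [[rng.random() for _ in range(8)] for _ in range(8)]
    # Hypothetical normalized metadata: CNV length, SNP count, call confidence.
    metadata = [0.12, 0.35, 0.8]
    net = ToyHybridNet(n_filters=4, n_meta=3, meta_hidden=5)
    # Weights are untrained, so this number carries no biological meaning.
    print(f"P(true CNV) = {net.forward(image, metadata):.3f}")
```

Keeping the branches separate until the concatenation lets each input type get a suitable feature extractor (convolutions for the plot, dense layers for the tabular statistics), while the final layer learns how to weigh the two jointly.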

Accuracy of CNV Detection from GWAS Data

CNV-P: a machine-learning framework for predicting high confident copy number variations

DL-CNV: A Deep Learning Method for Identifying Copy Number Variations Based on Next Generation Target Sequencing

CNVoyant a machine learning framework for accurate and explainable copy number variant classification

SeqCNV: a Novel Method for Identification of Copy Number Variations in Targeted Next-Generation Sequencing Data

CNVbd: A Method for Copy Number Variation Detection and Boundary Search

Noninvasive detection of chromosomal CNV and single gene disease by massively parallel sequencing of cfDNA

CNVABNN: An AdaBoost algorithm and neural networks-based detection of copy number variations from NGS data

CNValidatron, automated validation of CNV calls using computer vision

PEcnv: accurate and efficient detection of copy number variations of various lengths

CopyVAE: a variational autoencoder-based approach for copy number variation inference using single-cell transcriptomics

Abstract 5102: Deep Learning Method for the Classification of CNV Based on the Next Generation Target Sequencing

A Remark on Copy Number Variation Detection Methods

CNVeil enables accurate and robust tumor subclone identification and copy number estimation from single-cell DNA sequencing data

CNV-Profile Regression: A New Approach for Copy Number Variant Association Analysis in Whole Genome Sequencing Data

Robust Regression Analysis of Copy Number Variation Data based on a Univariate Score

Copy Number Aberrations from Affymetrix SNP 6.0 Genotyping Data-How Accurate Are Commonly Used Prediction Approaches?

Accurate detection of CNV based on single-nucleotide variants recalibration and image classification from whole genome sequencing

CNVDeep: deep association of copy number variants with neurocognitive disorders

CNV-Finder: Streamlining Copy Number Variation Discovery