Abstract:

Background: Copy number variations (CNVs) play an important role in many cancers. Next-generation sequencing (NGS) has recently made it possible to detect CNVs from sequencing data, whereas traditional CNV determination methods rely on costly experiments. Several algorithms have been proposed for CNV detection, but none of them can call a CNV at a specific gene using only the coverage information at that gene.

Methods: We collected 1132 samples: 51 with mesenchymal-epithelial transition factor (MET) CNVs and 1081 with no CNVs. We split the exons of MET into 50-bp windows with a stride of 40 bp. The supporting reads were counted in each window, and the read counts for each exon were stacked to generate a matrix for each sample. A training set of 38 CNV-positive and 38 CNV-negative matrices was extracted from all the matrices, and the remaining matrices formed the test set. A convolutional neural network (CNN), a deep learning model, was applied to distinguish CNV-positive matrices from CNV-negative matrices. For comparison, a logistic regression model was applied to the same classification task.

Results: The training set was split into 5 folds of about 15 samples each, and 5-fold cross-validation was run for both the CNN and the logistic regression model. Over 10 rounds of 5-fold cross-validation, the average accuracy was 85.73 ± 9.24% for the CNN and 87.07 ± 8.99% for the logistic regression model. We selected the 3 best models of each type to classify the test set. The CNN models achieved an accuracy of 71.84 ± 2.19%, and the logistic regression models achieved 69.70 ± 2.56%. Furthermore, the area under the curve (AUC) of the CNN models was 0.6335 ± 0.0124, higher than that of the logistic regression models (0.5666 ± 0.0115).

Conclusions: We propose a method that identifies samples with a CNV using a convolutional neural network and only the coverage information within the specific region. It opens the possibility of detecting CNVs in cancer patients using only next-generation sequencing data.

Citation Format: Jidong Lang, Geng Tian. Deep learning method for the classification of CNV based on the next generation target sequencing [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 5102.
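
The windowing step described in Methods (50-bp windows, 40-bp stride, read counts stacked per exon) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the half-open interval convention, the overlap-based definition of a "supporting read", the dropping of a trailing partial window, and the zero-padding of shorter exons are all assumptions made for illustration; the abstract does not specify them.

```python
def window_starts(exon_start, exon_end, window=50, stride=40):
    """Start positions of windows lying fully inside [exon_start, exon_end).

    Assumption: a trailing window that would run past the exon end is dropped.
    """
    last_start = exon_end - window
    if last_start < exon_start:
        return []
    return list(range(exon_start, last_start + 1, stride))


def count_reads(reads, win_start, window=50):
    """Count reads overlapping the half-open window [win_start, win_start + window).

    Assumption: a read "supports" a window if it overlaps it by at least 1 bp.
    """
    win_end = win_start + window
    return sum(1 for start, end in reads if start < win_end and end > win_start)


def coverage_matrix(exons, reads, window=50, stride=40):
    """Build one row of per-window read counts per exon.

    Assumption: rows for shorter exons are right-padded with zeros so that
    all rows have the same length.
    """
    rows = [
        [count_reads(reads, w, window) for w in window_starts(s, e, window, stride)]
        for s, e in exons
    ]
    width = max((len(row) for row in rows), default=0)
    return [row + [0] * (width - len(row)) for row in rows]


# Toy data (hypothetical coordinates, not real MET exons or reads).
exons = [(0, 130), (200, 290)]
reads = [(10, 60), (45, 95), (210, 260)]
print(coverage_matrix(exons, reads))  # [[2, 2, 1], [1, 1, 0]]
```

With a 40-bp stride and 50-bp windows, adjacent windows overlap by 10 bp, so a read near a window boundary can be counted in both windows. A matrix of this shape is what a classifier such as a CNN or logistic regression would take as input for each sample.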
