Abstract 5102: Deep Learning Method for the Classification of CNV Based on the Next Generation Target Sequencing

Jidong Lang, Geng Tian
DOI: https://doi.org/10.1158/1538-7445.am2019-5102
IF: 11.2
2019-01-01
Cancer Research
Abstract:
Background: Copy number variations (CNVs) are of great importance in many cancers. Next-generation sequencing has recently made it possible to detect CNVs by sequencing, whereas traditional CNV determination methods rely on costly experiments. Several algorithms have been proposed for CNV detection, but none of them can determine CNV at a specific gene using only the coverage information at that gene.
Methods: We collected 1132 samples: 51 with mesenchymal-epithelial transition factor (MET) CNVs and 1081 with no CNVs. We split the exons of MET into multiple 50-bp windows with a stride of 40 bp. In each window the supporting reads were counted, and the read counts for each exon were stacked to generate a matrix. A training set of 38 CNV-positive matrices and 38 CNV-negative matrices was extracted from all the matrices, leaving the rest as the test set. A deep learning network, the convolutional neural network (CNN), was applied to distinguish the CNV-positive matrices from the CNV-negative matrices. For comparison, a logistic regression model was also applied to the classification task.
Results: The training set was split into 5 pieces of 15 samples each, and 5-fold cross-validation was run for both the CNN and the logistic regression model. After 10 runs of 5-fold cross-validation, the average accuracy was 85.73 ± 9.24% for the CNN and 87.07 ± 8.99% for the logistic regression model. We chose the 3 best models of each type to classify the test set. The CNN models achieved an accuracy of 71.84 ± 2.19%, while the logistic regression models achieved 69.70 ± 2.56%. Furthermore, the area under the curve (AUC) of the CNN models was 0.6335 ± 0.0124, higher than the 0.5666 ± 0.0115 of the logistic regression models.
Conclusions: We proposed a method that identifies samples with CNV using a convolutional neural network and only the coverage information within the specific region. It offers the possibility of CNV detection for cancer patients using only next-generation sequencing data.
Citation Format: Jidong Lang, Geng Tian. Deep learning method for the classification of CNV based on the next generation target sequencing [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 5102.
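The windowing step described in the Methods (50-bp windows with a 40-bp stride, one read count per window, counts stacked per exon) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function names, the half-open coordinate convention, the rule that a read "supports" a window when it overlaps it by at least 1 bp, the dropping of trailing partial windows, and the zero-padding of shorter exons are all assumptions that the abstract does not specify. The coordinates are toy values, not MET exon positions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates (assumed convention)


def make_windows(exon: Interval, size: int = 50, stride: int = 40) -> List[Interval]:
    """Tile one exon with fixed-size windows.

    Assumption: a trailing window that does not fit entirely inside the exon is dropped.
    """
    start, end = exon
    return [(s, s + size) for s in range(start, end - size + 1, stride)]


def count_reads(window: Interval, reads: List[Interval]) -> int:
    """Count reads that overlap the window by at least 1 bp.

    Assumption: this is what "supporting reads" means; the abstract does not define it.
    """
    w_start, w_end = window
    return sum(1 for r_start, r_end in reads if r_start < w_end and r_end > w_start)


def coverage_matrix(exons: List[Interval], reads: List[Interval]) -> List[List[int]]:
    """Build one row of window counts per exon.

    Assumption: rows are zero-padded on the right to the width of the longest row.
    """
    rows = [[count_reads(w, reads) for w in make_windows(e)] for e in exons]
    width = max((len(r) for r in rows), default=0)
    return [r + [0] * (width - len(r)) for r in rows]


# Toy exons and aligned reads (hypothetical coordinates)
exons = [(100, 240), (500, 600)]
reads = [(90, 160), (130, 200), (510, 580)]

matrix = coverage_matrix(exons, reads)
print(matrix)  # [[2, 2, 1], [1, 1, 0]]
```

A matrix of this shape, one per sample, is the kind of input that could then be passed to a CNN or flattened for logistic regression, as the Methods describe.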