Integrated Detection of Copy Number Variation Based on the Assembly of NGS and 3GS Data

Feng Gao, Liwei Gao, Jing-Yang Gao
DOI: https://doi.org/10.1007/978-3-030-17938-0_23
2019-01-01
Abstract: Copy number variations (CNVs) cover 5% to 10% of the genome and are among the essential pathogenic factors of human diseases, yet the detection of large CNVs remains unreliable. Reads from third-generation sequencing (3GS) are longer than those from next-generation sequencing (NGS), so in theory they can reveal long variations that short reads miss. In practice, the low accuracy of 3GS data makes it difficult to apply directly, and it largely serves as a supplement to NGS-based research. To address these problems, we developed a new mutation detection tool named AssCNV23. First, the tool corrects the 3GS data to reduce its high error rate, then combines the results of several mutation detection tools to improve the accuracy of the initial mutation set and to offset the detection bias of any single tool. Next, AssCNV23 uses the corrected, high-quality 3GS data to guide the assembly of the NGS data and detects CNVs once the assembled sequences are sufficiently long. Finally, to improve detection efficiency, the tool generates images that encode sequencing depth information, following the read-depth strategy, and uses a convolutional neural network to detect CNVs. The experimental results show that AssCNV23 maintains a high level of breakpoint accuracy and performs well in identifying large variations. Compared with other tools, the deep learning model has advantages in accuracy and sensitivity, and it performs well on the Matthews correlation coefficient (MCC) across the experiments. These results suggest that the algorithm is relatively reliable.
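The abstract reports accuracy, sensitivity, and MCC without defining them. As a reminder of what these metrics measure, the following is a minimal sketch that computes them from a binary confusion matrix using their standard definitions. The function names and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper or from AssCNV23.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all calls (CNV / non-CNV) that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true CNVs that were detected (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in the range [-1, 1].

    By common convention, returns 0.0 when any marginal sum is zero
    and the denominator therefore vanishes.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from the paper).
    tp, tn, fp, fn = 90, 85, 15, 10
    print(f"accuracy    = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"sensitivity = {sensitivity(tp, fn):.3f}")
    print(f"MCC         = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when CNV and non-CNV regions are heavily imbalanced.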