Study on detection of CNVs using human whole genome bisulfite sequencing data
Dan-Tong Xu, Yi-Fei Wang, Jia-Li Cai, Wen-Tao Gong, Xiang-Chun Pan, Yu-Han Tian, Qing-Peng Shen, Jia-Qi Li, Xiao-Long Yuan
DOI: https://doi.org/10.16288/j.yczz.22-385
2023-04-20
Abstract: Aberrant DNA methylation has been reported to give rise to copy number variations (CNVs), and CNVs may in turn alter DNA methylation levels. Whole genome bisulfite sequencing (WGBS) generates genome-wide DNA sequencing data and therefore has the potential to detect CNVs. However, the performance of CNV detection using WGBS data remains unclear. In this study, five software tools that use different CNV detection strategies, namely BreakDancer, cn.mops, CNVnator, DELLY and Pindel, were selected to benchmark CNV detection with WGBS data. Using real (2.62 billion reads) and simulated (12.35 billion reads) human WGBS data, we performed 150 CNV detection runs and calculated the number of CNVs detected, precision, recall, relative ability, memory usage, and running time, in order to identify the optimal strategy for CNV detection with WGBS data. On the real WGBS data, Pindel detected the most deletions and duplications, CNVnator detected deletions with the highest precision, cn.mops detected duplications with the highest precision, Pindel detected deletions with the highest recall, and cn.mops detected duplications with the highest recall. On the simulated WGBS data, BreakDancer detected the most deletions and cn.mops detected the most duplications, while CNVnator showed the highest precision and recall for both deletions and duplications. In both real and simulated WGBS data, the ability of CNVnator to detect CNVs was likely to exceed its ability in whole genome sequencing data. Additionally, DELLY and BreakDancer showed the lowest peak memory usage and the shortest CPU runtime, whereas CNVnator showed the highest peak memory usage and the longest CPU runtime. Taken together, CNVnator and cn.mops showed excellent performance for CNV detection with WGBS data. These results suggest that it is feasible to detect CNVs using WGBS data, and provide useful information for further investigating both CNVs and DNA methylation using WGBS data alone.
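The precision and recall figures summarized above can be illustrated with a minimal sketch. The abstract does not state how a detected CNV was matched to a reference CNV, so the 50% reciprocal-overlap rule used here is an assumption, as are the function names `reciprocal_overlap` and `precision_recall` and the `(chromosome, start, end)` interval layout; none of this should be read as the authors' actual evaluation code.

```python
from typing import List, Tuple

# Hypothetical interval layout: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval, min_frac: float = 0.5) -> bool:
    """Return True if a and b overlap by at least min_frac of BOTH lengths.

    The 50% reciprocal-overlap threshold is an assumed matching rule,
    not one taken from the abstract.
    """
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (a[2] - a[1])
            and overlap >= min_frac * (b[2] - b[1]))


def precision_recall(calls: List[Interval], truth: List[Interval],
                     min_frac: float = 0.5) -> Tuple[float, float]:
    """Compute precision and recall of CNV calls against a reference set.

    precision = matched calls / all calls
    recall    = matched reference CNVs / all reference CNVs
    """
    matched_calls = sum(
        any(reciprocal_overlap(c, t, min_frac) for t in truth) for c in calls
    )
    matched_truth = sum(
        any(reciprocal_overlap(t, c, min_frac) for c in calls) for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data for illustration only; not results from the study.
    truth = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 300)]
    calls = [("chr1", 110, 210), ("chr1", 1000, 1100)]
    print(precision_recall(calls, truth))
```

In practice, deletions and duplications would be scored separately, as the abstract reports them, by calling `precision_recall` once per CNV type.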