CmVCall: An automated and adjustable nanopore analysis pipeline for heteroplasmy detection of the control region in human mitochondrial genome

Lirong Jiang, Jing Liu, Suyu Li, Yufeng Wen, Xinyue Zheng, Liu Qin, Yiping Hou, Zheng Wang
DOI: https://doi.org/10.1016/j.fsigen.2023.102930
Abstract: Genetic associations between human mitochondrial DNA (mtDNA) heteroplasmy and mitochondrial diseases, aging, and cancer have been described, contributing substantially to the understanding of the mtDNA polymorphic spectrum in anthropology, population genetics, and forensic genetics. In the past decade, heteroplasmy detection has been hampered by the inefficiency of Sanger sequencing and by the inherent bias of next generation sequencing (NGS), which arises from amplification and the mapping of short reads. Nanopore sequencing stands out for its ability to yield long contiguous segments of DNA, offering a new approach to heterogeneity authentication. In addition to MinION from Oxford Nanopore Technologies, an alternative nanopore sequencer, QNome (Qitan Technology), has been applied to various areas of biological research, and its forensic applicability has recently been demonstrated. In this study, we evaluated the performance of four commonly used variant callers in the heterogeneity authentication of the control region of human mtDNA, based on simulated mixtures at different ratios generated by mixing QNome nanopore sequencing reads of two synthetic sequences. We then developed CmVCall, an open-source, Python-based nanopore analysis pipeline that incorporates multiple modules, including read filtering, removal of nuclear mitochondrial sequences (NUMTs), alignment, an optional 'Correction' mode, and heterogeneity identification. At a 5% heteroplasmy level in 'Correction' mode, CmVCall achieved precision, accuracy, and recall of 100%, 99.9%, and 92.3%, respectively. Moreover, blood, saliva, and hair shaft samples from monozygotic (MZ) twins were used for heterogeneity evaluation and comparison with the NGS data. Results from the MZ twin samples showed that CmVCall could identify more point heteroplasmy sites, revealing significant levels of inter- and intra-individual mtDNA polymorphism.
In conclusion, we believe that this analysis pipeline will lay a solid foundation for the development of a comprehensive nanopore analysis pipeline targeting the whole mitochondrial genome.
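The evaluation described in the abstract (mixing reads from two sequences at a chosen ratio, then scoring the called sites) can be illustrated with a minimal sketch. This is not CmVCall's code: the function names `mix_reads` and `evaluate_calls`, their parameters, and the toy numbers in the usage note are assumptions made purely for illustration, and the metric definitions are the standard confusion-matrix ones, which the paper may define differently.

```python
import random


def mix_reads(major_reads, minor_reads, minor_fraction, total, seed=0):
    """Hypothetical helper: draw `total` reads so that about
    `minor_fraction` of them come from `minor_reads`."""
    rng = random.Random(seed)
    n_minor = round(total * minor_fraction)
    n_major = total - n_minor
    # Sample without replacement from each pool, then shuffle together.
    mixed = rng.sample(minor_reads, n_minor) + rng.sample(major_reads, n_major)
    rng.shuffle(mixed)
    return mixed


def evaluate_calls(true_sites, called_sites, n_positions):
    """Hypothetical helper: score called heteroplasmic positions against
    the known ones over a region of `n_positions` bases."""
    true_sites, called_sites = set(true_sites), set(called_sites)
    tp = len(true_sites & called_sites)   # correctly called sites
    fp = len(called_sites - true_sites)   # called but not real
    fn = len(true_sites - called_sites)   # real but missed
    tn = n_positions - tp - fp - fn       # correctly left uncalled
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / n_positions
    return {"precision": precision, "recall": recall, "accuracy": accuracy}
```

As a usage example with made-up inputs, `mix_reads(major, minor, 0.05, 100)` returns 100 reads of which 5 come from the minor pool, and `evaluate_calls(range(10), range(9), 500)` scores a caller that finds 9 of 10 true sites with no false calls in a 500-base region.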