A Structural Variation Genotyping Algorithm Enhanced by CNV Quantitative Transfer

Zheng, Tian,Qian, Xinyang,Wang, Jiayin
DOI: https://doi.org/10.1007/s11704-021-1177-z
IF: 2.6688
2022-01-01
Frontiers of Computer Science
Abstract:Genotyping of structural variations considering copy number variations (CNVs) is an infancy and challenging problem. CNVs, a prevalent form of critical genetic variations that cause abnormal copy numbers of large genomic regions in cells, often affect transcription and contribute to a variety of diseases. The characteristics of CNVs often lead to the ambiguity and confusion of existing genotyping features and algorithms, which may cause heterozygous variations to be erroneously genotyped as homozygous variations and seriously affect the accuracy of downstream analysis. As the allelic copy number increases, the error rate of genotyping increases sharply. Some instances with different copy numbers play an auxiliary role in the genotyping classification problem, but some will seriously interfere with the accuracy of the model. Motivated by these, we propose a transfer learning-based method to genotype structural variations accurately considering CNVs. The method first divides the instances with different allelic copy numbers and trains the basic machine learning framework with different genotype datasets. It maximizes the weights of the instances that contribute to classification and minimizes the weights of the instances that hinder correct genotyping. By adjusting the weights of the instances with different allelic copy numbers, the contribution of all the instances to genotyping can be maximized, and the genotyping errors of heterozygote variations caused by CNVs can be minimized. We applied the proposed method to both the simulated and real datasets, and compared it to some popular algorithms including GATK, Facets and Gindel. The experimental results demonstrate that the proposed method outperforms the others in terms of accuracy, stability and efficiency. The source codes have been uploaded at github/TrinaZ/CNVtransfer for academic use only.
What problem does this paper attempt to address?