Improving read alignment through the generation of alternative reference via iterative strategy

Lina Bu, Qi Wang, Wenjin Gu, Ruifei Yang, Di Zhu, Zhuo Song, Xiaojun Liu, Yiqiang Zhao
DOI: https://doi.org/10.1038/s41598-020-74526-7
2020-10-30
Abstract: There is generally one standard reference sequence for each species. When other breeds of the species carry extensive variation relative to that reference, alignment can become ambiguous and variant calling inaccurate, which in turn compromises the accuracy of downstream analysis. Here, with the help of the FPGA hardware platform, we present a method that generates an alternative reference via an iterative strategy to improve read alignment for breeds that are genetically distant from the reference breed. Compared with the published reference genomes, the alternative reference sequences we built improved the mapping rates of Chinese indigenous pigs and chickens by 0.61-1.68% and 0.09-0.45%, respectively. These sequences also enable researchers to recover highly variable regions that could be missed when using public reference sequences. We also determined that the optimal numbers of iterations needed to generate alternative reference sequences were seven for pigs and five for chickens. Our results show that, for genetically distant breeds, generating an alternative reference sequence can facilitate read alignment and variant calling and improve the accuracy of downstream analyses.
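The iterative idea in the abstract can be illustrated with a toy sketch. This is not the authors' FPGA-based pipeline: ungapped Hamming-distance matching stands in for a real aligner, a per-position majority vote stands in for variant calling, and stopping when the mapping rate no longer rises is an assumed criterion. The function names `best_hit` and `iterate_reference` and the parameter `max_mismatches` are hypothetical.

```python
from collections import Counter


def best_hit(read, ref, max_mismatches):
    """Return (position, mismatches) of the best ungapped placement, or None."""
    best = None
    for pos in range(len(ref) - len(read) + 1):
        window = ref[pos:pos + len(read)]
        mm = sum(1 for a, b in zip(read, window) if a != b)
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (pos, mm)
    return best


def iterate_reference(ref, reads, max_mismatches=1, max_iters=10):
    """Toy loop: map reads, rebuild the reference from the pileup, repeat.

    Returns the final reference and the mapping rate seen in each round.
    Assumption: iteration stops once the mapping rate stops improving.
    """
    history = []
    for _ in range(max_iters):
        pileup = [Counter() for _ in ref]
        mapped = 0
        for read in reads:
            hit = best_hit(read, ref, max_mismatches)
            if hit is None:
                continue  # read stays unmapped in this round
            mapped += 1
            for offset, base in enumerate(read):
                pileup[hit[0] + offset][base] += 1
        history.append(mapped / len(reads))
        if len(history) > 1 and history[-1] <= history[-2]:
            break  # no further gain, keep the current reference
        # Replace each covered position with its most frequent read base;
        # uncovered positions keep the current reference base.
        ref = "".join(
            col.most_common(1)[0][0] if col else base
            for col, base in zip(pileup, ref)
        )
    return ref, history


if __name__ == "__main__":
    reference = "ACGTTGCAAGCTCATG"
    # Reads from a hypothetical breed that differs at two nearby positions.
    reads = ["CGTTGT", "GTTGTA", "TGTAAA", "GTAAAC", "AACTCA", "ACTCAT"]
    new_ref, rates = iterate_reference(reference, reads)
    print(new_ref, rates)
```

In this toy input, the two reads that span both variant positions exceed the mismatch limit against the original reference and only map after the first round has folded each variant into the reference, which is the effect the iterative strategy relies on.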