A Novel Efficient Algorithm for Common Variants Genotyping from Low-Coverage Sequencing Data

Shixuan Zhang, Kanglong Xiao, Xinyi He, Meng Hao, Yanyun Ma, Yuxin Tian, Li Jin, Yi Li, Jiucun Wang, Yi Wang
DOI: https://doi.org/10.1101/2024.12.01.626280
2024-12-05
Abstract: Low-coverage whole-genome sequencing (LC-WGS) combined with imputation is a cost-effective genotyping strategy for genome-wide association studies (GWAS) in population genetics. In this study, we developed the Limpute algorithm specifically for genotyping from low-coverage sequencing data. It extracts variant information from the sequencing data using novel virtual probes and then performs imputation by cross-referencing between samples. Compared with GLIMPSE2, the currently dominant algorithm for low-coverage sequencing data, Limpute achieved similar imputation performance for common variants (r² > 0.87), while GLIMPSE2's runtime was approximately five times longer than Limpute's. To fully evaluate the accuracy of genotype imputation by Limpute, we used high-coverage whole-genome sequencing data (30x), microarray data, and high-coverage whole-exome sequencing data (30x) as separate validation sets. The results showed that Limpute imputes common variants well from low-coverage sequencing data (1x: r² > 0.87; 3x: r² > 0.92; 5x: r² > 0.93). In summary, we present a highly efficient, low-cost algorithm for genotyping from low-coverage sequencing data, offering substantial support for genetic research.
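The abstract reports accuracy as r² without defining it. A minimal sketch follows, assuming r² is the squared Pearson correlation between imputed dosages and validation genotypes at a variant, which is a common convention in imputation benchmarks. The function name `imputation_r2` and the sample values are hypothetical and do not come from the paper or from Limpute.

```python
def imputation_r2(imputed, truth):
    """Squared Pearson correlation between imputed dosages and true genotypes.

    Assumption: both inputs are per-sample values for a single variant,
    coded as alternate-allele counts or dosages in the range 0..2.
    """
    n = len(imputed)
    if n != len(truth) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")

    mean_x = sum(imputed) / n
    mean_y = sum(truth) / n

    # Centered cross-product and sums of squares.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in truth)

    # r² is undefined when either vector has no variation (monomorphic site).
    if sxx == 0 or syy == 0:
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical example: four samples at one variant.
imputed_dosages = [0.1, 0.9, 1.8, 0.0]   # illustrative imputed dosages
true_genotypes = [0, 1, 2, 0]            # illustrative validation calls
print(imputation_r2(imputed_dosages, true_genotypes))
```

Under this assumption, a per-variant r² would typically be averaged within allele-frequency bins to obtain summary figures like those quoted for 1x, 3x, and 5x coverage. Whether the paper aggregates this way is not stated in the abstract.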
Biology