Rapid and accurate genotype imputation from low coverage short read, long read, and cell free DNA sequence

Zilong Li, Anders Albrechtsen, Robert William Davies
DOI: https://doi.org/10.1101/2024.07.18.604149
2024-08-18
Abstract: Inexpensive and accurate genotyping methods are essential to modern genomics and health risk prediction. Here we introduce QUILT2, a scalable read-aware imputation method that can efficiently use biobank-scale haplotype reference panels. This allows fast and accurate imputation using short reads, long reads (e.g. r2 = 0.937 at common SNPs with 1X ONT data), linked reads and ancient DNA. In addition, QUILT2 contains a methodological innovation that enables imputation of both the maternal and fetal genomes from cell-free DNA generated for non-invasive prenatal testing (NIPT). Using a UK Biobank reference panel, we see accurate imputation of both mother (r2 = 0.966) and fetus (r2 = 0.465) at 0.25X coverage (fetal fraction of 10%, common SNPs). Imputation accuracy improves as coverage increases, reaching r2 of around 0.90 or above for both mother and fetus at 4.0X (mother r2 = 0.996, fetal r2 = 0.894). We show that this imputation enables powerful genome-wide association studies (GWAS) and accurate polygenic risk scores (PRS) for both mother and fetus. This creates clinical opportunities and, if phenotypes can be collected alongside clinical NIPT, the potential for large GWAS.
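As a rough illustration of why fetal imputation from NIPT data is harder than maternal imputation, the sketch below uses a simplified cell-free DNA mixture model. This model is an assumption made here for illustration, not QUILT2's actual model: a fraction of fragments equal to the fetal fraction comes from the fetus, the rest from the mother, and sequencing error is ignored. It also includes an accuracy function, assuming that r2 means the squared Pearson correlation between true genotypes and imputed dosages, the usual convention in imputation benchmarks. Both function names are hypothetical.

```python
import numpy as np


def expected_cfdna_alt_fraction(maternal_gt, fetal_gt, fetal_fraction):
    """Expected fraction of cfDNA reads carrying the alternate allele.

    Hypothetical helper under a simplified mixture model (an assumption,
    not QUILT2's model): genotypes are alt-allele counts (0, 1 or 2),
    a `fetal_fraction` share of fragments is fetal, the rest maternal,
    and sequencing error is ignored.
    """
    m = np.asarray(maternal_gt, dtype=float) / 2.0  # maternal alt-allele frequency
    f = np.asarray(fetal_gt, dtype=float) / 2.0     # fetal alt-allele frequency
    return (1.0 - fetal_fraction) * m + fetal_fraction * f


def imputation_r2(true_gt, imputed_dosage):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Hypothetical helper; returns nan if either input is constant.
    """
    x = np.asarray(true_gt, dtype=float)
    y = np.asarray(imputed_dosage, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    return r * r


# Heterozygous mother, 10% fetal fraction: compare the three fetal genotypes.
for fetal in (0, 1, 2):
    print(fetal, expected_cfdna_alt_fraction(1, fetal, 0.10))
```

Under this simplified model, with a heterozygous mother and a 10% fetal fraction, the three possible fetal genotypes move the expected alternate-allele fraction only between 0.45, 0.50 and 0.55. Distinguishing them from a handful of reads at 0.25X is therefore difficult, which is consistent with the lower fetal r2 reported at low coverage.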
Bioinformatics