Relationship Inference with Low-Coverage Whole Genome Sequencing on Forensic Samples

V.P. Nagraj, Matthew Scholz, Shakeel Jessa, Jianye Ge, Meng Huang, August E. Woerner, Dixie Peters, Bruce Budowle, Michael D. Coble, Stephen D. Turner
DOI: https://doi.org/10.1089/forensic.2022.0009
2022-01-01
Forensic Genomics
Abstract: Background: Single nucleotide polymorphism (SNP)-based kinship analysis is now a cornerstone of modern forensic genomics. Imputation can be used to augment genome-wide SNP data from low-coverage whole-genome sequencing (LCWGS). The impact of imputation after LCWGS on genotyping error and its subsequent impact on kinship analysis are unknown. Methods: We assessed the impact of LCWGS+imputation on genotyping error using 1× LCWGS and unidentified human remains paired with direct reference samples. We characterized genotyping error before and after implementing postimputation filters on quality and allele frequency. We used these empirically derived error rates to simulate genotyping data in large pedigrees where the true relationships were known. We assessed the impact of LCWGS+imputation on kinship analysis by evaluating the classification accuracy of methods representing two classes of SNP-based kinship methods. Results: Postimputation filtering on posterior genotype probabilities and minor allele frequency in reference populations results in notable improvements in genotyping accuracy. These improvements result in increased accuracy of tools for SNP-based kinship analysis. When using identity-by-descent (IBD) segment detection to determine relatedness, counting smaller IBD segments can be used to increase accuracy when error is high without notably increasing false-positive relationship inferences when genotyping error is minimal. Conclusions: This study demonstrates that imputation and postimputation filtering can improve the accuracy of methods to infer relationships between samples where sequencing was performed at low coverage, whether by design to increase throughput, or by necessity given degraded, low-input, or contaminated samples. LCWGS+imputation represents a viable path forward for relationship inference in forensic genomics.