Benchmarking for genotyping and imputation using degraded DNA for forensic applications across diverse populations

Elena I. Zavala, Rori V. Rohlfs, Priya Moorjani
DOI: https://doi.org/10.1101/2024.07.02.601808
2024-07-03
Abstract: Advancements in sequencing and laboratory technologies have enabled forensic genetic analysis on increasingly low-quality and degraded DNA samples. However, existing computational methods for genotyping and imputation, which are used to generate DNA profiles from degraded DNA, have not been tested for forensic applications. Here we simulated sequencing data of varying qualities - coverage, fragment lengths, and deamination patterns - from forty individuals of diverse genetic ancestries. We used this dataset to test the performance of commonly used genotyping and imputation methods (SAMtools, GATK, ATLAS, Beagle, and GLIMPSE) on five different SNP panels (MPS-plex, FORCE, two extended kinship panels, and the Human Origins array) that are used for forensic and population genetics applications. For genome mapping and variant calling with degraded DNA, we find that using parameters and methods (such as ATLAS) developed for ancient DNA analysis provides a marked improvement over conventional standards used for next-generation sequencing analysis. We find that ATLAS outperforms GATK and SAMtools, achieving over 90% genotyping accuracy for the four largest SNP panels with coverages greater than 10X. For lower coverages, decreased concordance rates are correlated with increased rates of heterozygosity. Genotype refinement and imputation improve the accuracy at lower coverages by leveraging population reference data. For all five SNP panels, we find that using a population reference panel representative of worldwide populations (e.g., the 1000 Genomes Project) results in increased genotype accuracies across genetic ancestries, compared to ancestry-matched population reference panels. Importantly, we find that the low SNP density of commonly used forensic SNP panels can impact the reliability and performance of genotype refinement and imputation. This highlights a critical trade-off between enhancing privacy by using panels with fewer SNPs and maintaining the effectiveness of genomic tools. We provide benchmarks and recommendations for analyzing degraded DNA from diverse populations with widely used genomic methods in forensic casework.
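The abstract reports genotyping accuracy as concordance and relates it to heterozygosity, but does not define either metric here. As a minimal sketch, assuming concordance means the fraction of jointly called sites whose unordered diploid genotypes match a known truth set, the two quantities could be computed as follows. The function names, the tuple encoding of genotypes, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical encoding: a diploid genotype is a pair of allele indices
# (0 = reference, 1 = alternate); None marks a site with no call.
Genotype = Optional[Tuple[int, int]]


def concordance(truth: Sequence[Genotype], called: Sequence[Genotype]) -> float:
    """Fraction of sites called in both sets whose unordered genotypes match."""
    compared = 0
    matched = 0
    for t, c in zip(truth, called):
        if t is None or c is None:
            continue  # skip sites missing in either set
        compared += 1
        if sorted(t) == sorted(c):  # (0, 1) and (1, 0) are the same genotype
            matched += 1
    return matched / compared if compared else float("nan")


def het_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of called sites that are heterozygous."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return float("nan")
    return sum(g[0] != g[1] for g in called) / len(called)


# Toy data (assumed): five sites, one missing in the truth set,
# and one heterozygous site miscalled as homozygous reference.
truth = [(0, 0), (0, 1), (1, 1), (0, 1), None]
called = [(0, 0), (0, 0), (1, 1), (1, 0), (0, 1)]

print(concordance(truth, called))  # 0.75
print(het_rate(truth))             # 0.5
```

In the toy data the only discordant site is a true heterozygote called as homozygous, the kind of error that becomes more likely at low coverage when only one allele happens to be sampled.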
Genetics