Optimized variant calling for estimating kinship
Sammed Mandape, Kapema Bupe Kapema, Amy Smuts, Benjamin Crysup, Xuewen Wang, Meng Huang, Jianye Ge, Bruce Budowle, August E. Woerner, Tiffany M. Duque, Jonathan L. King
DOI: https://doi.org/10.1016/j.fsigen.2022.102785
2022-11-01
Abstract: One of the fundamental goals of forensic genetics is sample attribution, i.e., determining whether an item of evidence can be associated with some person or persons. The most common scenario involves a direct comparison, e.g., between DNA profiles from an evidentiary item and a sample collected from a person of interest. Less common is an indirect comparison in which kinship is used to potentially identify the source of the evidence. Because so much of the information available for comparison is lost through inheritance, sampling a limited set of loci may not provide enough resolution to accurately resolve a relationship. Whole genome techniques, by contrast, can sample the entirety of the genome or a sufficiently large portion of it, and as such they may yield better relationship determinations. While relatively common in other areas of study, whole genome techniques have only begun to be explored in the forensic sciences. To that end, bioinformatic pipelines are introduced for estimating kinship by massively parallel sequencing of whole genomes using approaches adapted from the medical and population genomic literature. The pipelines are designed to characterize a person's entire genome, not just some set of targeted markers. Two different variant callers are considered, contrasting a classical variant calling algorithm (BCFtools) with a more modern deep convolutional neural network (DeepVariant). A bioinformatic pipeline specific to each variant caller is introduced and evaluated in a titration series. Filters and thresholds are then optimized specifically for the purpose of estimating kinship as determined by the KING-robust algorithm. With appropriate filters and thresholds in place, both tools perform similarly: DeepVariant tends to produce more accurate genotypes, though the types of inaccuracies that result tend to produce slightly less accurate overall estimates of relatedness.
genetics & heredity, medicine, legal
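
The abstract evaluates pipelines by the kinship estimates that the KING-robust algorithm produces from the called genotypes. As a rough illustration of what that estimator computes, the sketch below implements the between-family KING-robust kinship coefficient as it is commonly stated: a function of the number of sites where both individuals are heterozygous, the number where they are opposite homozygotes, and each individual's heterozygote count. This is not the authors' pipeline. The function name `king_robust_kinship`, the 0/1/2 genotype encoding, and the handling of missing calls (skipping any site missing in either individual) are assumptions made for illustration.

```python
from typing import Optional, Sequence

# Assumed encoding: 0, 1, 2 = count of alternate alleles at a biallelic site;
# None = missing genotype call.
Genotype = Optional[int]


def king_robust_kinship(g_i: Sequence[Genotype], g_j: Sequence[Genotype]) -> float:
    """Sketch of the between-family KING-robust kinship estimate for one pair.

    Hypothetical helper for illustration only; sites with a missing call in
    either individual are skipped (an assumption of this sketch).
    """
    if len(g_i) != len(g_j):
        raise ValueError("genotype vectors must cover the same sites")

    het_i = het_j = both_het = opposite_hom = 0
    for a, b in zip(g_i, g_j):
        if a is None or b is None:
            continue  # skip sites not called in both individuals
        if a == 1:
            het_i += 1
        if b == 1:
            het_j += 1
        if a == 1 and b == 1:
            both_het += 1  # both individuals heterozygous
        elif {a, b} == {0, 2}:
            opposite_hom += 1  # opposite homozygotes (AA vs aa)

    min_het = min(het_i, het_j)
    if min_het == 0:
        raise ValueError("at least one heterozygous site per individual is required")

    return (
        (both_het - 2 * opposite_hom) / (2 * min_het)
        + 0.5
        - 0.25 * (het_i + het_j) / min_het
    )
```

Under this formula two identical genotype vectors give 0.5, the value expected for duplicate samples or monozygotic twins; the expected kinship coefficient is 0.25 for first-degree relatives and 0.125 for second-degree relatives. Because the estimate depends directly on heterozygote and opposite-homozygote counts, genotyping errors that convert heterozygotes to homozygotes (or the reverse) shift it, which is why filter choices matter for kinship even when overall genotype accuracy is high.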