The limitations of kinship determinations using STR data in ill-defined populations
Vincent Zvénigorosky, Audrey Sabbagh, Angéla Gonzalez, Jean-Luc Fausser, Friso Palstra, Georgii Romanov, Aisen Solovyev, Nikolay Barashkov, Sardana Fedorova, Éric Crubézy, Bertrand Ludes, Christine Keyser
DOI: https://doi.org/10.1007/s00414-020-02298-w
Abstract: The likelihood ratio (LR) method is commonly used to determine kinship in civil, criminal, or forensic cases. For the past 15 years, our research group has also applied LR to ancient STR data and obtained kinship results for collections of graves or necropolises. Although we were able to reconstruct large genealogies, some pairs of individuals showed ambiguous results: second-degree relationships (half-sibling pairs, for example) were often inconsistent with detected first-degree relationships such as parent-offspring or full-sibling pairs. We therefore set out to estimate empirically the error rates of the LR method in living populations whose STR allelic diversity is comparable to that of the ancient populations we had previously studied. We collected biological samples in the field in North-Eastern Siberia and West Africa and studied more than 800 pairs of STR profiles from individuals with known relationships. Because commercial STR panels were designed for specific regions (namely Europe and North America), the allelic makeup of these two populations at the panel loci showed a significant deficit in diversity compared with European populations, replicating a situation often faced in ancient DNA studies. We assessed the capacity of the LR method to confirm known relationships (effectiveness) and its capacity to detect those relationships (reliability). Effectiveness is mostly a concern in forensic studies, whereas the reliability of kinship detection matters for the study of necropolises or other large gatherings of unidentified individuals, such as disaster victims or mass graves. We show that applying LR to both test populations reveals specific issues (both false positives and false negatives) that prevent the confirmation of second-degree kinship, or even full siblingship, in small populations.
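A kinship LR compares the probability of two STR profiles under a stated relationship with their probability if the two individuals were unrelated. A common textbook formulation computes, at each locus, LR = k0 + k1·X + k2·Y, where (k0, k1, k2) are the relationship's probabilities of sharing 0, 1 or 2 alleles identical by descent (IBD), and X and Y are the likelihood ratios of sharing 1 or 2 IBD alleles versus none; LRs are then multiplied across independent loci. The Python sketch below follows that formulation under simplifying assumptions (Hardy-Weinberg equilibrium, no subpopulation correction, no mutation, no inbreeding). It is not the software used in the study, and all function names, allele labels and frequencies are hypothetical. It also illustrates the diversity problem: the same shared allele gives weaker evidence when it is common.

```python
import math

# IBD coefficients (k0, k1, k2): probabilities that a pair shares 0, 1 or 2
# alleles identical by descent at a locus (assumes no inbreeding).
KAPPA = {
    "parent-offspring": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "half siblings": (0.5, 0.5, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}

def genotype_prob(g, freqs):
    """Hardy-Weinberg probability of an unordered genotype g = (a, b)."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

def prob_g2_given_g1_ibd1(g1, g2, freqs):
    """P(g2 | g1) when exactly one allele is IBD: each of g1's two alleles is
    the shared one with probability 1/2; g2's other allele is drawn at random."""
    c, d = g2
    total = 0.0
    for shared in g1:
        if c == d:
            if shared == c:
                total += 0.5 * freqs[c]
        elif shared == c:
            total += 0.5 * freqs[d]
        elif shared == d:
            total += 0.5 * freqs[c]
    return total

def locus_lr(g1, g2, freqs, relationship):
    """LR of `relationship` versus 'unrelated' at a single STR locus."""
    k0, k1, k2 = KAPPA[relationship]
    p_g2 = genotype_prob(g2, freqs)
    x = prob_g2_given_g1_ibd1(g1, g2, freqs) / p_g2        # 1 IBD vs 0 IBD
    y = (1.0 / p_g2) if sorted(g1) == sorted(g2) else 0.0   # 2 IBD vs 0 IBD
    return k0 + k1 * x + k2 * y

def combined_log10_lr(profile1, profile2, freq_table, relationship):
    """Sum of per-locus log10 LRs, assuming the loci are independent."""
    total = 0.0
    for locus, g1 in profile1.items():
        lr = locus_lr(g1, profile2[locus], freq_table[locus], relationship)
        if lr == 0.0:
            return -math.inf  # exclusion at this locus (mutation not modelled)
        total += math.log10(lr)
    return total

# Hypothetical frequency tables for one locus: the shared allele "12" is
# rare in the first table and common in the second.
diverse = {"12": 0.1, "14": 0.2, "15": 0.3}
low_diversity = {"12": 0.5, "14": 0.2, "15": 0.3}
for name, freqs in [("diverse", diverse), ("low diversity", low_diversity)]:
    for rel in ("parent-offspring", "full siblings", "half siblings"):
        lr = locus_lr(("12", "14"), ("12", "15"), freqs, rel)
        print(f"{name:14s} {rel:17s} LR = {lr:.2f}")
```

With these made-up numbers the diverse table gives LRs of 2.50, 1.50 and 1.75, while the low-diversity table gives 0.50, 0.50 and 0.75, so at this single locus no relationship is favoured over unrelatedness once the shared allele is common. In both tables the half-sibling LR exceeds the full-sibling LR, a small hint of how single-locus evidence can blur first- and second-degree relationships.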
Up to 29% of detected full-sibling relationships were either overestimated half-sibling relationships or underestimated parent-offspring relationships. The error rate for detected half-sibling relationships was even higher, reaching 41%. Only parent-offspring pairs were reliably detected or confirmed. This implies that, in populations that are small, ill-defined, or for which the STR loci analyzed are inappropriate, an examiner might not be able to distinguish a pair of full siblings from a pair of half-siblings. Furthermore, half-sibling pairs might be overlooked altogether, an issue exacerbated by the common confusion, in many languages and cultures, between half-siblings and full siblings. Consequently, in the study of ancient populations, human remains of unknown origin, or poorly surveyed modern populations, we recommend a conservative interpretation of kinship inferred by LR. Next-generation sequencing data should be used when possible, but the costs and technology involved might be prohibitive. Therefore, in potentially contentious situations or cases lacking sufficient external information, uniparental markers should be analyzed: ideally, complete mitochondrial genomes and Y-chromosome haplotypes (STR, SNP, and/or sequencing).
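Effectiveness and reliability, as defined above, can be tallied from pairs whose true relationship is known. The Python sketch below is one plausible operationalization, assumed here rather than taken from the study: effectiveness is the share of pairs with a known relationship whose LR call confirms it, and the detection error rate (one minus reliability) is the share of pairs called a given relationship whose known relationship differs, which is the kind of figure behind the 29% and 41% rates. The function names, labels and `TOY_PAIRS` list are hypothetical; the toy data are invented and are not results from the study.

```python
from collections import Counter

# Each pair is (known relationship, relationship called by the LR analysis).
# Labels: PO = parent-offspring, FS = full siblings, HS = half siblings,
# UN = unrelated. TOY_PAIRS is invented for illustration, not study data.
TOY_PAIRS = [
    ("FS", "FS"), ("FS", "FS"), ("HS", "FS"), ("PO", "FS"),
    ("PO", "PO"), ("HS", "HS"), ("HS", "UN"),
]

def effectiveness(pairs, rel):
    """Share of pairs known to be `rel` whose LR call confirms `rel`."""
    calls = [called for known, called in pairs if known == rel]
    if not calls:
        return float("nan")
    return sum(called == rel for called in calls) / len(calls)

def detection_error_rate(pairs, rel):
    """Share of pairs called `rel` whose known relationship is different
    (one minus the reliability of detecting `rel`)."""
    knowns = [known for known, called in pairs if called == rel]
    if not knowns:
        return float("nan")
    return sum(known != rel for known in knowns) / len(knowns)

def error_breakdown(pairs, rel):
    """Known relationships hiding behind incorrect `rel` calls."""
    return Counter(known for known, called in pairs if called == rel and known != rel)

print(effectiveness(TOY_PAIRS, "FS"))         # 1.0
print(detection_error_rate(TOY_PAIRS, "FS"))  # 0.5
print(error_breakdown(TOY_PAIRS, "FS"))       # Counter({'HS': 1, 'PO': 1})
```

In this toy example every known full-sibling pair is confirmed (perfect effectiveness), yet half of the full-sibling calls are wrong, split between a half-sibling pair and a parent-offspring pair. This is why the two measures need to be reported separately.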