Development of the Decision Tree Model for Distinguishing Individuals of Chinese Four Surnames from Zhanjiang Han Population Based on Y-STR Haplotypes.

Xiao-Ye Jin, Ya-Ting Fang, Wei Cui, Chong Chen, Yu-Xin Guo, Hao-Tian Meng, Hong-Dan Wang, Kai Zhao, Bo-Feng Zhu
DOI: https://doi.org/10.1016/j.legalmed.2021.101848
IF: 2.017
2021-01-01
Legal Medicine
Abstract: Co-separation studies of surnames and Y-chromosomal genetic markers help to reveal population migrations, surname origins and population formation histories, and they support forensic familial searching. We investigated the genetic distributions of 27 Y-STRs in four Chinese surnames (Li, Lin, Chen and Huang) from the Zhanjiang Han population and developed a decision tree model for surname prediction based on Y-STR haplotypes. Allelic frequencies of the 27 Y-STRs showed that some alleles were observed in only one surname, and that others occurred at higher frequencies in one surname than in the rest, implying that these alleles might serve as useful indicators for surname prediction. Haplotype match probability values of the 27 Y-STRs in these surnames indicated that the system could be a valuable tool for forensic male identification. The decision tree model performed well on the training set, with an accuracy of 0.9860, and achieved a relatively high accuracy (>0.70) for surname prediction on the testing set. In sum, we explored the power of machine learning for surname prediction from Y-STR haplotypes, which shows promise for application in forensic familial searching.
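To make the idea of predicting a surname from a Y-STR haplotype concrete, here is a minimal sketch of a decision tree classifier in plain Python. It is not the authors' model or data: the three loci, the allele values, the Gini-based equality splits and the function names (`gini`, `best_split`, `build_tree`, `predict`) are all assumptions chosen for illustration, and the eight toy haplotypes are invented.

```python
from collections import Counter

# Hypothetical toy data: three Y-STR loci per haplotype (the study used 27).
LOCI = ("DYS19", "DYS390", "DYS391")
rows = [
    (15, 24, 10), (15, 24, 11),   # invented haplotypes labelled "Li"
    (14, 23, 10), (14, 23, 11),   # "Lin"
    (16, 25, 10), (16, 24, 11),   # "Chen"
    (15, 23, 10), (15, 23, 11),   # "Huang"
]
labels = ["Li", "Li", "Lin", "Lin", "Chen", "Chen", "Huang", "Huang"]


def gini(ys):
    """Gini impurity of a list of class labels."""
    n = len(ys)
    return 1.0 - sum((c / n) ** 2 for c in Counter(ys).values())


def best_split(xs, ys):
    """Return (score, locus, allele) for the test `x[locus] == allele`
    with the lowest weighted Gini impurity, or None if no test splits the data."""
    best = None
    n = len(xs)
    for locus in range(len(xs[0])):
        for allele in {x[locus] for x in xs}:
            yes = [y for x, y in zip(xs, ys) if x[locus] == allele]
            no = [y for x, y in zip(xs, ys) if x[locus] != allele]
            if not yes or not no:
                continue  # the test does not separate anything
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / n
            if best is None or score < best[0]:
                best = (score, locus, allele)
    return best


def build_tree(xs, ys, depth=0, max_depth=3):
    """Recursively grow a tree; leaves are surname strings, nodes are dicts."""
    majority = Counter(ys).most_common(1)[0][0]
    if len(set(ys)) == 1 or depth == max_depth:
        return majority
    split = best_split(xs, ys)
    if split is None:
        return majority
    _, locus, allele = split
    yes = [(x, y) for x, y in zip(xs, ys) if x[locus] == allele]
    no = [(x, y) for x, y in zip(xs, ys) if x[locus] != allele]
    return {
        "locus": locus,
        "allele": allele,
        "yes": build_tree([x for x, _ in yes], [y for _, y in yes], depth + 1, max_depth),
        "no": build_tree([x for x, _ in no], [y for _, y in no], depth + 1, max_depth),
    }


def predict(tree, x):
    """Walk from the root to a leaf and return the predicted surname."""
    while isinstance(tree, dict):
        tree = tree["yes"] if x[tree["locus"]] == tree["allele"] else tree["no"]
    return tree


tree = build_tree(rows, labels)
train_accuracy = sum(predict(tree, x) == y for x, y in zip(rows, labels)) / len(rows)
print("training accuracy:", train_accuracy)
```

Each internal node asks whether a given locus carries a given allele, which mirrors the observation that surname-specific or surname-enriched alleles can act as indicators. As in the abstract, accuracy on the training data is only an upper bound; a held-out testing set is needed to judge how well such a tree generalises.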