Abstract: Improving the resolution of widely used Y-chromosomal short tandem repeat (Y-STR) datasets is of great importance to forensic investigators, yet current approaches are largely limited to adding more Y-STR loci. In this study, a regional Y-DNA database was investigated to improve Y-STR haplotype resolution using a Y-SNP Pedigree Tagging System that includes 24 Y-chromosomal single nucleotide polymorphism (Y-SNP) loci. This pilot study was conducted in the Chinese Yunnan Zhaoyang Han population and enrolled 3473 unrelated male individuals. Based on the haplogroup data obtained under different panels, matched or near-matching (NM) Y-STR haplotype pairs that belonged to different haplogroups demonstrated the critical role of haplogroups in improving regional Y-STR haplotype resolution. A classic median-joining network analysis was performed using Y-STR or combined Y-STR/Y-SNP data to reconstruct population substructures, which revealed the ability of Y-SNPs to correct misclassifications from Y-STRs. Additionally, population substructures were reconstructed using multiple unsupervised or supervised dimensionality reduction methods, which indicated the potential of Y-STR haplotypes for predicting Y-SNP haplogroups. Haplogroup prediction models were built using nine publicly accessible machine-learning (ML) approaches. The results showed that the best prediction accuracy could reach 99.71% for major haplogroups and 98.54% for detailed haplogroups. Potential influences on prediction accuracy were assessed by adjusting the number of Y-STR loci, selecting Y-STR loci with various mutabilities, and performing data processing. ML-based predictors generally achieved better prediction accuracy than two available predictors (Nevgen and EA-YPredictor). Three tree models were developed based on the Yfiler Plus panel with unprocessed input data, and they showed strong generalization ability in classifying various Chinese Han subgroups (validation dataset). In conclusion, this study revealed the significance and application prospects of Y-SNP haplogroups in improving regional Y-STR databases. Y-SNP haplogroups can be used to discriminate NM Y-STR haplotype pairs, and it is important for forensic Y-STR databases to develop haplogroup prediction tools to improve the accuracy of biogeographic ancestry inferences.
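The idea that Y-SNP haplogroups can separate matched or near-matching Y-STR haplotype pairs can be illustrated with a minimal sketch. It is not the study's pipeline: the sample records, repeat counts, and haplogroup labels below are invented for illustration, and `max_mismatch` is an assumed threshold, since the abstract does not state how many differing loci define an NM pair.

```python
from itertools import combinations

# Hypothetical toy records: (sample id, Y-STR repeat counts, Y-SNP haplogroup).
# Each tuple position stands for one Y-STR locus. All values are made up for
# illustration and are not taken from the study.
SAMPLES = [
    ("S1", (15, 12, 23, 10, 14, 12), "O2a"),
    ("S2", (15, 12, 23, 10, 14, 12), "O2a"),
    ("S3", (15, 12, 23, 10, 14, 12), "C2"),
    ("S4", (15, 12, 24, 10, 14, 12), "O1b"),
    ("S5", (14, 13, 25, 11, 11, 13), "N1"),
]


def mismatches(h1, h2):
    """Count loci at which two haplotypes carry different repeat numbers."""
    return sum(a != b for a, b in zip(h1, h2))


def near_matching_pairs(samples, max_mismatch=1):
    """Yield (id1, id2, n_mismatch, resolved) for matched or NM pairs.

    max_mismatch is an assumed threshold, not a value from the source.
    resolved is True when the two samples fall in different haplogroups,
    i.e. the Y-SNP data separates a pair the Y-STR data cannot.
    """
    for (id1, h1, g1), (id2, h2, g2) in combinations(samples, 2):
        n = mismatches(h1, h2)
        if n <= max_mismatch:
            yield id1, id2, n, g1 != g2


pairs = list(near_matching_pairs(SAMPLES))
for id1, id2, n, resolved in pairs:
    status = "different haplogroups" if resolved else "same haplogroup"
    print(f"{id1}-{id2}: {n} mismatching loci, {status}")
```

In this toy set, S1 and S3 share an identical Y-STR haplotype but carry different haplogroup labels, which is the kind of pair that haplogroup information would help discriminate.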

Predicting Haplogroups Using a Versatile Machine Learning Program (predymale) on a New Mutationally Balanced 32 Y-STR Multiplex (combyplex): Unlocking the Full Potential of the Human STR Mutation Rate Spectrum to Estimate Forensic Parameters.

Convergence of Y Chromosome STR Haplotypes from Different SNP Haplogroups Compromises Accuracy of Haplogroup Prediction

Improving the Regional Y-STR Haplotype Resolution Utilizing Haplogroup-Determining Y-SNPs and the Application of Machine Learning in Y-SNP Haplogroup Prediction in a Forensic Y-STR Database: A Pilot Study on Male Chinese Yunnan Zhaoyang Han Population

Validation of a Novel Y-SNPs Multiplex System for Forensic Application

Large-scale pedigree analysis highlights rapidly mutating Y-chromosomal short tandem repeats for differentiating patrilineal relatives and predicting their degrees of consanguinity

The Validation of a Single Multiplex Typing System With 45 Y-STR Markers for Familial Searching and Database Construction

Sequence polymorphisms of forensic Y-STRs revealed by a 68-plex in-house massively parallel sequencing panel

The construction and application of a new 17-plex Y-STR system using universal fluorescent PCR

Male Pedigree Toolbox: A Versatile Software for Y-STR Data Analyses

MPKin‐YSTR: Interpretation of Y chromosome STR haplotypes for missing persons cases

Genetic Reconstruction and Forensic Analysis of Chinese Shandong and Yunnan Han Populations by Co-Analyzing Y Chromosomal STRs and SNPs

Developmental validation of Y-SNP pedigree tagging system: A panel via quick ARMS PCR

RETRACTED: Concordance and characterization of massively parallel sequencing at 58 STRs in a Tibetan population (Retracted Article)

Mutation analysis for newly suggested 30 Y-STR loci with high mutation rates in Chinese father-son pairs

XGBoost as a reliable machine learning tool for predicting ancestry using autosomal STR profiles - Proof of Method

Statistical methods for discrimination of STR genotypes using high resolution melt curve data

Characterization of sequence variations in the extended flanking regions using massively parallel sequencing in 21 A-STRs and 21 Y-STRs

Harmonizing the forensic nomenclature for STR loci D6S474 and DYS612
