Microsatellites' mutation modeling through the analysis of the Y-chromosomal transmission: Results of a GHEP-ISFG collaborative study
Sofia Antão-Sousa, Leonor Gusmão, Nidia M Modesti, Sofía Feliziani, Marisa Faustino, Valeria Marcucci, Claudia Sarapura, Julyana Ribeiro, Elizeu Carvalho, Vania Pereira, Carmen Tomas, Marian M de Pancorbo, Miriam Baeta, Rashed Alghafri, Reem Almheiri, Juan José Builes, Nair Gouveia, German Burgos, Maria de Lurdes Pontes, Adriana Ibarra, Claudia Vieira da Silva, Rukhsana Parveen, Marc Benitez, António Amorim, Nadia Pinto
DOI: https://doi.org/10.1016/j.fsigen.2023.102999
Abstract: The Spanish and Portuguese Speaking Working Group of the International Society for Forensic Genetics (GHEP-ISFG) organized a collaborative study on mutations of Y-chromosomal short tandem repeats (Y-STRs). New data from 2225 father-son duos and data from 44 previously published reports, corresponding to 25,729 duos, were collected and analyzed. Marker-specific mutation rates were estimated for 33 Y-STRs. Although highly dependent on the analyzed marker, mutations compatible with the gain or loss of a single repeat were 23.2 times more likely than those involving a greater number of repeats. Longer alleles (relative to the modal one) were nearly twice as mutable as shorter ones. Within the subset of longer alleles, the loss of repeats was nearly twice as likely as the gain. Conversely, shorter alleles showed a symmetrical trend, with repeat gains being twofold more frequent than losses. A positive correlation between paternal age and mutation rate was observed, strengthening previous findings. A machine learning approach, based on logistic regression analyses, yielded algebraic formulas for estimating the probability of mutation as a function of paternal age and allele length for DYS389I, DYS393 and DYS627. Algebraic formulas could also be established with allele length as the only predictor for the DYS19, DYS389I, DYS389II-I, DYS390, DYS391, DYS393, DYS437, DYS439, DYS449, DYS456, DYS458, DYS460, DYS481, DYS518, DYS533, DYS576, DYS626 and DYS627 loci. For the remaining Y-STRs, a lack of statistical significance was observed, probably as a consequence of the small effective size of the available subsets, a common difficulty in modeling rare events such as mutations. The amount of data used in the different analyses varied widely, depending on how the data were reported in the publications analyzed.
This reveals a regrettable waste of generated data due to inadequate reporting of results, and supports the urgent need for publication guidelines for mutation studies.
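To illustrate what an algebraic formula derived from a logistic regression might look like, the following is a minimal sketch, assuming the standard logistic form with allele length and paternal age as predictors. The function name `mutation_probability` and the coefficients `B0`, `B_LEN` and `B_AGE` are hypothetical placeholders chosen only for illustration; they are not the estimates reported in the study for any locus.

```python
import math


def mutation_probability(allele_length, paternal_age, b0, b_len, b_age):
    """Standard logistic model: p = 1 / (1 + exp(-(b0 + b_len*L + b_age*A)))."""
    z = b0 + b_len * allele_length + b_age * paternal_age
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical placeholder coefficients, NOT values estimated in the study.
B0, B_LEN, B_AGE = -12.0, 0.4, 0.03

# Same allele length (13 repeats), younger versus older father.
p_young = mutation_probability(13, 20, B0, B_LEN, B_AGE)
p_old = mutation_probability(13, 50, B0, B_LEN, B_AGE)

# Same paternal age (20 years), longer allele (15 repeats).
p_long = mutation_probability(15, 20, B0, B_LEN, B_AGE)

print(f"p(13 repeats, age 20) = {p_young:.5f}")
print(f"p(13 repeats, age 50) = {p_old:.5f}")
print(f"p(15 repeats, age 20) = {p_long:.5f}")
```

With positive coefficients for both predictors, as assumed here, the modeled probability rises with paternal age and with allele length, which matches the direction of the trends described in the abstract. Setting `b_age` to zero gives the allele-length-only form.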