Improved age estimation from semen using sperm-specific age-related CpG markers
Chao Xiao, Ya Li, Maomin Chen, Shaohua Yi, Daixin Huang
DOI: https://doi.org/10.1016/j.fsigen.2023.102941
Abstract: Accurate age estimation from semen has the potential to greatly narrow the pool of unidentified suspects in sexual assault investigations. However, previous efforts utilizing semen age-related CpG (AR-CpG) markers have shown lower accuracy compared to blood AR-CpG-based methods. This discrepancy may be attributed to DNA methylation (DNAm) interference from "round cells" such as leukocytes and immature sperm cells in semen. This study aimed to develop age calculators based on sperm-specific AR-CpG markers and to achieve more accurate age estimates from sperm DNA. Through an analysis of publicly available MethylationEPIC microarray data from 90 sperm samples of healthy males aged 22-51 years, we identified 31 sperm-specific AR-CpG markers with absolute Pearson's R values > 0.5 and Benjamini-Hochberg adjusted p values < 0.013. The top 19 AR-CpG markers with the largest absolute R values and beta ranges > 0.10, along with 3 reported semen AR-CpG markers (cg06304190, cg06979108, and cg12837463), were integrated into two methylation SNaPshot panels (Ⅰ and Ⅱ), each containing 11 markers. The 21 qualified AR-CpG markers showed absolute R values ≥ 0.427 in an independent validation cohort of 253 sperm DNA samples (22-67 years), with cg21843517 exhibiting the strongest age correlation (R = 0.853). The optimal models, constructed using sperm DNAm data of the training set (n = 214, 22-67 years) and markers from panel Ⅰ (n = 11), panel Ⅱ (n = 10), or both panels, achieved mean absolute errors (MAEs) of 2.526-4.746, 3.890-5.715, and > 9.800 years on the test sets of sperm (n = 39, 23-64 years), semen (same donors as the sperm test set), and whole blood (n = 40, 22-65 years), respectively. The simplified models incorporating 3, 5, 9, or 14 AR-CpG markers (MAE = 2.918-4.139 years for sperm) still outperformed the Lee et al. original model (MAE = 6.444 years for semen) and the reconstructed panel Lee model (MAE = 6.011 years for sperm).
The final models, utilizing all sperm DNAm data (n = 253) and markers from panel Ⅰ, panel Ⅱ, or both panels, yielded mean MAEs of 2.587, 2.766, and 2.200 years, respectively, on the 50 test sets generated by 5 repeats of 10-fold cross-validation. Additionally, multiple markers in both panels demonstrated the ability to discern sperm or semen from blood with 100% accuracy. In summary, our study substantiates the potential of sperm-specific AR-CpG markers for precise age estimation from sperm DNA, providing an improved toolset for forensic investigations.
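
To illustrate how 5 repeats of 10-fold cross-validation give 50 test sets and a mean MAE, the following is a minimal sketch, not the authors' implementation. The abstract does not state the model type, so a single-marker least-squares line is assumed here purely for illustration. The data are synthetic, and the helper names (`fit_line`, `mae`, `repeated_kfold_mae`) are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ages (years)."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def repeated_kfold_mae(betas, ages, k=10, repeats=5, seed=0):
    """Return one MAE per held-out fold: k * repeats values in total."""
    rng = random.Random(seed)
    n = len(ages)
    fold_maes = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                      # new random partition per repeat
        folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all samples
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            slope, intercept = fit_line(
                [betas[i] for i in train_idx], [ages[i] for i in train_idx]
            )
            preds = [slope * betas[i] + intercept for i in test_idx]
            fold_maes.append(mae([ages[i] for i in test_idx], preds))
    return fold_maes


# Synthetic stand-in data (assumption): 253 donors aged 22-67 years and one
# hypothetical marker whose methylation beta value rises linearly with age.
rng = random.Random(42)
ages = [rng.uniform(22, 67) for _ in range(253)]
betas = [0.2 + 0.005 * a + rng.gauss(0, 0.02) for a in ages]

fold_maes = repeated_kfold_mae(betas, ages)
print(f"{len(fold_maes)} test sets, mean MAE = {mean(fold_maes):.3f} years")
```

Averaging over all 50 folds, rather than reporting a single split, reduces the dependence of the reported error on one particular partition of the 253 samples.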