DNA methylation clocks struggle to distinguish inflammaging from healthy aging, but feature rectification improves coherence and enhances detection of inflammaging

Colin M Skinner, Michael J Conboy, Irina M Conboy
DOI: https://doi.org/10.1101/2024.10.09.617512
2024-10-13
Abstract: Biological age estimation from DNA methylation, and the determination of relevant biomarkers, is an active research problem that has predominantly been tackled with black-box penalized regression. Machine learning is used to select a small subset of features from hundreds of thousands of CpG probes and to increase the generalizability that ordinary least-squares regression typically lacks. Here, we show that such feature selection lacks biological interpretability and relevance in first- and next-generation clocks, and clarify the logic by which these clocks systematically exclude biomarkers of aging and disease. Moreover, in contrast to the assumption that regularized linear regression is needed to prevent overfitting, we demonstrate that hypothesis-driven selection of biologically relevant features in conjunction with ordinary least-squares regression yields accurate, well-calibrated, generalizable clocks with high interpretability. We further demonstrate that the interplay of disease-related shifts of predictor values and their corresponding weights, which we term feature shifts, contributes to the lack of resolution between health and disease in conventional linear models. Lastly, we introduce a method of feature rectification, which aligns these shifts to improve the distinction of age predictions for healthy people vs. patients with various diseases.
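The feature-shift argument can be illustrated with a toy linear clock. In a model of the form age = b + sum_j w_j * x_j, a disease-related shift delta_j in predictor j changes the prediction by w_j * delta_j, so contributions of opposite sign can cancel and leave little net difference between healthy and diseased samples. The sketch below is only an illustration under stated assumptions: the weights and shifts are made-up numbers, `prediction_shift` is a hypothetical helper, and the "aligned" total (sum of absolute contributions) is one simple stand-in for what aligning the shifts could achieve, not the authors' rectification procedure.

```python
import numpy as np

def prediction_shift(weights, deltas):
    """Return per-feature contributions w_j * delta_j and their net sum.

    Hypothetical helper for illustration; not taken from the paper.
    """
    weights = np.asarray(weights, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    contributions = weights * deltas
    return contributions, contributions.sum()

# Made-up clock weights (years per unit methylation) for four CpG features.
weights = np.array([10.0, -8.0, 5.0, -6.0])
# Made-up disease-related shifts in mean methylation for the same features.
deltas = np.array([0.05, 0.04, -0.02, -0.03])

contributions, net = prediction_shift(weights, deltas)

# Assumption: if every contribution pointed in the same direction, the
# total shift would equal the sum of absolute contributions.
aligned = np.abs(contributions).sum()

print("per-feature contributions:", contributions)
print("net shift in predicted age:", net)
print("shift if all contributions were aligned:", aligned)
```

In this toy case the mixed signs make the net shift much smaller than the aligned total, which is the kind of lost resolution between health and disease that the abstract attributes to conventional linear models.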
Bioinformatics