Selecting, optimizing and externally validating a preexisting machine-learning regression algorithm for estimating waist circumference

Bryan V. Phillips-Farfán
DOI: https://doi.org/10.1016/j.compbiomed.2023.107909
IF: 7.7
2024-01-06
Computers in Biology and Medicine
Abstract: Obesity, typically defined by the body mass index (BMI), has well-known negative health effects. However, BMI has serious deficiencies in predicting the adverse risks associated with obesity. Waist circumference (WC) is an alternative measure for defining obesity and, according to the literature, a better predictor of disease. However, older databases often lack this information, or it is inaccurate (collected via self-report) or incomplete. Thus, this study accurately estimates WC using machine learning. The novel aspects of the approach are: 1) the predictor variables used (weight, height, age and sex) are likely to appear in most data sets; 2) publicly available data (including non-adults) and algorithms are used; 3) systematic methods are applied for data cleanup, model selection, hyperparameter optimization and external validation. Data are cleaned so that each column holds one variable, with no special codes, missing values or outliers. Preexisting regression algorithms are gauged by cross-validation using one data set. The hyperparameters of the best-performing algorithm are optimized. The tuned algorithm is externally validated by cross-validation on other data sets. Despite the limited number of features, the tuned algorithm outperforms prior WC approximations that use the same or similar predictor variables. The tuned algorithm enables the use of data where WC is not measured, is incomplete or is unreliable. A similar approach would be useful to estimate other variables of interest.
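The selection and external-validation steps described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the author's code: the two candidate models (a mean baseline and a single-predictor least-squares line), the helper names, and the synthetic weight-to-WC data are all assumptions made for illustration, and hyperparameter optimization is omitted because these toy models have nothing to tune.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    m = statistics.fmean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate: ordinary least squares with a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_rmse(fit, xs, ys, k=5, seed=0):
    """Average root-mean-square error over k held-out folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_errs = [(model(xs[i]) - ys[i]) ** 2 for i in held_out]
        scores.append(statistics.fmean(sq_errs) ** 0.5)
    return statistics.fmean(scores)


def make_synthetic(n, seed):
    """Made-up weight (kg) and WC (cm) pairs; NOT real measurements."""
    rng = random.Random(seed)
    xs = [rng.uniform(40, 120) for _ in range(n)]
    ys = [30 + 0.8 * x + rng.gauss(0, 5) for x in xs]  # arbitrary coefficients
    return xs, ys


# Model selection: compare candidates by cross-validation on one data set.
xs, ys = make_synthetic(200, seed=1)
candidates = {"mean": fit_mean, "line": fit_line}
scores = {name: cv_rmse(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)

# External validation: cross-validate the selected model on a separate data set.
ext_xs, ext_ys = make_synthetic(200, seed=2)
external_rmse = cv_rmse(candidates[best], ext_xs, ext_ys)
```

In a realistic setting the candidates would be preexisting library regressors fed all four predictors (weight, height, age and sex), and a tuning step would sit between selection and external validation.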
engineering, biomedical; computer science, interdisciplinary applications; mathematical & computational biology; biology