Abstract: Objectives We used machine-learning algorithms and a deep-learning sequential model to identify and optimize the most important factors for overweight and obesity in Chinese preschool-aged children. Methods This was a cross-sectional survey conducted in 2020 in Beijing and Tangshan. Children aged 3-6 years were enrolled using a stratified cluster random sampling strategy. Data were analyzed using Python in the PyCharm environment. Results A total of 9478 children were eligible for inclusion, including 1250 children with overweight or obesity. All children were randomly divided into a training group and a testing group at a 6:4 ratio. Among the algorithms compared, support vector machine (SVM) had the highest accuracy (0.9457), followed by gradient boosting machine (GBM) (accuracy: 0.9454). Among the other 4 performance indexes, GBM had the highest F1 score (0.7748), followed by SVM (F1 score: 0.7731). After importance ranking, the top 5 factors appeared sufficient to obtain decent performance under the GBM algorithm: age, eating speed, number of relatives with obesity, consumption of sweet drinks, and paternal education. The performance of the top 5 factors was reinforced by the deep-learning sequential model. Conclusions We identified 5 important factors that can be fed to the GBM algorithm to differentiate children with overweight or obesity from other children, with decent prediction performance.
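The abstract reports both accuracy and F1 score, and the gap between them is easier to read with the class balance in mind: 1250 of 9478 children (about 13%) had overweight or obesity. The following is a minimal sketch, not the authors' code, that uses only the counts stated above to illustrate a random 6:4 split and the two metrics. The function names `split_6_4` and `accuracy_and_f1`, the random seed, and the all-negative baseline are assumptions introduced purely for illustration; the study's actual splitting procedure and models are not reproduced here.

```python
import random


def split_6_4(items, seed=0):
    """Shuffle a copy of the items and split it 60% / 40% (hypothetical helper)."""
    rng = random.Random(seed)  # the seed is an assumption, fixed for repeatability
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * 0.6)
    return shuffled[:cut], shuffled[cut:]


def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Label counts taken from the abstract: 1250 positives among 9478 children.
labels = [1] * 1250 + [0] * (9478 - 1250)
train_labels, test_labels = split_6_4(labels)

# Illustrative baseline (an assumption): predict "not overweight" for every child.
baseline_pred = [0] * len(test_labels)
baseline_acc, baseline_f1 = accuracy_and_f1(test_labels, baseline_pred)
print(f"train={len(train_labels)} test={len(test_labels)}")
print(f"baseline accuracy={baseline_acc:.4f} F1={baseline_f1:.4f}")
```

Under this sketch the all-negative baseline scores a high accuracy simply because most children are in the negative class, while its F1 score is zero. This is one reason the F1 scores of GBM and SVM are informative alongside their accuracies.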

Predicting risk of overweight or obesity in Chinese preschool-aged children using artificial intelligence techniques