Abstract: Background: Non-alcoholic fatty liver disease (NAFLD) is one of the most common liver diseases worldwide. Most current NAFLD prediction models are diagnostic models based on cross-sectional data, which cannot provide early identification or clarify causal relationships. We aimed to use time-series deep learning models with longitudinal health checkup records to predict the future onset of NAFLD, and to update the model stepwise by incorporating new checkup records to achieve dynamic prediction. Methods: A retrospective cohort study included 10,493 participants with more than 6 health checkup records from the Beijing MJ Health Screening Center. Data from each participant's first 5 checkups were incorporated stepwise, with the model updated as each checkup was added, to predict the risk of NAFLD at and after the sixth checkup. A total of 33 variables were considered, covering demographic characteristics, medical history, lifestyle, physical examinations, and laboratory tests. L1-penalized logistic regression (LR) was used for feature selection. A long short-term memory (LSTM) network was used for model development, and five-fold cross-validation was used to tune and select the optimal hyperparameters. Model performance was evaluated by internal validation on a randomly divided 20% holdout test dataset and by external validation on previously unseen data from the Shanghai MJ Health Screening Center. The evaluation metrics included area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, Brier score, and decision curve analysis. Bootstrap sampling was used to generate 95% confidence intervals for all metrics. Finally, the Shapley additive explanations (SHAP) algorithm was applied to the holdout test dataset to obtain time-specific and sample-specific contributions of each feature for model interpretability.
Results: Among the 10,493 participants, 1,662 (15.84%) were diagnosed with NAFLD at or after their sixth health checkup. The predictive performance of the deep learning model in the internal validation dataset improved as more checkups were incorporated, with AUROC increasing from 0.729 (95% CI: 0.698, 0.760) at baseline to 0.818 (95% CI: 0.798, 0.844) when 5 consecutive checkups were included. In the external validation dataset of 1,728 participants, AUROC increased from 0.700 (95% CI: 0.657, 0.740) with only the first checkup to 0.792 (95% CI: 0.758, 0.825) with all five. Feature importance analysis showed that body fat percentage, alanine transaminase (ALT), and uric acid had the greatest impact on the outcome; time-specific, individual-specific, and dynamic feature contributions were also produced for model interpretability. Conclusion: A dynamic prediction model was successfully established, and its predictive capability improved as the latest checkup records were added. In addition, we identified key features associated with the onset of NAFLD, which may help optimize prevention and control strategies for the disease in the general population.
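
The abstract reports AUROC and Brier score with bootstrap 95% confidence intervals but does not say which bootstrap variant was used. The following is a minimal sketch, assuming a percentile bootstrap over resampled participants; the function names (`auroc`, `brier_score`, `bootstrap_ci`) and the toy labels and probabilities are hypothetical illustrations, not the study's code or data.

```python
import random

def auroc(y_true, y_score):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def bootstrap_ci(y_true, y_prob, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (an assumed variant) for any metric(y_true, y_prob)."""
    if len(set(y_true)) < 2:
        raise ValueError("need both classes to bootstrap")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # resample lost one class; draw again
        stats.append(metric(yt, [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int(round(alpha / 2 * n_boot))]
    upper = stats[min(n_boot - 1, int(round((1 - alpha / 2) * n_boot)) - 1)]
    return lower, upper

# Toy data, purely illustrative (1 = NAFLD onset, 0 = no onset).
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auroc(y, p), brier_score(y, p), bootstrap_ci(y, p, auroc))
```

In a setting like the one described, the same routine would be run once per number of incorporated checkups (one through five) on the holdout and external datasets, so that the interval for each step can be compared with the next.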
