A Machine Learning Model to Predict Risk for Hepatocellular Carcinoma in Patients With Metabolic Dysfunction-Associated Steatotic Liver Disease

Souvik Sarkar, Aniket Alurwar, Carole Ly, Cindy Piao, Rajiv Donde, Christopher J Wang, Frederick J Meyers
DOI: https://doi.org/10.1016/j.gastha.2024.01.007
2024-01-23
Abstract:

Background and aims: Hepatocellular carcinoma (HCC) incidence is increasing and is correlated with metabolic dysfunction-associated steatotic liver disease (MASLD; formerly nonalcoholic fatty liver disease), even in patients without advanced liver fibrosis, who are more likely to be diagnosed at advanced disease stages, have shorter survival times, and are less likely to receive a liver transplant. Machine learning (ML) tools can characterize large datasets and help develop predictive models that calculate individual HCC risk and guide selective screening and risk mitigation strategies.

Methods: Tableau and KNIME Analytics were used for descriptive analytics and ML tasks. ML models were developed from standard laboratory and clinical parameters using scikit-learn algorithms. Data from University of California (UC), Davis, were used to develop and train a pilot predictive model, which was subsequently validated in an independent dataset from UC San Francisco. MASLD and HCC patients were identified by International Classification of Diseases-9/10 codes.

Results: Of the patients diagnosed with MASLD (n = 1561 training; n = 686 validation), HCC developed in 14% (n = 227) of the UC Davis training cohort and 25% (n = 176) of the UC San Francisco validation cohort. Liver fibrosis, as determined by the noninvasive Fibrosis-4 score, was the strongest single predictor of HCC in the model. In the validation cohort, the model predicted HCC development with 92.06% accuracy, an area under the curve of 0.97, an F1-score of 0.84, 98.34% specificity, and 74.41% sensitivity.

Conclusion: ML models can aid physicians in providing early HCC risk assessment in patients with MASLD. Further validation will translate to cost-effective, personalized care of at-risk patients.
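
To make the quantities named in the abstract concrete, the following minimal Python sketch computes the Fibrosis-4 score from its standard published formula (age × AST / (platelet count × √ALT)) and derives accuracy, sensitivity, specificity, and F1-score from confusion-matrix counts. It is not the authors' pipeline: the function names `fib4` and `binary_metrics` and all example values are hypothetical and are not taken from the study data.

```python
import math


def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_10e9_per_l):
    """Fibrosis-4 score: (age * AST) / (platelets * sqrt(ALT)).

    Units assumed: age in years, AST and ALT in U/L,
    platelet count in 10^9 cells per litre.
    """
    return (age_years * ast_u_per_l) / (
        platelets_10e9_per_l * math.sqrt(alt_u_per_l)
    )


def binary_metrics(tp, fp, tn, fn):
    """Summary metrics for a binary classifier from confusion-matrix counts.

    tp/fn: patients with HCC predicted positive/negative.
    tn/fp: patients without HCC predicted negative/positive.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Hypothetical patient, for illustration only.
    print(f"FIB-4: {fib4(60, 40, 25, 200):.2f}")

    # Hypothetical confusion-matrix counts, not the study's results.
    for name, value in binary_metrics(tp=30, fp=5, tn=95, fn=10).items():
        print(f"{name}: {value:.4f}")
```

A high specificity paired with a lower sensitivity, as reported for the validation cohort, corresponds to few false positives (`fp`) relative to `tn` but a larger share of missed cases (`fn`) relative to `tp`.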