Risk Factors for Gout in Taiwan Biobank: A Machine Learning Approach
Yu-Ruey Liu, Oswald Ndi Nfor, Ji-Han Zhong, Chun-Yuan Lin, Yung-Po Liaw
DOI: https://doi.org/10.2147/jir.s490821
IF: 4.5
2024-11-28
Journal of Inflammation Research
Abstract:
Yu-Ruey Liu (1–3), Oswald Ndi Nfor (4), Ji-Han Zhong (4), Chun-Yuan Lin (3), Yung-Po Liaw (4–6)
1 College of Information and Electrical Engineering, Asia University, Taichung, 413, Taiwan
2 Department of Emergency Medicine, Cheng Ching General Hospital, Taichung, Taiwan
3 Department of Computer Science and Information Engineering, Asia University, Taichung, 413, Taiwan
4 Department of Public Health and Institute of Public Health, Chung Shan Medical University, Taichung, Taiwan
5 Institute of Medicine, Chung Shan Medical University, Taichung, Taiwan
6 Department of Medical Imaging, Chung Shan Medical University Hospital, Taichung, Taiwan
Correspondence: Yung-Po Liaw, Department of Public Health and Institute of Public Health, Chung Shan Medical University, No. 110, Sec. 1 Jianguo N. Road, Taichung, 40201, Taiwan, Tel +886-4-36097501; Chun-Yuan Lin, Department of Computer Science and Information Engineering, Asia University, No. 500, Lioufeng Road, Wufeng, Taichung, 413, Taiwan, Tel +886-4-2332-3456 #1814
Purpose: We assessed the risk of gout in the Taiwan Biobank population by applying various machine learning algorithms. The study aimed to identify crucial risk factors and evaluate the performance of different models in gout prediction.
Patients and Methods: This study analyzed data from 88,210 individuals in the Taiwan Biobank, identifying 19,338 cases of gout and 68,872 controls. After data cleaning and propensity score matching for gender and age, the final analytical sample comprised 38,676 individuals (19,338 gout cases and 19,338 controls). Five machine learning models were used: Bayesian Network (BN), Random Forest (RF), Gradient Boosting (GB), Logistic Regression (LR), and Neural Network (NN). The predictive performance was evaluated using a split dataset (80% training set and 20% test set).
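
The 80%/20% split described in the Methods can be illustrated with a minimal sketch. The abstract does not say whether the split was stratified by case status or how it was randomized, so the stratification, the seed, and the helper name `stratified_split` are all assumptions made for illustration, not the authors' procedure. Only the case and control counts come from the abstract.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class at the same ratio.

    Hypothetical helper: stratification and seeding are assumptions,
    not details reported in the abstract.
    """
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# 19,338 gout cases (1) and 19,338 matched controls (0), as in the abstract.
labels = [1] * 19338 + [0] * 19338
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each class keeps the 1:1 case-to-control ratio of the matched sample in both partitions, which makes test-set metrics easier to compare with training-set metrics.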
Results: Variable importance analysis was performed to identify key variables, with uric acid and gender emerging as the most influential risk factors. Descriptive data highlighted significant differences between the control group and gout patients, with a higher prevalence of gout in men (51.36% vs 48.64%). Both RF and GB demonstrated high performance across multiple metrics. RF consistently achieved a high area under the curve (AUC) of 0.986 to 0.987, alongside excellent sensitivity (0.945–0.947) and specificity (0.998–0.999). GB also performed robustly, with AUC values of about 0.987–0.988 and high sensitivity (0.944–0.950) and specificity (0.995–0.999) across different model variations. The F1 scores for both models (GB and RF) indicate strong predictive capabilities, with values of about 0.971–0.972.
Conclusion: The RF and GB models demonstrated exceptional accuracy in predicting gout status, particularly when incorporating genetic data alongside clinical factors. These findings underscore the potential for integrating machine learning models with genetic information to enhance gout prediction accuracy in clinical practice.
Keywords: risk prediction, gout, machine learning, artificial intelligence
Gout is a complex inflammatory condition primarily caused by hyperuricemia leading to monosodium urate (MSU) crystal deposition in joints and other tissues. Its clinical presentation includes acute painful flares and chronic complications [1]. The disease etiology is multifactorial, involving genetic predispositions, lifestyle factors such as diet and alcohol consumption, and certain medical conditions that affect uric acid metabolism [2,3]. Gout is associated with various comorbidities, including cardiovascular diseases and renal impairment, which can complicate its management and exacerbate patient morbidity [1,4]. Gout is a prevalent and debilitating rheumatic disease with an increasing global incidence, especially in Pacific and developed countries [5]. The Taiwanese population, like many others, faces the growing burden of gout, with a reported prevalence of approximately 6.24% [6]. The high recurrence rate linked to this condition leads to diminished health-related quality of life [7,8] and heightened financial strain, particularly for individuals unable to manage it effectively [9]. On a global scale, there were 10,016,336 reported cases in 2023, a figure anticipated to rise to approximately 12,082,807 by 2035 [7]. Understanding the risk factors and developing effective prediction models for gout is essential for proactive management and prevention. In this context, the use of data-driven methods, particularly machine learning (ML), has gained prominence as a powerful tool for disease risk assessment [10].
-Abstract Truncated-
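
The sensitivity, specificity, and F1 values reported in the Results can be cross-checked against one another with a short calculation. The sketch below derives the F1 score implied by a sensitivity/specificity pair. It assumes an evenly balanced test set (plausible given the 1:1 matched sample, but not stated in the abstract) and pairs the lower and upper ends of the reported RF ranges, which is also an assumption. The function name `f1_from_rates` is hypothetical.

```python
def f1_from_rates(sensitivity, specificity, prevalence=0.5):
    """F1 score implied by sensitivity and specificity.

    `prevalence` is the share of cases in the evaluated sample; 0.5 is an
    assumption based on the 1:1 matched design, not a reported figure.
    """
    tp = sensitivity * prevalence                  # true positives (per unit sample)
    fn = (1.0 - sensitivity) * prevalence          # missed cases
    fp = (1.0 - specificity) * (1.0 - prevalence)  # controls flagged as cases
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    return 2.0 * tp / (2.0 * tp + fp + fn)


# Lower and upper ends of the RF ranges quoted in the Results (pairing assumed).
low = f1_from_rates(0.945, 0.998)
high = f1_from_rates(0.947, 0.999)
print(f"{low:.3f} {high:.3f}")
```

Under these assumptions the implied F1 values are roughly 0.971 and 0.972, in line with the reported range, which suggests the three metrics are mutually consistent for a balanced evaluation set.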
immunology