LSO-080 Machine-learning Approach on Lupus Low Disease Activity Prediction
Nick Faelnar, Michael Tee, Cherica Tee, Jaime Caro, Geoffrey Solano, Rangi Kandane-Rathnayake, Angelene Therese Magbitang-Santiago, Evelyn Salido, Vera Golder, Worawit Louthrenoo, Yi-Hsing Chen, Jiacai Cho, Aisha Lateef, Laniyati Hamijoyo, Shue-Fen Luo, Yeong-Jian J. Wu, Sandra Navarra, Leonid Zamora, Zhanguo Li, Sargunan Sockalingam, Yasuhiro Katsumata, Masayoshi Harigai, Yanjie Hao, Zhuoli Zhang, B. M. D. B. Basnayake, Madelynn Chann, Jun Kikuchi, Tsutomu Takeuchi, Sang-Cheol Bae, Shereen Oon, Sean O'Neill, Fiona Goldblatt, Kristine Ng, Annie Law, Nicola Tugnet, Sunil Kumar, Naoaki Ohkubo, Yoshiya Tanaka, Chak Sing Lau, Mandana Nikpour, Alberta Hoi, Eric Morand
DOI: https://doi.org/10.1136/lupus-2023-kcr.122
IF: 4.687
2023-01-01
Lupus Science & Medicine
Abstract:
Background: The lupus low disease activity state (LLDAS) has been validated as a treat-to-target endpoint for patients with SLE, and its attainment has been associated with improved outcomes. This study assesses whether a machine learning model can accurately predict whether a patient will achieve LLDAS at their next assessment.
Methods: A total of 42,355 patient records were retrieved from the APLC longitudinal study database. Three machine learning models (XGBoost, Random Forest, and Naive Bayes) were tested for their predictive power. Eighty percent of the data was used to train the models while thirty percent was used for validation. The data were normalized, and all models were subjected to 10-fold cross-validation to guard against overfitting. Additionally, we compared the ten most significant features of each model.
Results: Several metrics were used to measure the models' predictive power. The Random Forest model scored highest for specificity, PPV, and accuracy, with 0.8450, 0.8182, and 0.8338, respectively. The XGBoost model had the highest NPV at 0.8559, while the Naive Bayes model had the highest sensitivity at 0.8986. Notably, Random Forest trailed the top sensitivity and NPV scores by only 0.0629 and 0.0085, respectively. Among the significant features, only two appeared in all three models: current LLDAS and proteinuria level. Three additional features were important in two models: whether the patient is taking prednisolone, time-adjusted mean (TAM) SLEDAI score, and SLEDAI score.
Conclusions: The study compared the predictive power of several machine learning models in determining whether a patient will achieve LLDAS at their next visit. The results determined that the current LLDAS, proteinuria levels, SLEDAI score (and TAM SLEDAI),
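The five metrics reported in the Results (sensitivity, specificity, PPV, NPV, and accuracy) all derive from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code: the names `ConfusionCounts` and `classification_metrics` are hypothetical, the example counts are invented for illustration and are not study data, and the positive class is assumed to mean "LLDAS attained at the next visit". The last two lines only work out the Random Forest sensitivity and NPV implied by the gaps stated in the abstract.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper type).

    Assumption: the positive class is "LLDAS attained at the next visit".
    """
    tp: int  # predicted LLDAS, attained LLDAS
    fp: int  # predicted LLDAS, did not attain LLDAS
    tn: int  # predicted no LLDAS, did not attain LLDAS
    fn: int  # predicted no LLDAS, attained LLDAS


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Return the five metrics named in the abstract."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
        "accuracy": (c.tp + c.tn) / total,
    }


# Invented counts for illustration only; NOT data from the study.
example = ConfusionCounts(tp=80, fp=20, tn=90, fn=10)
metrics = classification_metrics(example)

# Random Forest scores implied by the abstract's stated gaps to the
# top sensitivity (Naive Bayes, 0.8986) and top NPV (XGBoost, 0.8559).
rf_sensitivity = round(0.8986 - 0.0629, 4)
rf_npv = round(0.8559 - 0.0085, 4)
```

Under these definitions, a model can lead on sensitivity while trailing on specificity and PPV, which is consistent with the abstract reporting a different top model for different metrics.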