Comparison of multivariate linear regression and a machine learning algorithm developed for prediction of precision warfarin dosing in a Korean population

Van Lam Nguyen,Hoang Dat Nguyen,Yong-Soon Cho,Ho-Sook Kim,Il-Yong Han,Dae-Kyeong Kim,Sangzin Ahn,Jae-Gook Shin
DOI: https://doi.org/10.1111/jth.15318
2021-04-21
Journal of Thrombosis and Haemostasis
Abstract:<section class="article-section__content"><h3 class="article-section__sub-title section1"> Background</h3><p>The personalized warfarin dose is influenced by both genetic and non-genetic factors. Multiple linear regression (LR) is a conventional method for developing predictive models. Recently, machine learning approaches have been widely applied to warfarin dosing, motivated by the hypothesis of a non-linear association between covariates and the stable warfarin dose.</p></section><section class="article-section__content"><h3 class="article-section__sub-title section1"> Objective</h3><p>To extend the multiple linear regression algorithm for personalized warfarin dosing in a Korean population and compare it with a machine learning-based algorithm.</p></section><section class="article-section__content"><h3 class="article-section__sub-title section1"> Method</h3><p>In this cohort study, we collected information on 650 patients taking warfarin who had achieved steady state, including demographic information, indications, comorbidities, co-medications, habits, and genetic factors. The dataset was randomly split into a training set (90%) and a test set (10%). The LR and machine learning (gradient boosting machine; GBM) models were developed on the training set and evaluated on the test set.</p></section><section class="article-section__content"><h3 class="article-section__sub-title section1"> Result</h3><p>The performance of the LR and GBM models was comparable in terms of accuracy of ideal dose (75.38% and 73.85%), correlation (0.77 and 0.73), mean absolute error (0.58 mg/day and 0.64 mg/day), and root mean square error (0.82 mg/day and 0.9 mg/day), respectively. VKORC1 genotype, CYP2C9 genotype, age, and weight were the highest contributors; these four variables achieved 80% of the maximum performance in both models.</p></section><section class="article-section__content"><h3 class="article-section__sub-title section1"> Conclusion</h3><p>This study shows that our LR and GBM models predict warfarin dose satisfactorily in our dataset. Both models showed similar performance and feature contribution characteristics. LR may be the more appropriate model because of its simplicity and interpretability.</p></section>
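The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The function name `evaluate_dose_predictions` and the sample doses are hypothetical, and the abstract does not define "ideal dose", so the sketch assumes the convention common in warfarin dosing studies: a prediction is ideal when it falls within 20% of the actual stable dose.

```python
import math


def evaluate_dose_predictions(actual, predicted, tolerance=0.20):
    """Return MAE, RMSE, Pearson r and the fraction of 'ideal' predictions.

    Assumption (not stated in the abstract): a prediction is 'ideal' when
    it lies within +/- `tolerance` (20% by default) of the actual dose.
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")

    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n               # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)    # root mean square error

    # Pearson correlation between actual and predicted doses.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    pearson_r = cov / math.sqrt(var_a * var_p)

    # Share of predictions inside the assumed tolerance band.
    ideal = sum(abs(p - a) <= tolerance * a for a, p in zip(actual, predicted)) / n

    return {"mae": mae, "rmse": rmse, "pearson_r": pearson_r, "ideal_fraction": ideal}


# Made-up illustrative doses in mg/day (not data from the study).
actual = [3.0, 4.0, 5.0, 2.0]
predicted = [3.5, 4.0, 4.5, 3.0]
metrics = evaluate_dose_predictions(actual, predicted)
```

Applying the same function to the held-out test-set predictions of each model would give a like-for-like comparison of the kind reported in the Result section.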
peripheral vascular disease,hematology