Replication of machine learning methods to predict treatment outcome with antidepressant medications in patients with major depressive disorder from STAR*D and CAN-BIND-1

John-Jose Nunez, Teyden T. Nguyen, Yihan Zhou, Bo Cao, Raymond T. Ng, Jun Chen, Benicio N. Frey, Roumen Milev, Daniel J. Muller, Susan Rotzinger, Claudio N. Soares, Rudolf Uher, Sidney H. Kennedy, Raymond W. Lam
DOI: https://doi.org/10.1371/journal.pone.0253023
IF: 3.7
2021-01-01
PLoS ONE
Abstract:

Objectives: Antidepressants are first-line treatments for major depressive disorder (MDD), but 40-60% of patients will not respond; predicting response would therefore be a major clinical advance. Machine learning algorithms hold promise to predict treatment outcomes based on clinical symptoms and episode features. We sought to independently replicate recent machine learning methodology predicting antidepressant outcomes using the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) dataset, and then to externally validate these methods by training models on data from the Canadian Biomarker Integration Network in Depression (CAN-BIND-1) dataset.

Methods: We replicated methodology from Nie et al. (2018) using common algorithms based on linear regressions and decision trees to predict treatment-resistant depression (TRD, defined as failing to respond to 2 or more antidepressants) in the STAR*D dataset. We then trained and externally validated models using the clinical features found in both datasets to predict response (>= 50% reduction on the Quick Inventory of Depressive Symptomatology, Self-Rated [QIDS-SR]) and remission (endpoint QIDS-SR score <= 5) in the CAN-BIND-1 dataset. We evaluated additional models to investigate how different outcomes and features may affect prediction performance.

Results: Our replicated models predicted TRD in the STAR*D dataset with slightly better balanced accuracy than Nie et al. (70%-73% versus 64%-71%, respectively). In our external validation of the methodology on the CAN-BIND-1 dataset, prediction performance varied by outcome: it was worse for response (best balanced accuracy 65%) than for remission (77%). Using the smaller set of features found in both datasets generally improved prediction performance when evaluated on the STAR*D dataset.

Conclusion: We successfully replicated prior work predicting antidepressant treatment outcomes using machine learning methods and clinical data. We found similar prediction performance using these methods on an external database, although prediction of remission was better than prediction of response. Future work is needed to improve prediction performance to a clinically useful level.
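
The outcome definitions and the reported metric can be made concrete with a minimal Python sketch. The response and remission thresholds come from the abstract. The function names are hypothetical, and the balanced-accuracy formula (the mean of sensitivity and specificity) is the standard definition, assumed here rather than taken from the paper's code.

```python
from typing import Sequence


def is_response(baseline: int, endpoint: int) -> bool:
    """Response: at least a 50% reduction in QIDS-SR score from baseline."""
    if baseline <= 0:
        raise ValueError("baseline QIDS-SR score must be positive")
    return (baseline - endpoint) / baseline >= 0.5


def is_remission(endpoint: int) -> bool:
    """Remission: endpoint QIDS-SR score of 5 or lower."""
    return endpoint <= 5


def balanced_accuracy(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Mean of sensitivity and specificity (standard definition, assumed)."""
    pairs = [(bool(t), bool(p)) for t, p in zip(y_true, y_pred)]
    tp = sum(t and p for t, p in pairs)              # true positives
    tn = sum((not t) and (not p) for t, p in pairs)  # true negatives
    fp = sum((not t) and p for t, p in pairs)        # false positives
    fn = sum(t and (not p) for t, p in pairs)        # false negatives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2
```

Because balanced accuracy weights each class equally, a model that always predicts the majority class scores 50% no matter how imbalanced the outcome is. That property makes it a reasonable metric when responders and non-responders are unevenly represented.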