Application Research of Imbalanced Data Processing Methods on Prediction of Adverse Reactions of Traditional Chinese Medicine

吴东苑,杨伟,唐进法,李学林,王晓艳,易丹辉
DOI: https://doi.org/10.11842/wst.2017.09.008
2017-01-01
Abstract: Because adverse reaction data for traditional Chinese medicine (TCM) are imbalanced, this paper explored and applied imbalanced data processing methods to predict adverse reactions to TCM. Patients who received Dan-Hong (DH) injection were taken as the research subjects, and centralized monitoring data from 37 hospitals were mined to predict adverse reactions in these patients. Four data-level approaches (no resampling, random undersampling, random oversampling and the SMOTE algorithm) were combined with four algorithm-level approaches (decision tree, random forest, AdaBoost and Gradient Boosting) to process the imbalanced data, and the prediction performance of the combinations was compared. Two combinations, random undersampling with AdaBoost and random undersampling with Gradient Boosting, had better prediction performance than the others: their recall and G-mean both reached 80%, and their AUC was greater than 0.86. This study is a preliminary exploration of imbalanced data processing methods. Combined with practical experience, the approach is applicable to the prediction of TCM adverse reactions. It can accurately predict whether adverse reactions occur in patients who receive DH injection, and it can play a warning role in clinical practice. In addition, the variable importance ranking output by the models can be used to help minimize the occurrence of adverse reactions after treatment. The study provides a scientific reference for the safety reassessment of DH injection.
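As an illustration of the data-level step and the evaluation metrics named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a binary label (1 = adverse reaction, 0 = none), uses synthetic toy data, and takes G-mean to be the geometric mean of recall (sensitivity) and specificity, a common definition that the abstract does not spell out. The function names `random_undersample` and `recall_and_g_mean` are hypothetical. The classifier itself (for example AdaBoost or Gradient Boosting) is left out; it would be trained on the balanced output.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows until every class has as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, n_min))  # sample without replacement
    kept.sort()  # preserve the original row order
    return [X[i] for i in kept], [y[i] for i in kept]


def recall_and_g_mean(y_true, y_pred, positive=1):
    """Return (recall, G-mean), with G-mean = sqrt(recall * specificity) (assumed definition)."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return recall, math.sqrt(recall * specificity)


# Synthetic toy data (not from the study): 95 patients without and 5 with an adverse reaction.
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

X_bal, y_bal = random_undersample(X, y, seed=42)
class_counts = Counter(y_bal)  # both classes now have 5 rows

# Metrics on a hand-made prediction: one of two positives is missed.
recall, g_mean = recall_and_g_mean([1, 1, 0, 0], [1, 0, 0, 0])
```

Only the training split should be undersampled; the test split keeps its original class ratio so that recall, G-mean and AUC reflect the real imbalance.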