Comparison and analysis of applications of ID3, CART decision tree models and neural network model in medical diagnosis and prognosis evaluation

Zeye Liu, Xiangbin Pan
DOI: https://doi.org/10.52768/2766-7820/1101
2021-05-04
Journal of Clinical Images and Medical Case Reports
Abstract: Objective: To analyze the performance of each algorithm model under different data processing conditions, namely preprocessing (standardization, normalization and regularization), balancing and shuffling, using data sets that represent three common research types in clinical studies, and to compare the advantages, disadvantages and scope of application of the decision tree model and the neural network model in clinical studies.
Methods: Python was used to construct ID3 and CART decision tree models. Three typical clinical research data sets were downloaded from UCI and subjected to data preprocessing, balancing and shuffling. The model evaluation indexes were time complexity, accuracy, precision, recall and F1-score. For visualization, the model results, confusion matrix and ROC curve were drawn. The importance ranking of each data set's attributes for the model results was also analyzed. In addition, one typical data set was selected for a comparative analysis using the neural network model. SPSS was used to test the statistical significance of the differences between data processing schemes.
Results: (1) A total of 96 decision trees were built from the combinations of 2 decision tree algorithms, 3 data sets, 4 types of data preprocessing, 2 balancing choices and 2 shuffling choices. (2) The AUC value for the Thoracic Surgery Data Set increased significantly after balancing, with a maximum increase of 0.3 (P < 0.01). (3) The AUC value for the Breast Cancer Wisconsin (Diagnostic) Data Set generally increased after normalization and decreased after regularization; the maximum decrease was 0.6, without statistical significance (P = 0.3). (4) The AUC value for the Statlog (Heart) Data Set increased after regularization, with a maximum increase of 0.03, but the increase was not statistically significant. (5) Data balancing and shuffling can increase the AUC value. (6) The performance of the neural network model fell between the best and worst performance of the decision tree models.
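The count of 96 decision trees follows from crossing the five experimental factors (2 x 3 x 4 x 2 x 2). The sketch below is illustrative only and is not the authors' code: it enumerates that grid and shows how accuracy, precision, recall and F1-score could be derived from binary confusion-matrix counts. The label strings and the function names `experiment_grid` and `binary_metrics` are hypothetical, and treating "no preprocessing" as the fourth preprocessing type is an assumption, since the abstract names only three.

```python
from itertools import product

# Factor levels taken from the abstract; the label strings are illustrative.
ALGORITHMS = ["ID3", "CART"]
DATASETS = [
    "Thoracic Surgery",
    "Breast Cancer Wisconsin (Diagnostic)",
    "Statlog (Heart)",
]
# Assumption: the fourth preprocessing type is "no preprocessing".
PREPROCESSING = ["none", "standardization", "normalization", "regularization"]
BALANCING = [False, True]
SHUFFLING = [False, True]


def experiment_grid():
    """Return every combination of the five factors as a list of dicts."""
    keys = ("algorithm", "dataset", "preprocessing", "balanced", "shuffled")
    combos = product(ALGORITHMS, DATASETS, PREPROCESSING, BALANCING, SHUFFLING)
    return [dict(zip(keys, combo)) for combo in combos]


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    grid = experiment_grid()
    print(len(grid))  # one configuration per decision tree
    # Made-up counts, used only to exercise the function.
    print(binary_metrics(tp=40, fp=10, fn=5, tn=45))
```

Each dictionary in the grid describes one model configuration; a real pipeline would train a tree for each and record its metrics and AUC.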