A Modified Random Forest Based on Kappa Measure and Binary Artificial Bee Colony Algorithm

Chen Zhang, Xiaofeng Wang, Shengbing Chen, Hong Li, Xiaoxuan Wu, Xin Zhang
DOI: https://doi.org/10.1109/access.2021.3105796
IF: 3.9
2021-01-01
IEEE Access
Abstract: Random forest (RF) is an ensemble classification method in which all decision trees participate in voting, so low-quality decision trees can reduce the accuracy of the forest. To improve accuracy, decision trees with greater diversity and higher classification accuracy are selected for voting. This paper proposes an RF based on the Kappa measure and an improved binary artificial bee colony algorithm (IBABC). First, the Kappa measure is used for pre-pruning, selecting the decision trees with greater diversity from the forest. Then, a crossover operator and a leaping operator are introduced into ABC, and the resulting improved binary ABC is used for secondary pruning, selecting the better-performing decision trees for voting. The proposed method (Kappa+IBABC) is tested on a number of UCI datasets. Computational results demonstrate that Kappa+IBABC improves performance on most datasets while using fewer decision trees. The Wilcoxon signed-rank test is used to verify that the differences between Kappa+IBABC and other pruning methods are significant. In addition, haze pollution in China is becoming increasingly serious; the proposed method is applied to predicting haze weather and achieves good results.
Computer Science, Information Systems; Telecommunications; Engineering, Electrical & Electronic
What problem does this paper attempt to address?
The paper addresses the problem that low-quality decision trees in a Random Forest (RF) reduce its overall prediction accuracy. To improve accuracy, it proposes a method based on the Kappa measure and an Improved Binary Artificial Bee Colony (IBABC) algorithm. Specifically:

1. **Kappa Measure Pre-pruning**: The Kappa measure is used to select decision trees with higher diversity, which reduces the number of low-quality trees in the forest and lowers the complexity of the subsequent ensemble pruning.
2. **Improved Binary Artificial Bee Colony Algorithm Secondary Pruning**: After pre-pruning, crossover and leaping operators are introduced into the artificial bee colony algorithm, which then selects the better-performing decision trees to participate in voting.
3. **Experimental Validation**: The method was tested on multiple UCI datasets and compared with traditional Random Forest and other pruning methods. The results show that the proposed method (Kappa+IBABC) improves classification performance on most datasets while using fewer decision trees. The method was also applied to haze weather prediction in China, where it achieved good results.

According to the paper, these steps improve the generalization ability and prediction performance of Random Forest.
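The two pruning stages can be illustrated with a small Python sketch. It is not the authors' implementation: the function names (`pairwise_kappa`, `kappa_preprune`, `vote_accuracy`) are hypothetical, and ranking trees by their mean pairwise Kappa is an assumed selection rule, since the summary does not state the exact criterion. The sketch shows how Cohen's Kappa between two trees' predictions can serve as a diversity score (lower Kappa means less agreement, hence more diversity) and how a 0/1 mask over trees, the kind of solution a binary ABC search would explore, can be scored by majority-vote accuracy. The ABC search itself, including the crossover and leaping operators, is not shown because their details are not given here.

```python
from collections import Counter
from itertools import combinations


def pairwise_kappa(pred_a, pred_b):
    """Cohen's kappa between two trees' predictions on the same samples."""
    n = len(pred_a)
    observed = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    count_a, count_b = Counter(pred_a), Counter(pred_b)
    # Agreement expected by chance, from each tree's label frequencies.
    chance = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if chance == 1.0:  # both trees always predict the same single label
        return 1.0
    return (observed - chance) / (1.0 - chance)


def kappa_preprune(tree_preds, keep):
    """Assumed rule: keep the `keep` trees with the lowest mean kappa
    against all other trees (i.e. the most diverse ones)."""
    m = len(tree_preds)
    if m < 2:
        return list(range(m))
    totals = [0.0] * m
    for i, j in combinations(range(m), 2):
        k = pairwise_kappa(tree_preds[i], tree_preds[j])
        totals[i] += k
        totals[j] += k
    ranked = sorted(range(m), key=lambda i: totals[i] / (m - 1))
    return sorted(ranked[:keep])


def vote_accuracy(mask, tree_preds, y_true):
    """Majority-vote accuracy of the trees switched on by a 0/1 mask.

    Ties are broken in favour of the label seen first among the votes.
    """
    chosen = [p for bit, p in zip(mask, tree_preds) if bit]
    if not chosen:
        return 0.0
    correct = 0
    for idx, truth in enumerate(y_true):
        votes = Counter(p[idx] for p in chosen)
        if votes.most_common(1)[0][0] == truth:
            correct += 1
    return correct / len(y_true)
```

In this sketch each entry of `tree_preds` is one tree's list of predicted labels on a held-out set. A binary search procedure would repeatedly propose masks over the pre-pruned trees and use something like `vote_accuracy` as the fitness to maximize.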