AN IMPROVED RANDOM FOREST BOOST MULTI-LABEL TEXT CLASSIFICATION ALGORITHM

Shao Mengliang, Qi Deyu
DOI: https://doi.org/10.3969/j.issn.1000-386x.2022.11.034
2022-01-01
Abstract: Current boosting algorithms suffer from high computational cost and long learning time, so we propose an improved RF-Boost algorithm (IRF-Boost). The training features are ranked, and in each boosting round only a small filtered subset of the top-ranked features is used. A single feature is then selected according to its weight to build the new weak hypothesis, which reduces the size of the weak-hypothesis search space from k to 1. Seven feature ranking methods (information gain, chi-square, GSS coefficient, mutual information, odds ratio, F1 score and accuracy) were tested and analyzed. The experimental results show that mutual information is the most suitable ranking method for RF-Boost, and that IRF-Boost is more efficient than both RF-Boost and AdaBoost.MH, which makes IRF-Boost a better choice for classification problems in practical applications and expert systems.
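
The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes binary term-presence features, takes the t-th ranked feature as the single candidate in round t (one possible reading of "selected according to the weight"), and uses AdaBoost.MH-style real-valued stumps. The names `mutual_information`, `irf_boost_sketch` and `predict` are hypothetical.

```python
import numpy as np

def mutual_information(X, Y):
    """Score each binary feature by its largest mutual information (nats) with any label."""
    n = X.shape[0]
    mi = np.zeros((X.shape[1], Y.shape[1]))
    for xv in (0, 1):
        for yv in (0, 1):
            # joint[f, l] = P(feature f == xv and label l == yv)
            joint = (X == xv).T.astype(float) @ (Y == yv).astype(float) / n
            px = (X == xv).mean(axis=0)[:, None]
            py = (Y == yv).mean(axis=0)[None, :]
            with np.errstate(divide="ignore", invalid="ignore"):
                mi += np.where(joint > 0, joint * np.log(joint / (px * py)), 0.0)
    return mi.max(axis=1)

def irf_boost_sketch(X, Y, rounds, eps=1e-8):
    """Boost one-feature stumps; each round considers exactly one ranked feature."""
    n, n_labels = Y.shape
    S = 2 * Y - 1                                   # labels mapped to {-1, +1}
    D = np.full((n, n_labels), 1.0 / (n * n_labels))  # weights over (example, label) pairs
    ranked = np.argsort(-mutual_information(X, Y))  # best-scoring feature first
    model = []
    for t in range(min(rounds, len(ranked))):
        f = int(ranked[t])                          # assumption: search space of size 1
        present = X[:, f] == 1
        c = np.zeros((2, n_labels))                 # c[0]: term absent, c[1]: term present
        for j, mask in enumerate((~present, present)):
            w_pos = (D[mask] * (S[mask] > 0)).sum(axis=0)
            w_neg = (D[mask] * (S[mask] < 0)).sum(axis=0)
            c[j] = 0.5 * np.log((w_pos + eps) / (w_neg + eps))
        h = c[present.astype(int)]                  # stump output per (example, label)
        D *= np.exp(-S * h)                         # up-weight misclassified pairs
        D /= D.sum()
        model.append((f, c))
    return model

def predict(model, X):
    """Sum the stump outputs and assign every label with a positive score."""
    scores = sum(c[X[:, f].astype(int)] for f, c in model)
    return (scores > 0).astype(int)

# Toy data: feature 0 mirrors label 0, feature 1 mirrors label 1, feature 2 is noise.
X = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 0], [0, 1, 1]])
Y = X[:, :2].copy()
model = irf_boost_sketch(X, Y, rounds=2)
predictions = predict(model, X)
```

Because each round evaluates only one feature, the per-round cost no longer grows with the number of candidate features, which is the source of the speed-up the abstract describes.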