Random Subsequence Forests

Zengyou He, Jiaqi Wang, Mudi Jiang, Lianyu Hu, Quan Zou
DOI: https://doi.org/10.1016/j.ins.2024.120478
IF: 8.1
2024-03-21
Information Sciences
Abstract: The random forest classifier is widely used in different fields due to its accuracy and robustness. Since its invention, the random forest algorithm has been developed mainly for multi-dimensional vectorial data, where features can be sampled directly during decision tree construction. In discrete sequence classification, an explicit feature set is not readily available, so a feature extraction algorithm must be applied before the random forest is built. However, such a predefined feature subset may limit the diversity of the decision trees, because the full set of candidate features consists of all subsequences. As a result, the predictive accuracy of the constructed random forest classifier may be reduced. To address this, we propose a new algorithm that directly builds a random forest by adaptively choosing features from the set of all subsequences. To improve running efficiency, a count-suffix tree is used for fast frequency counting of subsequences, which accelerates the generation of each randomized decision tree. Experimental results on 15 real datasets show that our method can outperform state-of-the-art classification algorithms in terms of predictive accuracy. The source code of our method can be found at: https://github.com/JiaqiWang-dlut/RSForest .
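The frequency-counting idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that subsequences are contiguous substrings, and it uses an uncompressed suffix trie with per-node counters as a simplified stand-in for a count-suffix tree. The class name `CountSuffixTrie` and its methods are hypothetical.

```python
from collections import defaultdict


class CountSuffixTrie:
    """Uncompressed suffix trie with occurrence counts (illustrative only).

    Assumption: a simplified stand-in for a count-suffix tree, treating
    subsequences as contiguous substrings.
    """

    def __init__(self):
        self.children = defaultdict(CountSuffixTrie)
        self.count = 0  # occurrences of the substring that ends at this node

    def add_sequence(self, seq):
        # Insert every suffix of seq; each visited node gains one occurrence.
        for start in range(len(seq)):
            node = self
            for symbol in seq[start:]:
                node = node.children[symbol]
                node.count += 1

    def frequency(self, pattern):
        # Walk down the trie; a missing edge means the pattern never occurs.
        node = self
        for symbol in pattern:
            if symbol not in node.children:
                return 0
            node = node.children[symbol]
        return node.count


trie = CountSuffixTrie()
for sequence in ["abab", "abc"]:
    trie.add_sequence(sequence)
print(trie.frequency("ab"))
```

Once the trie is built, the frequency of any candidate pattern is found in time proportional to the pattern length, which is the kind of lookup that makes it cheap to evaluate many randomly drawn candidates during tree construction. Building this uncompressed version takes time quadratic in the sequence length; a compressed count-suffix tree avoids that cost.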
computer science, information systems