Decision Tree for Sequences

Zengyou He, Ziyao Wu, Guangyao Xu, Yan Liu, Quan Zou
DOI: https://doi.org/10.1109/tkde.2021.3075023
IF: 9.235
2021-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Current decision trees such as C4.5 and CART are widely used in different fields due to their simplicity, accuracy, and intuitive interpretation. Like other popular classifiers, these tree-based classification algorithms are developed for fixed-length vector data and suffer from intrinsic limitations in handling complex data such as sequences. To tackle the discrete sequence classification task, the dominant strategy is a two-step procedure: first transform the sequential dataset into a vector dataset, and then apply existing tree-based classifiers to the new vector data. However, such methods are highly dependent on the feature generation procedure, and some features that are critical to the tree construction may be missed. To alleviate these issues, we present a new tree-based sequence classification method that constructs a concise decision tree from the feature space composed of all subsequences present in the training sequences. Experimental results on fourteen real datasets show that our method can achieve better performance than state-of-the-art sequence classification algorithms. The source code of our method is available at: https://github.com/ZiyaoWu/SeqDT.
computer science, information systems, artificial intelligence, engineering, electrical & electronic
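
The two-step baseline that the abstract describes can be illustrated with a minimal sketch. This is not the authors' SeqDT method: it assumes contiguous k-grams as the generated features and Gini impurity as the split criterion, and the function names (`kgrams`, `to_vectors`, `gini`, `best_split`) and the toy data are hypothetical.

```python
from collections import Counter


def kgrams(seq, k):
    """Return the set of contiguous length-k subsequences of seq."""
    return {tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def to_vectors(sequences, k):
    """Step 1: map each sequence to a binary vector of k-gram presence."""
    vocab = sorted(set().union(*(kgrams(s, k) for s in sequences)))
    vectors = [[int(g in kgrams(s, k)) for g in vocab] for s in sequences]
    return vocab, vectors


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(vectors, labels):
    """Step 2 (root node only): pick the feature with the lowest weighted Gini."""
    n = len(labels)
    best = None
    for j in range(len(vectors[0])):
        present = [y for x, y in zip(vectors, labels) if x[j] == 1]
        absent = [y for x, y in zip(vectors, labels) if x[j] == 0]
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / n
        if best is None or score < best[1]:  # ties keep the earliest feature
            best = (j, score)
    return best


# Toy data (made up for illustration): two sequences per class.
sequences = ["abca", "cabb", "bbcc", "ccba"]
labels = ["pos", "pos", "neg", "neg"]

vocab, vectors = to_vectors(sequences, k=2)
j, score = best_split(vectors, labels)
print("root split on", "".join(vocab[j]), "with weighted Gini", score)
```

In this sketch the tree can only split on patterns that step 1 happened to generate. With `k` fixed at 2, a discriminative pattern of any other length never becomes a candidate, which is the dependence on feature generation that the abstract identifies as the weakness of the two-step strategy.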