Detach-ROCKET: Sequential feature selection for time series classification with random convolutional kernels

Gonzalo Uribarri, Federico Barone, Alessio Ansuini, Erik Fransén
2024-06-24
Abstract: Time Series Classification (TSC) is essential in fields like medicine, environmental science, and finance, enabling tasks such as disease diagnosis, anomaly detection, and stock price analysis. While machine learning models like Recurrent Neural Networks and InceptionTime are successful in numerous applications, they can face scalability issues due to computational requirements. Recently, ROCKET has emerged as an efficient alternative, achieving state-of-the-art performance and simplifying training by utilizing a large number of randomly generated features from the time series data. However, many of these features are redundant or non-informative, increasing computational load and compromising generalization. Here we introduce Sequential Feature Detachment (SFD) to identify and prune non-essential features in ROCKET-based models, such as ROCKET, MiniRocket, and MultiRocket. SFD estimates feature importance using model coefficients and can handle large feature sets without complex hyperparameter tuning. Testing on the UCR archive shows that SFD can produce models with better test accuracy using only 10% of the original features. We named these pruned models Detach-ROCKET. We also present an end-to-end procedure for determining an optimal balance between the number of features and model accuracy. On the largest binary UCR dataset, Detach-ROCKET improves test accuracy by 0.6% while reducing features by 98.9%. By enabling a significant reduction in model size without sacrificing accuracy, our methodology improves computational efficiency and contributes to model interpretability. We believe that Detach-ROCKET will be a valuable tool for researchers and practitioners working with time series data, who can find a user-friendly implementation of the model at https://github.com/gon-uri/detach_rocket.
Machine Learning
What problem does this paper attempt to address?
The paper addresses feature redundancy in ROCKET-based models for Time Series Classification (TSC). Deep models such as Recurrent Neural Networks (RNNs) and InceptionTime succeed in many applications but can face scalability issues because of their computational requirements. ROCKET is an efficient alternative that reaches state-of-the-art performance by generating a large number of random features from the time series, yet many of those features are redundant or non-informative, which increases computational load and compromises generalization. To tackle this, the paper introduces Sequential Feature Detachment (SFD), a method that identifies and prunes non-essential features in ROCKET-based models (ROCKET, MiniRocket, and MultiRocket). SFD estimates the importance of each feature from the model coefficients and can handle large feature sets without complex hyperparameter tuning. On the UCR archive, SFD produces models with better test accuracy while retaining only 10% of the original features; on the largest binary UCR dataset, the pruned model (Detach-ROCKET) improves test accuracy by 0.6% while reducing the number of features by 98.9%. The reduction in model size improves computational efficiency and contributes to model interpretability. The paper also presents an end-to-end procedure for determining an optimal balance between the number of features and model accuracy.
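To make the idea of coefficient-based sequential pruning concrete, here is a minimal sketch in Python. It is not the authors' implementation and does not use the detach_rocket package API. The function names, the closed-form ridge fit, the 5% per-step drop rate, and the synthetic data are all assumptions made for illustration; the only elements taken from the text above are that importance comes from the model coefficients and that features are removed sequentially until a small fraction (here 10%) remains.

```python
import numpy as np


def ridge_coefficients(X, y, alpha=1.0):
    """Closed-form ridge regression weights on centered data (illustrative helper)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, Xc.T @ yc)


def sequential_feature_detachment(X, y, keep_fraction=0.1, drop_rate=0.05, alpha=1.0):
    """Return the indices of the features that survive pruning.

    Assumptions (not taken from the paper): a ridge model is refit at every
    step, and the `drop_rate` share of active features with the smallest
    absolute coefficient is removed each time. Features are assumed to be on
    comparable scales so that coefficient magnitudes are comparable.
    """
    n_features = X.shape[1]
    target = max(1, int(round(keep_fraction * n_features)))
    active = np.arange(n_features)

    while len(active) > target:
        w = ridge_coefficients(X[:, active], y, alpha)
        # Drop at least one feature per step, but never go below the target size.
        n_drop = max(1, int(drop_rate * len(active)))
        n_drop = min(n_drop, len(active) - target)
        order = np.argsort(np.abs(w))  # least important first
        active = np.sort(active[order[n_drop:]])

    return active


# Synthetic stand-in for a matrix of random-kernel features:
# only columns 0, 1 and 2 carry information about the label.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 50))
y = np.sign(X[:, 0] + X[:, 1] - X[:, 2])

kept = sequential_feature_detachment(X, y)
print(kept)
```

In this toy setting the three informative columns should be among the five retained features. In practice, `X` would be the output of a ROCKET-style transform, and the retained fraction would be chosen by a procedure such as the end-to-end one the paper describes rather than fixed in advance.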