Never a Dull Moment: Distributional Properties as a Baseline for Time-Series Classification

Trent Henderson, Annie G. Bryant, Ben D. Fulcher
2023-03-31
Abstract: The variety of complex algorithmic approaches for tackling time-series classification problems has grown considerably over the past decades, including the development of sophisticated but challenging-to-interpret deep-learning-based methods. But without comparison to simpler methods it can be difficult to determine when such complexity is required to obtain strong performance on a given problem. Here we evaluate the performance of an extremely simple classification approach -- a linear classifier in the space of two simple features that ignore the sequential ordering of the data: the mean and standard deviation of time-series values. Across a large repository of 128 univariate time-series classification problems, this simple distributional moment-based approach outperformed chance on 69 problems, and reached 100% accuracy on two problems. With a neuroimaging time-series case study, we find that a simple linear model based on the mean and standard deviation performs better at classifying individuals with schizophrenia than a model that additionally includes features of the time-series dynamics. Comparing the performance of simple distributional features of a time series provides important context for interpreting the performance of complex time-series classification models, which may not always be required to obtain high accuracy.
Methodology, Machine Learning, Applications
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper examines time-series classification and addresses the following problems:

1. **Effectiveness of Simple Methods**: A linear classifier that uses only the mean and standard deviation of each time series performs well on many tasks. Across 128 univariate time-series classification problems, it outperformed chance on 69 and reached 100% accuracy on two. This suggests that complex methods are not always necessary to obtain high accuracy.
2. **Importance of Benchmarking**: The paper emphasizes using simple baseline methods when evaluating time-series classification algorithms. Comparing against simple distributional features shows how much of a complex model's performance actually depends on its added complexity.
3. **Performance on Specific Datasets**: The paper analyzes a neuroimaging case study: distinguishing individuals with schizophrenia from healthy controls using resting-state functional magnetic resonance imaging (rs-fMRI) data. A simple linear model using only the mean and standard deviation as features performs better than a model that additionally includes features of the time-series dynamics.
4. **Importance of Normalization**: The paper also discusses how time-series normalization (such as the z-score transformation) affects classification results. Z-scoring each time series sets its mean to 0 and its standard deviation to 1, so after normalization these two features are identical for every series and a classifier based on them cannot work. Whether the data were normalized therefore has to be taken into account when comparing methods.

Together, these results emphasize the need to consider simple distributional features and model interpretability when developing and interpreting time-series classification models, and to choose features carefully to avoid overfitting.
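
The baseline and the normalization effect described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's code: the toy data are synthetic white-noise series whose classes differ only in spread, the helper names (`moment_features`, `z_score`, `train_perceptron`, `accuracy`) are hypothetical, and a plain perceptron stands in for whichever linear classifier the authors used.

```python
import random
import statistics


def moment_features(series):
    """Return (mean, standard deviation), which ignore the ordering of values."""
    return (statistics.fmean(series), statistics.pstdev(series))


def z_score(series):
    """Rescale one series to zero mean and unit standard deviation."""
    mu, sigma = moment_features(series)
    return [(x - mu) / sigma for x in series]


def train_perceptron(features, labels, epochs=500, lr=0.1):
    """Fit a linear rule sign(w . f + b) on 2-D features; labels are -1 or +1."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for (f1, f2), y in zip(features, labels):
            if y * (w[0] * f1 + w[1] * f2 + b) <= 0:  # misclassified: nudge the boundary
                w[0] += lr * y * f1
                w[1] += lr * y * f2
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w, b


def accuracy(w, b, features, labels):
    """Fraction of feature vectors whose predicted sign matches the label."""
    hits = sum(
        1 for (f1, f2), y in zip(features, labels)
        if (1 if w[0] * f1 + w[1] * f2 + b > 0 else -1) == y
    )
    return hits / len(labels)


# Hypothetical toy data: white-noise series whose two classes differ only in spread.
rng = random.Random(0)
labels = [-1] * 20 + [1] * 20
series = [[rng.gauss(0.0, 1.0 if y == -1 else 3.0) for _ in range(200)] for y in labels]

# Linear classifier in the two-dimensional (mean, standard deviation) feature space.
raw_features = [moment_features(s) for s in series]
w, b = train_perceptron(raw_features, labels)
train_acc = accuracy(w, b, raw_features, labels)

# After per-series z-scoring, every series has (mean, std) of approximately (0, 1),
# so these two features no longer carry any class information.
z_features = [moment_features(z_score(s)) for s in series]

print("training accuracy on raw features:", train_acc)
print("z-scored features of first series:", z_features[0])
```

On the raw series the two classes separate along the standard-deviation axis, so a linear rule is enough. After z-scoring, all series collapse to the same point in feature space, which is why a mean and standard deviation baseline is only informative for data that have not been normalized per series.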