Tree-based Methods for Clustering Time Series Using Domain-Relevant Attributes

Mahsa Ashouri, Galit Shmueli, Chor-Yiu Sin
DOI: https://doi.org/10.1080/2573234x.2019.1645574
2019-01-01
Journal of Business Analytics
Abstract: We propose two methods for time-series clustering that capture temporal information (trend, seasonality, autocorrelation) and domain-relevant cross-sectional attributes. The methods are based on model-based partitioning (MOB) trees and can be used as automated yet transparent tools for clustering large collections of time series. We address the challenge of using common time-series models in MOB by instead utilising least squares regression. The single-step method clusters series using trend, seasonality, lags and domain-relevant cross-sectional attributes. The two-step method first clusters by trend, seasonality and cross-sectional attributes, and then clusters the residuals by autocorrelation and domain-relevant attributes. Both methods produce clusters interpretable by domain experts. We illustrate our approach by considering one-step-ahead forecasting and compare it to autoregressive integrated moving average (ARIMA) models for forecasting many Wikipedia pageviews time series. The tree-based approach produces forecasts on par with ARIMA, yet is significantly faster and more efficient, and is therefore suitable for large collections of time series. The simple parametric forecasting models allow for interpretable time-series clusters.
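The least squares model that the abstract substitutes for a conventional time-series model can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the weekly period of 7, the single lag and the synthetic series are all hypothetical choices, and the MOB partitioning step itself is not shown.

```python
import numpy as np

PERIOD = 7   # assumed weekly seasonality (hypothetical choice)
N_LAGS = 1   # assumed number of autoregressive lags (hypothetical choice)


def design_matrix(y, period=PERIOD, n_lags=N_LAGS):
    """Regressors for y[t], t = n_lags..len(y)-1: intercept, linear trend,
    seasonal dummies (season 0 is the baseline) and lagged values."""
    y = np.asarray(y, dtype=float)
    t = np.arange(n_lags, len(y))
    cols = [np.ones(len(t)), t.astype(float)]
    for s in range(1, period):
        cols.append((t % period == s).astype(float))
    for k in range(1, n_lags + 1):
        cols.append(y[t - k])
    return np.column_stack(cols), y[t]


def fit_ols(y, period=PERIOD, n_lags=N_LAGS):
    """Ordinary least squares fit of the trend + seasonality + lag model."""
    X, target = design_matrix(y, period, n_lags)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta


def forecast_one_step(y, beta, period=PERIOD, n_lags=N_LAGS):
    """One-step-ahead forecast for time index len(y)."""
    y = np.asarray(y, dtype=float)
    t = len(y)
    x = [1.0, float(t)]
    x += [1.0 if t % period == s else 0.0 for s in range(1, period)]
    x += [y[t - k] for k in range(1, n_lags + 1)]
    return float(np.dot(x, beta))


# Synthetic, noise-free series (illustrative data only, not from the paper):
# y[t] = 5 + 0.1*t + season[t mod 7] + 0.5*y[t-1]
season = [0.0, 1.0, 3.0, 2.0, 0.5, -1.0, -2.0]
series = [0.0]
for t in range(1, 60):
    series.append(5.0 + 0.1 * t + season[t % PERIOD] + 0.5 * series[-1])

beta = fit_ols(series)
forecast = forecast_one_step(series, beta)
print("estimated lag coefficient:", beta[-1])
print("one-step-ahead forecast:", forecast)
```

In a MOB tree, a regression of this kind would be fitted within each node, with the cross-sectional attributes serving as candidate split variables. For the two-step method described above, one plausible reading is to fit the trend and seasonality terms first and then apply a lag-only regression to the residuals.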