An efficient visual assessment of cluster tendency tool for large-scale time series data sets

Timothy B. Iredale, S. Erfani, C. Leckie
DOI: https://doi.org/10.1109/FUZZ-IEEE.2017.8015587
2017-07-01
Abstract: Data visualization has always been a vital tool to explore and understand underlying data structures and patterns. However, emerging technologies such as the Internet of Things (IoT) have enabled the collection of very large amounts of data over time. The sheer quantity of data available challenges existing time series visualization methods. In this paper we present an introductory analysis of time series clustering with a focus on a novel shape-based measure of similarity, which is invariant under uniform time shift and uniform amplitude scaling. Based on this measure we develop a Visual Assessment of cluster Tendency (VAT) algorithm to assess large time series data sets and demonstrate its advantages in terms of complexity and suitability for implementation in a distributed computing environment. The algorithm is implemented as a cloud application using Spark, where the run-time of the high-complexity dissimilarity matrix calculations is reduced by a factor of up to 7.0 on a 16-core computing cluster, with even higher speed-up factors expected for larger computing clusters.
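
To make the pipeline described in the abstract more concrete, the following is a minimal sketch of a VAT-style reordering on a small set of time series. The abstract does not define the paper's similarity measure, so `shape_dissimilarity` is a stand-in assumption: it z-normalizes each series (giving invariance to amplitude scaling) and takes one minus the maximum circular cross-correlation (giving invariance to circular time shift). The function names and the use of NumPy are likewise illustrative, and the Spark distribution step is not shown; the reordering follows the standard VAT procedure (a Prim-like ordering of the dissimilarity matrix).

```python
import numpy as np

def shape_dissimilarity(x, y):
    """Stand-in measure (an assumption, not the paper's definition):
    1 - max circular cross-correlation of z-normalized series.
    Assumes equal-length, non-constant series."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    # Circular cross-correlation over all shifts, computed via the FFT.
    cc = np.fft.irfft(np.fft.rfft(x) * np.conj(np.fft.rfft(y)), n=len(x))
    return 1.0 - cc.max() / len(x)

def dissimilarity_matrix(series):
    """Pairwise dissimilarities; this O(n^2) step is the costly one."""
    n = len(series)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = shape_dissimilarity(series[i], series[j])
    return D

def vat_order(D):
    """Standard VAT ordering: start at an endpoint of the largest
    dissimilarity, then repeatedly append the unordered item that is
    closest to any already ordered item."""
    n = D.shape[0]
    start, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(start)]
    remaining = [k for k in range(n) if k != start]
    while remaining:
        sub = D[np.ix_(order, remaining)]
        nxt = remaining[int(np.argmin(sub.min(axis=0)))]
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Two shape groups, interleaved, with different shifts and amplitudes.
t = np.arange(64)
base_a = np.sin(2 * np.pi * t / 64)
base_b = np.sin(2 * np.pi * 3 * t / 64)
series = [
    2.0 * np.roll(base_a, 5),  0.5 * np.roll(base_b, 7),
    0.3 * np.roll(base_a, 20), 4.0 * np.roll(base_b, 1),
    1.5 * np.roll(base_a, 40), 1.0 * np.roll(base_b, 30),
]
D = dissimilarity_matrix(series)
order = vat_order(D)
D_reordered = D[np.ix_(order, order)]  # display as an image to see dark blocks
```

Displaying `D_reordered` as a grayscale image is what gives VAT its visual character: clusters appear as dark blocks along the diagonal. Each entry of the dissimilarity matrix depends only on its own pair of series, so the pairwise step can in principle be computed in parallel across workers.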