Quantized symbolic time series approximation

Erin Carson, Xinye Chen, Cheng Kang
2024-11-20
Abstract: Time series are ubiquitous in numerous science and engineering domains, e.g., signal processing, bioinformatics, and astronomy. Previous work has verified the efficacy of symbolic time series representation in a variety of engineering applications due to its storage efficiency and numerosity reduction. The most recent symbolic aggregate approximation technique, ABBA, has been shown to preserve essential shape information of time series and improve downstream applications, e.g., neural network inference regarding prediction and anomaly detection in time series. Motivated by the emergence of high-performance hardware which enables efficient computation for low bit-width representations, we present a new quantization-based ABBA symbolic approximation technique, QABBA, which exhibits improved storage efficiency while retaining the original speed and accuracy of symbolic reconstruction. We prove an upper bound for the error arising from quantization and discuss how the number of bits should be chosen to balance this with other errors. An application of QABBA with large language models (LLMs) for time series regression is also presented, and its utility is investigated. By representing the symbolic chain of patterns on time series, QABBA not only avoids the training of embeddings from scratch, but also achieves a new state of the art on the Monash regression dataset. The symbolic approximation to the time series offers a more efficient way to fine-tune LLMs on the time series regression task, which spans various application domains. We further present a set of extensive experiments performed across various well-established datasets to demonstrate the advantages of the QABBA method for symbolic approximation.
Machine Learning, Signal Processing
What problem does this paper attempt to address?
The paper addresses how to improve the storage efficiency of symbolic time series representations while maintaining their speed and accuracy. Specifically, the authors propose a new quantization technique, QABBA (Quantized ABBA), that reduces the storage requirements of time-series data while keeping the additional error introduced by quantization small.

### Problem Background

Time-series data is common in many scientific and engineering fields, such as signal processing, bioinformatics, and astronomy. Traditional symbolic time-series representation methods (such as SAX) are effective, but leave room for improvement in storage efficiency and dimension reduction. The more recent ABBA (Adaptive Brownian Bridge-based Aggregation) method retains the shape information of time series better, but can still be optimized further.

### Research Motivation

Modern high-performance hardware computes efficiently on low-bit-width representations (such as integer types), which reduces storage and computational costs without sacrificing speed or precision. Inspired by the quantization techniques used for deep-learning models, the authors apply quantization to the ABBA method and propose QABBA.

### Main Contributions

1. **Propose QABBA**: By replacing the original floating-point representation with low-bit-width integer types, QABBA significantly improves storage efficiency while maintaining the speed and accuracy of the ABBA method.
2. **Quantization Error Analysis**: The authors analyze the additional approximation error introduced by quantization and prove an upper bound on it, providing theoretical support for applying the quantization technique.
3. **Applied Research**: QABBA is combined with large language models for time-series regression tasks, demonstrating its potential in various application scenarios.
4. **Experimental Verification**: Experiments on multiple commonly used datasets evaluate the quantization error and reconstruction quality of QABBA under different bit lengths.

### Summary of Mathematical Formulas

- **Quantization Mapping**: \[ \tilde{x}=Q(x)=\left\lfloor\frac{x}{s}\right\rceil - z \] where \(s = \frac{\eta-\zeta}{e_{\eta}-e_{\zeta}}\), \(z\) is the zero-point offset, and \(\left\lfloor\cdot\right\rceil\) represents rounding to the nearest integer.
- **De-quantization Mapping**: \[ y = Q^{-1}(\tilde{x})=s(\tilde{x}+z) \]
- **Upper Bound of Quantization Error**: \[ \|\tilde{C}-C\|_F\leq\frac{\eta-\zeta}{2^{\omega + 1}-2}\sqrt{2k} \]
- **Upper Bound of SSE after Quantization**: \[ dSSE\leq SSE+\frac{2N(\eta-\zeta)^2}{(2^{\omega + 1}-2)^2} \]

With these changes, QABBA improves the storage efficiency of the symbolic representation of time series while maintaining the original speed and accuracy, and is suitable for various downstream tasks, such as prediction and anomaly detection.
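The quantization and de-quantization mappings, and the Frobenius-norm bound on the quantized centers, can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that the summary does not state: the convention \(Q(x)=\lfloor x/s\rceil - z\) with \(z=\lfloor\zeta/s\rceil-e_{\zeta}\) (so that \(s(\tilde{x}+z)\) inverts it), a signed \(\omega\)-bit integer range \([e_{\zeta}, e_{\eta}] = [-2^{\omega-1}, 2^{\omega-1}-1]\), a single range \([\zeta,\eta]\) shared by all entries of a \(k\times 2\) matrix of centers \(C\), and made-up random centers. The function names are hypothetical.

```python
import math
import random

def quant_params(zeta, eta, bits):
    # Assumed signed integer range [e_lo, e_hi] with 2**bits levels.
    e_lo, e_hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    s = (eta - zeta) / (e_hi - e_lo)   # scale; e_hi - e_lo == 2**bits - 1
    z = round(zeta / s) - e_lo         # integer zero point, so Q(zeta) == e_lo
    return s, z, e_lo, e_hi

def quantize(x, s, z, e_lo, e_hi):
    q = round(x / s) - z               # round to nearest integer, then shift
    return max(e_lo, min(e_hi, q))     # clip guards against edge cases at eta

def dequantize(q, s, z):
    return s * (q + z)

def frobenius_error(C, C_hat):
    return math.sqrt(sum((a - b) ** 2
                         for row, row_hat in zip(C, C_hat)
                         for a, b in zip(row, row_hat)))

# Hypothetical centers: k rows, two values per row (illustrative data only).
random.seed(0)
k, bits = 8, 8
C = [[random.uniform(1.0, 50.0), random.uniform(-3.0, 3.0)] for _ in range(k)]

zeta = min(min(row) for row in C)
eta = max(max(row) for row in C)
s, z, e_lo, e_hi = quant_params(zeta, eta, bits)

C_int = [[quantize(v, s, z, e_lo, e_hi) for v in row] for row in C]
C_hat = [[dequantize(q, s, z) for q in row] for row in C_int]

err = frobenius_error(C, C_hat)
bound = (eta - zeta) / (2 ** (bits + 1) - 2) * math.sqrt(2 * k)
print(f"Frobenius error = {err:.4f}, bound = {bound:.4f}")
```

Under these assumptions each of the \(2k\) entries is off by at most \(s/2 = (\eta-\zeta)/(2^{\omega+1}-2)\), which is where the \(\sqrt{2k}\) factor in the bound comes from; increasing `bits` by one roughly halves the bound.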