LLM-ABBA: Understanding time series via symbolic approximation

Erin Carson, Xinye Chen, Cheng Kang
2024-12-06
Abstract: The success of large language models (LLMs) for time series has been demonstrated in previous work. Utilizing a symbolic time series representation, one can efficiently bridge the gap between LLMs and time series. However, the remaining challenge is to exploit the semantic information hidden in time series by using symbols or existing tokens of LLMs, while aligning the embedding space of LLMs according to the hidden information of time series. The symbolic time series approximation (STSA) method called adaptive Brownian bridge-based symbolic aggregation (ABBA) shows outstanding efficacy in preserving salient time series features by modeling time series patterns in terms of amplitude and period while using existing tokens of LLMs. In this paper, we introduce a method, called LLM-ABBA, that integrates ABBA into large language models for various downstream time series tasks. By symbolizing time series, LLM-ABBA compares favorably to the recent state-of-the-art (SOTA) in UCR and three medical time series classification tasks. Meanwhile, a fixed-polygonal chain trick in ABBA is introduced to avoid obvious drifting during prediction tasks by significantly mitigating the effects of cumulative error arising from misused symbols during the transition from symbols to numerical values. In time series regression tasks, LLM-ABBA achieves the new SOTA on Time Series Extrinsic Regression (TSER) benchmarks. LLM-ABBA also shows competitive prediction capability compared to recent SOTA time series prediction results. We believe this framework can also seamlessly extend to other time series tasks.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how to combine large language models (LLMs) with time series analysis so that LLMs can better understand and process time series data. It proposes a method named LLM-ABBA, which symbolizes time series by converting numerical signals into symbol sequences. This lets LLMs work with time series in a variety of downstream tasks, such as classification, regression, and prediction. The core challenges are as follows:

1. **Symbol Consistency**: The same symbol must carry the same information across different time series, which is crucial for keeping time series features consistent.
2. **Preservation of Semantic Information**: The symbolic representation must preserve the semantic information of the time series, so that LLMs can learn its internal logic from the symbols.
3. **Control of Reconstruction Error**: When the symbol sequence is converted back to a numerical time series, the cumulative error must be kept small so that predictions remain accurate.

To address these challenges, the paper builds on ABBA (adaptive Brownian bridge-based symbolic aggregation), an efficient time series symbolization method that preserves key features while compressing the series. It also proposes a fixed-polygonal chain trick that reduces obvious drift in prediction tasks by mitigating the cumulative error caused by misused symbols.

Overall, the objective is a tool that converts the internal patterns of time series into content LLMs can recognize, and converts the generated content back into the time series domain to support time series analysis. The paper reports a new state of the art on the Time Series Extrinsic Regression (TSER) benchmarks, results that compare favorably with recent state-of-the-art methods on UCR and three medical time series classification tasks, and competitive prediction capability.
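The drift caused by a single misused symbol can be illustrated with a small sketch. It assumes each symbol stands for one linear piece described by a length and an increment, and that reconstruction chains the pieces end to end. The names `ALPHABET`, `symbolize`, and `reconstruct` are hypothetical and do not come from the paper or any ABBA library, the alphabet values are made up for illustration, and the sketch does not implement the paper's fixed-polygonal chain trick.

```python
import numpy as np

# Hypothetical shared alphabet (illustrative values, not from the paper):
# each symbol stands for one linear piece given as (length, increment).
ALPHABET = {
    "a": (2, 1.0),   # rise by 1.0 over 2 steps
    "b": (2, -1.0),  # fall by 1.0 over 2 steps
    "c": (3, 0.0),   # stay flat for 3 steps
    "d": (2, 3.0),   # steep rise by 3.0 over 2 steps
}

def symbolize(pieces):
    """Map each (length, increment) piece to the nearest symbol in ALPHABET."""
    keys = list(ALPHABET)
    centers = np.array([ALPHABET[k] for k in keys], dtype=float)
    symbols = []
    for piece in pieces:
        dist = np.sum((centers - np.asarray(piece, dtype=float)) ** 2, axis=1)
        symbols.append(keys[int(np.argmin(dist))])
    return "".join(symbols)

def reconstruct(symbols, start=0.0):
    """Chain the pieces end to end, interpolating linearly inside each piece."""
    values = [start]
    for s in symbols:
        length, inc = ALPHABET[s]
        last = values[-1]
        values.extend(last + inc * (i / length) for i in range(1, length + 1))
    return np.array(values)

# Because the alphabet is shared, the same symbol means the same piece
# in every series (the "symbol consistency" requirement).
print(symbolize([(2, 0.9), (2, -1.2), (3, 0.1)]))

truth = "abcab"
wrong = "dbcab"  # first symbol misused: "d" instead of "a"

# Every later piece starts where the previous one ended, so the error made
# by the first symbol is carried through the rest of the reconstruction.
drift = reconstruct(wrong) - reconstruct(truth)
print(drift)
```

In this toy setting the offset introduced by the wrong first symbol never decays, because each piece is defined only relative to the end of the previous one. This is the kind of cumulative error the summary refers to when it discusses converting symbols back to numerical values.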