An Evaluation of Standard Statistical Models and LLMs on Time Series Forecasting

Rui Cao, Qiao Wang
2024-08-09
Abstract: This research examines the use of Large Language Models (LLMs) in time series forecasting, with a specific focus on the LLMTIME model. Despite the established effectiveness of LLMs in tasks such as text generation, language translation, and sentiment analysis, this study highlights the key challenges that they encounter in time series prediction. We assess the performance of LLMTIME across multiple datasets and introduce classical almost periodic functions as time series to gauge its effectiveness. The empirical results indicate that while LLMs can perform well in zero-shot forecasting for certain datasets, their predictive accuracy diminishes notably when confronted with diverse time series data and traditional signals. The primary finding of this study is that the predictive capacity of LLMTIME, like that of other LLMs, deteriorates significantly on time series that contain both periodic and trend components, and on signals that comprise complex frequency components.
Machine Learning
What problem does this paper attempt to address?
This paper evaluates how well large language models (LLMs) forecast time series, focusing on the LLMTIME model. Although LLMs perform well in tasks such as text generation, language translation, and sentiment analysis, they still face substantial challenges in time series forecasting. The researchers evaluated LLMTIME across multiple datasets and introduced classical almost periodic functions as test signals. Specifically, the paper focuses on the following aspects:

1. **Zero-shot prediction capability**: The researchers evaluated LLMTIME's predictions on different datasets without additional training.
2. **Handling complex time series data**: The researchers examined how LLMTIME handles time series that include both periodic and trend components.
3. **Comparison with traditional methods**: The researchers compared LLMTIME's forecasts with those of traditional ARIMA models, especially on diverse time series data.

The experimental results show that although LLMs can make effective zero-shot predictions on some datasets, their accuracy decreases significantly on diverse real-world time series and traditional signals. In particular, when the series includes periodic and trend components, or complex frequency components, LLMTIME's predictions are noticeably worse than those of the ARIMA model. Overall, the study reveals the limitations of LLMs in time series forecasting and points to directions for further research.
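To make the kind of test signal described above concrete, here is a minimal sketch in Python of an almost periodic series with a linear trend, together with a hold-out split and an error metric. It is not the authors' code and it does not call LLMTIME or ARIMA. The frequencies (1 and √2, chosen because their ratio is irrational), the amplitudes, the slope, the last-value baseline, and all function names are assumptions made for illustration only.

```python
import math


def almost_periodic_with_trend(n, slope=0.01,
                               freqs=(1.0, math.sqrt(2.0)),
                               amps=(1.0, 0.5)):
    """Return n samples of slope*t + sum_k amps[k]*sin(freqs[k]*t), t = 0..n-1.

    A sum of sinusoids whose frequency ratio is irrational never repeats
    exactly, which makes it a standard example of an almost periodic function.
    All default values here are illustrative assumptions.
    """
    return [
        slope * t + sum(a * math.sin(w * t) for a, w in zip(amps, freqs))
        for t in range(n)
    ]


def split_history_future(series, horizon):
    """Hold out the last `horizon` points as the forecasting target."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return series[:-horizon], series[-horizon:]


def last_value_forecast(history, horizon):
    """Naive baseline (an assumption, not a model from the paper)."""
    return [history[-1]] * horizon


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


series = almost_periodic_with_trend(200)
history, future = split_history_future(series, horizon=20)
baseline = last_value_forecast(history, horizon=20)
baseline_mae = mean_absolute_error(future, baseline)
```

In a comparison like the one the paper describes, forecasts from any model under test would take the place of `baseline` and be scored against `future` with the same metric.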