LLMForecaster: Improving Seasonal Event Forecasts with Unstructured Textual Data

Hanyu Zhang, Chuck Arvin, Dmitry Efimov, Michael W. Mahoney, Dominique Perrault-Joncas, Shankar Ramasubramanian, Andrew Gordon Wilson, Malcolm Wolff
2024-12-04
Abstract: Modern time-series forecasting models often fail to make full use of rich unstructured information about the time series themselves. This lack of proper conditioning can lead to obvious model failures; for example, models may be unaware of the details of a particular product, and hence fail to anticipate seasonal surges in customer demand in the lead up to major exogenous events like holidays for clearly relevant products. To address this shortcoming, this paper introduces a novel forecast post-processor -- which we call LLMForecaster -- that fine-tunes large language models (LLMs) to incorporate unstructured semantic and contextual information and historical data to improve the forecasts from an existing demand forecasting pipeline. In an industry-scale retail application, we demonstrate that our technique yields statistically significant forecast improvements across several sets of products subject to holiday-driven demand surges.
Machine Learning, Computation and Language
What problem does this paper attempt to address?
The paper addresses the failure of modern time-series forecasting models to make full use of the rich unstructured information about the time series themselves, which leads to obvious errors when forecasting around seasonal events such as holidays. In particular, existing models may be unaware of the details of specific products and therefore fail to anticipate surges in customer demand in the lead-up to major exogenous events, especially in retail.

### Core of the problem

1. **Limitations of existing models**:
   - Existing time-series forecasting models can usually integrate only numerical or categorical exogenous features effectively, and ignore a large amount of descriptive and contextual information held in unstructured text, such as product descriptions and customer reviews.
   - This unstructured text is hard to use directly in model training, so it is either ignored or processed only superficially.
2. **Challenges in seasonal demand forecasting**:
   - Many new products have only limited sales history, and that history may be noisy (for example, sudden sales peaks or out-of-stock periods), which makes forecasting from historical data difficult.
   - Demand surges for some seasonal events (such as Mother's Day and Easter) affect only a few products, are not accompanied by a notable rise in overall customer traffic, and coincide with few product discounts, which makes them harder to predict.
3. **Need for human intervention**:
   - To compensate for these model deficiencies, human analysts currently have to identify the products relevant to an upcoming event from experience and domain knowledge, and adjust the forecasts accordingly.

### Solution

To solve these problems, the paper proposes a new forecast post-processing method, LLMForecaster. It fine-tunes large language models (LLMs) to combine unstructured semantic and contextual information with historical data, thereby improving the output of an existing demand forecasting pipeline. Specifically:

- **Introducing LLMForecaster**: Fine-tuned LLMs combine unstructured text (such as product descriptions) with numerical features (such as price) to generate an adjustment factor that corrects the forecasts of the existing model.
- **Improving forecast accuracy**: In this way, LLMForecaster systematically improves how well the existing model predicts holiday-related demand surges, especially in retail, and raises the accuracy of product-level demand forecasts.

### Experimental verification

The paper evaluates LLMForecaster in an industry-scale retail application and reports statistically significant forecast improvements for products affected by several major holidays (Halloween, Easter, Mother's Day, Father's Day, and Valentine's Day). The results indicate that LLMForecaster captures these seasonal demand surges more accurately, which in turn can reduce the risk of inventory shortages and improve customer satisfaction.

### Summary

In short, the paper improves time-series forecasting by introducing LLMForecaster, a post-processor that exploits unstructured text, and shows clear gains in handling seasonal demand surges. The method gives retailers a better tool for managing seasonal fluctuations and optimizing operations.
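
The post-processing idea described under "Solution" can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Product` fields, the prompt layout, the multiplicative form of the adjustment, and the hard-coded `predicted_factor` are all assumptions made for illustration, and the fine-tuned LLM is replaced by a placeholder value.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical fields; the paper's actual feature set is not specified here.
    description: str
    price: float


def build_prompt(product: Product, event: str) -> str:
    """Combine unstructured text and a numeric feature into one LLM input string."""
    return (
        f"Event: {event}\n"
        f"Product description: {product.description}\n"
        f"Price: {product.price:.2f}"
    )


def apply_adjustment(baseline_forecast: list[float], factor: float) -> list[float]:
    """Scale each baseline forecast value by an adjustment factor.

    Assumption: the adjustment is multiplicative; the summary above only says
    that an adjustment factor corrects the existing model's forecasts.
    """
    if factor < 0:
        raise ValueError("adjustment factor must be non-negative")
    return [value * factor for value in baseline_forecast]


product = Product(description="Heart-shaped chocolate gift box", price=19.99)
prompt = build_prompt(product, event="Valentine's Day")

# Placeholder for the fine-tuned LLM's output; a real system would derive it from `prompt`.
predicted_factor = 1.5

# Baseline forecasts would come from the existing demand forecasting pipeline.
adjusted = apply_adjustment([100.0, 120.0, 80.0], predicted_factor)
print(adjusted)  # [150.0, 180.0, 120.0]
```

Keeping the adjustment as a separate step after the baseline forecast mirrors the post-processor framing: the existing pipeline is left unchanged, and only its output is rescaled for products the text identifies as event-relevant.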