Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series

Ioannis Nasios, Konstantinos Vogklis
DOI: https://doi.org/10.1016/j.ijforecast.2022.01.001
2023-10-19
Abstract: In this paper we tackle the problem of point and probabilistic forecasting by describing a blending methodology for machine learning models from the gradient boosted trees and neural networks families. These principles were successfully applied in the recent M5 Competition in both the Accuracy and Uncertainty tracks. The key points of our methodology are: (a) transforming the task into regression on sales for a single day; (b) information-rich feature engineering; (c) creating a diverse set of state-of-the-art machine learning models; and (d) carefully constructing validation sets for model tuning. We argue that the diversity of the machine learning models, along with the careful selection of validation examples, were the most important ingredients for the effectiveness of our approach. Although the forecasting data had an inherent hierarchical structure (12 levels), none of our proposed solutions exploited that hierarchical scheme. Using the proposed methodology, our team was ranked within the gold medal range in both the Accuracy and Uncertainty tracks. Inference code, along with already trained models, is available at https://github.com/IoannisNasios/M5_Uncertainty_3rd_place
Machine Learning, Artificial Intelligence, Data Analysis, Statistics and Probability, Statistical Finance
What problem does this paper attempt to address?
The paper addresses the accuracy of **point and probabilistic forecasting of hierarchical time series**, specifically forecasting the daily unit sales of retail products. The authors tackle this by blending models from two machine-learning families: gradient boosted trees and neural networks.

### Main problems

1. **Point forecasting**: predict the daily unit sales of each product/store combination over the next 28 days.
2. **Probabilistic forecasting**: estimate the distribution of daily unit sales of each product/store combination over the next 28 days, expressed as the median and four central prediction intervals (50%, 67%, 95% and 99%).

### Research background

- **M5 Competition**: the work uses the M5 Competition dataset, which contains 42,840 hierarchically organized sales time series from Walmart, covering products, departments and stores in three US states (California, Texas and Wisconsin).
- **Data characteristics**: the data has a hierarchical structure with 12 aggregation levels, but the authors do not explicitly exploit this hierarchy in their forecasts.

### Solutions

1. **Feature engineering**:
   - **Categorical ID features**: encodings of discrete variables such as store and product identifiers.
   - **Price-related features**: for example, the maximum, minimum and mean of historical prices.
   - **Calendar-related features**: for example, special events and holidays.
   - **Lag features**: for example, sales over the previous days and rolling statistics computed over them.
2. **Model selection and blending**:
   - **LightGBM models**: trained with the Tweedie loss function to handle intermittent sales data.
   - **Keras and fastai neural network models**: used to capture complex non-linear relationships.
   - **Model blending**: predictions from the different models are combined with a geometric mean to improve accuracy.
3. **Validation and optimization**:
   - **Cross-validation**: three different training/validation splitting strategies are used to check that the models generalize.
   - **Hyperparameter optimization**: model architectures and parameters are selected through repeated experiments.
4. **Post-processing**:
   - **Exponential smoothing**: the final predictions are smoothed to further improve accuracy.

### Results

- The team ranked within the gold medal range in both the Accuracy and Uncertainty tracks of the M5 Competition.
- By blending multiple models and carefully designing the validation sets, the authors obtain strong results without needing external adjustments to their forecasts.

### Summary

The paper shows how blending gradient boosted trees and neural networks can substantially improve forecasting performance on hierarchical time series. The authors stress the importance of model diversity, feature engineering and validation-set selection, and identify these factors as crucial to the final results.
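The probabilistic target (the median plus 50%, 67%, 95% and 99% central prediction intervals) corresponds to nine quantile levels, and the M5 Uncertainty track scored such quantiles with a weighted, scaled version of the pinball loss. The minimal Python sketch below derives those levels and computes the plain (unscaled, unweighted) pinball loss for one observation. The function names are hypothetical, and the sketch says nothing about how the authors actually produced their quantile forecasts.

```python
def quantile_levels(coverages=(0.50, 0.67, 0.95, 0.99)):
    """Quantile levels for the median plus central prediction intervals."""
    levels = {0.5}  # the median
    for c in coverages:
        # A central interval with coverage c spans the (1 - c)/2 and (1 + c)/2
        # quantiles; rounding to 3 decimals strips floating-point noise.
        levels.add(round((1 - c) / 2, 3))
        levels.add(round((1 + c) / 2, 3))
    return sorted(levels)


def pinball_loss(y, q, tau):
    """Pinball (quantile) loss of quantile forecast q at level tau for outcome y."""
    if y >= q:
        return tau * (y - q)  # under-forecast, weighted by tau
    return (1 - tau) * (q - y)  # over-forecast, weighted by 1 - tau


def mean_pinball(y, quantile_forecasts):
    """Average pinball loss over a {level: forecast} mapping for one outcome."""
    losses = [pinball_loss(y, q, tau) for tau, q in quantile_forecasts.items()]
    return sum(losses) / len(losses)


print(quantile_levels())
# [0.005, 0.025, 0.165, 0.25, 0.5, 0.75, 0.835, 0.975, 0.995]
```

The asymmetric weights are what make the loss quantile-specific: a high level such as 0.995 is punished heavily for falling below the outcome, so its optimal forecast sits near the top of the sales distribution.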
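The single-day regression framing and the lag, rolling and price features listed under Solutions can be illustrated with a small plain-Python sketch. Everything here is a hypothetical illustration rather than the authors' feature set: the helper names `make_training_row` and `price_features`, the lag set (1, 7, 28), the window lengths (7, 28) and the `price_norm` ratio. Whether lags shorter than the 28-day horizon are usable at prediction time depends on whether forecasts are produced recursively, which this sketch does not address.

```python
from statistics import mean


def make_training_row(sales, target_day, lags=(1, 7, 28), windows=(7, 28)):
    """Return (features, target) for regressing the sales of one target day.

    sales: daily unit sales of one product/store series, indexed by day.
    Only days strictly before target_day are used to build features.
    """
    needed = max(max(lags), max(windows))
    if target_day < needed:
        raise ValueError(f"need at least {needed} days of history before target_day")
    features = {}
    for lag in lags:
        # Sales exactly `lag` days before the target day.
        features[f"lag_{lag}"] = sales[target_day - lag]
    for w in windows:
        # Rolling statistics over the w days immediately before the target day.
        window = sales[target_day - w:target_day]
        features[f"roll_mean_{w}"] = mean(window)
        features[f"roll_max_{w}"] = max(window)
    return features, sales[target_day]


def price_features(prices):
    """Summary statistics of an item's sell-price history up to the current day."""
    highest = max(prices)
    return {
        "price_max": highest,
        "price_min": min(prices),
        "price_mean": mean(prices),
        # Current price relative to its historical maximum (below 1.0 suggests a discount).
        "price_norm": prices[-1] / highest,
    }


sales = list(range(30))  # toy series: sales on day d equal d
features, target = make_training_row(sales, target_day=29)
print(features["lag_7"], features["roll_mean_7"], target)  # 22 25 29
```

Each training row pairs features computed from the past with the sales of one day, so any regressor (a LightGBM model or a neural network) can be trained on the same table.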
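LightGBM's Tweedie objective is selected with `objective="tweedie"`, and its power p is set by `tweedie_variance_power` (default 1.5). It suits intermittent sales because, for 1 < p < 2, the Tweedie distribution is a compound Poisson-gamma with a point mass at zero. The sketch below evaluates the Tweedie unit deviance, which matches what this objective minimizes up to terms that do not depend on the prediction. The value p = 1.5 is only LightGBM's default; the value the authors used is not stated here.

```python
def tweedie_deviance(y, mu, p=1.5):
    """Tweedie unit deviance for power 1 < p < 2 (compound Poisson-gamma)."""
    if not 1 < p < 2:
        raise ValueError("this sketch only handles 1 < p < 2")
    if y < 0 or mu <= 0:
        raise ValueError("need observed sales y >= 0 and predicted mean mu > 0")
    return 2 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )


def mean_tweedie_deviance(ys, mus, p=1.5):
    """Average deviance over paired observations and predictions."""
    return sum(tweedie_deviance(y, mu, p) for y, mu in zip(ys, mus)) / len(ys)


# A zero-sales day costs a finite amount that grows with the predicted mean,
# while a forecast equal to the observation costs nothing.
print(tweedie_deviance(0, 1.0), tweedie_deviance(0, 4.0))  # 4.0 8.0
```

Unlike a squared error on log-transformed sales, this loss needs no special handling for the many zero-sales days in the series.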
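Geometric-mean blending of model outputs can be sketched for a set of per-model forecast vectors. The function names, the optional weights and the toy forecast values are assumptions for illustration; the summary does not say whether or how the individual models were weighted.

```python
import math


def geometric_blend(predictions, weights=None):
    """Weighted geometric mean of several models' forecasts for one series and day."""
    if weights is None:
        weights = [1.0] * len(predictions)  # equal weights by default
    total = sum(weights)
    # prod(p ** (w / total)) equals exp(sum(w / total * log p)) but avoids
    # log(0) when a model forecasts zero sales.
    return math.prod(p ** (w / total) for p, w in zip(predictions, weights))


def blend_horizon(model_forecasts, weights=None):
    """Blend per-model forecast vectors (e.g. 28 days) day by day."""
    return [geometric_blend(day_preds, weights) for day_preds in zip(*model_forecasts)]


tree_model = [2.0, 0.0, 3.0]  # hypothetical gradient-boosted-tree forecasts
neural_net = [8.0, 1.0, 3.0]  # hypothetical neural-network forecasts
print(blend_horizon([tree_model, neural_net]))
```

Compared with the arithmetic mean, the geometric mean is pulled toward the smaller forecasts: blending 2.0 and 8.0 gives 4.0 rather than 5.0, and a single zero forecast with non-zero weight makes the blend zero.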
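The summary does not spell out the three training/validation splitting strategies, so the following is only a generic, hypothetical sketch of time-ordered holdout windows: each fold validates on a 28-day block (matching the forecast horizon) and trains on all earlier days, with folds stepping back in time. The function name, the 1-based day-index convention and the `step` parameter are assumptions.

```python
def holdout_windows(last_day, horizon=28, n_folds=3, step=28):
    """Time-ordered train/validation folds over day indices 1..last_day.

    Fold k validates on `horizon` consecutive days ending k * step days before
    last_day and trains on every earlier day, so no fold sees its own future.
    """
    folds = []
    for k in range(n_folds):
        valid_end = last_day - k * step
        valid_start = valid_end - horizon + 1
        train_end = valid_start - 1
        if train_end < 1:
            raise ValueError("not enough history for the requested folds")
        folds.append({"train": (1, train_end), "valid": (valid_start, valid_end)})
    return folds


for fold in holdout_windows(last_day=1941):  # last observed day index
    print(fold)
# {'train': (1, 1913), 'valid': (1914, 1941)}
# {'train': (1, 1885), 'valid': (1886, 1913)}
# {'train': (1, 1857), 'valid': (1858, 1885)}
```

Keeping every validation day strictly after the training days mimics the real forecasting setting and prevents future information from leaking into model tuning.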