Predict Future Sales using Ensembled Random Forests

Yuwei Zhang, Xin Wu, Chenyang Gu, Yueqi Xie
DOI: https://doi.org/10.48550/arXiv.1904.09031
2019-04-17
Abstract: This is a method report for the Kaggle data competition 'Predict Future Sales'. In this paper, we propose a rather simple approach to predicting future sales based on feature engineering, a Random Forest Regressor, and ensemble learning. Its performance exceeded many conventional methods, reaching a final score of 0.88186 (root mean squared error). As of this writing, our model ranked 5th on the leaderboard (till 8.5.2018).
Machine Learning
### What problem does this paper attempt to solve?

This paper addresses **future sales forecasting**: using machine-learning algorithms to predict the total sales of every product in every store for the next month. The task is a form of time-series forecasting, and the dataset consists of daily sales data provided by 1C, one of the largest software companies in Russia.

### Main research background and objectives

1. **Competition background**:
   - The paper is based on the "Predict Future Sales" competition on the Kaggle platform.
   - The dataset contains daily sales records, and contestants must predict the total sales of each product in each store for the following month.
2. **Research objectives**:
   - Propose a simple method based on feature engineering, a Random Forest Regressor, and ensemble learning that predicts future sales accurately.
   - Outperform many conventional prediction methods while keeping the model simple and interpretable, and achieve a strong competition result.

### Method overview

- **Feature engineering**: Preprocess the raw data to extract useful features, for example removing outliers, handling the store and product-category objects, and building a monthly matrix of product-store pairs.
- **Regression analysis**: Use a Random Forest Regressor as the main model, and apply ensemble learning to improve performance further.
- **Model optimization**: Tune hyperparameters and combine models (ensemble learning) to lower the root mean squared error (RMSE).

### Achievements

- The final model achieved an RMSE of 0.88186 and ranked 5th on the Kaggle competition leaderboard at the time of writing.
- The results suggest that classic models such as Random Forest can still perform well on some tasks when paired with appropriate preprocessing and ensemble learning.
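The monthly product-store matrix from the feature-engineering step can be sketched with pandas. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the column names of the competition's `sales_train.csv` (`date_block_num`, `shop_id`, `item_id`, `item_price`, `item_cnt_day`), and the outlier thresholds `MAX_PRICE` and `MAX_DAILY_COUNT`, the helper `build_monthly_matrix`, and the toy `daily` frame are all hypothetical. For each month it forms every combination of the shops and items active that month, sums daily counts into a monthly target, and fills unsold pairs with 0.

```python
from itertools import product

import numpy as np
import pandas as pd

# Hypothetical outlier thresholds; the paper's actual cut-offs are not stated here.
MAX_PRICE = 100_000
MAX_DAILY_COUNT = 1_000

KEYS = ["date_block_num", "shop_id", "item_id"]


def build_monthly_matrix(sales: pd.DataFrame) -> pd.DataFrame:
    """Aggregate daily sales into one row per (month, shop, item) pair.

    For every month, all combinations of the shops and items that appear in
    that month get a row; pairs with no recorded sales get a target of 0.
    """
    # Remove rows with implausible prices or daily counts (assumed outliers).
    sales = sales[
        (sales["item_price"] > 0)
        & (sales["item_price"] < MAX_PRICE)
        & (sales["item_cnt_day"] < MAX_DAILY_COUNT)
    ]

    # Cross product of active shops and active items, month by month.
    grid_parts = []
    for month, block in sales.groupby("date_block_num"):
        pairs = product([month], block["shop_id"].unique(), block["item_id"].unique())
        grid_parts.append(np.array(list(pairs), dtype=np.int64))
    grid = pd.DataFrame(np.vstack(grid_parts), columns=KEYS)

    # Monthly target: total items sold per (month, shop, item).
    monthly = (
        sales.groupby(KEYS, as_index=False)["item_cnt_day"]
        .sum()
        .rename(columns={"item_cnt_day": "item_cnt_month"})
    )

    # Left-join so unsold pairs survive, then fill their target with 0.
    matrix = grid.merge(monthly, on=KEYS, how="left")
    matrix["item_cnt_month"] = matrix["item_cnt_month"].fillna(0.0)
    return matrix.sort_values(KEYS).reset_index(drop=True)


# Tiny made-up example: the last row is dropped as a price outlier.
daily = pd.DataFrame({
    "date_block_num": [0, 0, 0, 1, 1],
    "shop_id": [1, 1, 2, 1, 2],
    "item_id": [10, 10, 11, 10, 10],
    "item_price": [99.0, 99.0, 250.0, 99.0, 500_000.0],
    "item_cnt_day": [1.0, 2.0, 1.0, 4.0, 1.0],
})
print(build_monthly_matrix(daily))
```

Building the full per-month cross product, rather than keeping only observed sales, gives the model explicit zero-sales rows, which matters because many product-store pairs sell nothing in a given month.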
### Summary

This paper shows how simple feature engineering, combined with a Random Forest Regressor and ensemble learning, can predict future sales accurately. Although more complex models exist (such as LSTM, XGBoost, and LightGBM), the authors found that the classic Random Forest model performs very well in this particular application and is easy to implement and train.
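To make the regression and ensemble steps concrete, here is a minimal scikit-learn sketch. It is an illustration under assumptions, not the authors' pipeline: the summary does not give the features, hyperparameters, or how the models were combined, so the sketch simply averages several `RandomForestRegressor` models with hypothetical settings (`MEMBER_PARAMS`) trained on synthetic data; the helper names `fit_rf_ensemble` and `predict_ensemble` are also hypothetical. Predictions are clipped to [0, 20] because the competition's evaluation clips true monthly counts to that range, and the score is RMSE, the metric reported in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical ensemble members; the paper's actual settings are not given here.
MEMBER_PARAMS = [
    {"n_estimators": 50, "max_depth": 8, "random_state": 0},
    {"n_estimators": 50, "max_depth": 12, "random_state": 1},
    {"n_estimators": 50, "max_features": 0.5, "random_state": 2},
]


def rmse(y_true, y_pred) -> float:
    """Root mean squared error, the competition metric."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


def fit_rf_ensemble(X, y, member_params=MEMBER_PARAMS):
    """Train one RandomForestRegressor per parameter set."""
    models = []
    for params in member_params:
        model = RandomForestRegressor(**params)
        model.fit(X, y)
        models.append(model)
    return models


def predict_ensemble(models, X):
    """Average member predictions and clip them to the [0, 20] target range."""
    preds = np.mean([model.predict(X) for model in models], axis=0)
    return np.clip(preds, 0, 20)


# Synthetic stand-in for the monthly feature matrix (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = np.clip(3 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=600), 0, 20)
X_train, X_val = X[:500], X[500:]
y_train, y_val = y[:500], y[500:]

models = fit_rf_ensemble(X_train, y_train)
pred = predict_ensemble(models, X_val)
baseline = np.full_like(y_val, y_train.mean())
print(f"ensemble RMSE: {rmse(y_val, pred):.3f}")
print(f"mean-baseline RMSE: {rmse(y_val, baseline):.3f}")
```

Averaging forests that differ in depth, feature subsampling, and seed is one of the simplest ensembling schemes; weighted blending or stacking are alternatives, and which one the authors used is not stated here. On the real data, a time-based split (for example holding out the most recent month) would be a more faithful validation than the split of synthetic rows used above.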