A comparative study of statistical and machine learning models on near-real-time daily emissions prediction

Xiangqian Li
DOI: https://doi.org/10.48550/arXiv.2302.01152
2023-02-02
Abstract: The rapid rise in carbon dioxide emissions is a major cause of global warming and climate change, which pose a serious threat to human survival and have far-reaching effects on the global ecosystem. It is therefore necessary to control carbon dioxide emissions effectively by predicting and analyzing their trend accurately and in a timely manner, so as to provide a reference for mitigation measures. This paper aims to select a suitable model for predicting near-real-time daily emissions, based on univariate daily time-series data from January 1, 2020 to September 30, 2022 covering all sectors (Power, Industry, Ground Transport, Residential, Domestic Aviation, International Aviation) in China. We propose six prediction models: three statistical models, namely grey prediction (GM(1,1)), autoregressive integrated moving average (ARIMA) and seasonal autoregressive integrated moving average with exogenous factors (SARIMAX); and three machine learning models, namely artificial neural network (ANN), random forest (RF) and long short-term memory (LSTM). To evaluate the performance of these models, five criteria are introduced and discussed in detail: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Coefficient of Determination (R²). In the results, the three machine learning models perform better than the three statistical models, and the LSTM model performs best on all five criteria for daily emissions prediction, with an MSE of 3.5179e-04, an RMSE of 0.0187, an MAE of 0.0140, a MAPE of 14.8291% and an R² of 0.9844.
Artificial Intelligence, Machine Learning, Physics and Society
What problem does this paper attempt to address?
The main problem this paper addresses is choosing the model best suited to predicting near-real-time daily carbon dioxide emissions across China's sectors (power, industry, ground transport, residential, domestic aviation, international aviation), by comparing the performance of statistical models and machine learning models. The paper uses daily time-series data from January 1, 2020 to September 30, 2022 and proposes six prediction models: three statistical models (the grey prediction model GM(1,1), the autoregressive integrated moving average model ARIMA, and the seasonal autoregressive integrated moving average model with exogenous factors SARIMAX) and three machine learning models (artificial neural network ANN, random forest RF, and long short-term memory network LSTM). The models are evaluated on five criteria: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Coefficient of Determination (R²). The paper concludes that the LSTM model performs best on all five criteria and is especially suitable for near-real-time daily carbon dioxide emission prediction based on long time-series data.

Specifically, the paper aims to:

1. **Improve prediction accuracy**: By comparing the prediction performance of different models, find a model that provides more accurate predictions to support the formulation of carbon emission mitigation measures.
2. **Fill research gaps**: Most current research on carbon dioxide emissions focuses on annual emission prediction, while this paper focuses on short-cycle daily emission prediction, which helps policy responses to be adjusted in a timely manner.
3. **Provide decision-making support**: Through accurate prediction results, provide references for policy-makers to control and reduce carbon dioxide emissions more effectively.

The innovation of this paper lies in using an extended near-real-time carbon dioxide emission data set with daily frequency to select the most appropriate daily prediction model, which not only improves prediction accuracy but also gives decision-makers an opportunity to adjust policies in a timely manner.
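The five evaluation criteria named above have standard textbook definitions, and a small example may make the comparison concrete. The following is a minimal sketch under stated assumptions, not the paper's own code: the function name `evaluate_forecast` and the sample values are hypothetical, and the paper's data preprocessing and scaling are not reproduced here.

```python
import math


def evaluate_forecast(y_true, y_pred):
    """Compute MSE, RMSE, MAE, MAPE (in percent) and R^2 for one forecast.

    Hypothetical helper using the standard definitions of the five criteria.
    """
    y_true = list(y_true)
    y_pred = list(y_pred)
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the actual values, so it is undefined if any actual is 0.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}


# Made-up illustrative values, not data from the paper.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
scores = evaluate_forecast(actual, predicted)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Lower values are better for MSE, RMSE, MAE and MAPE, while an R² closer to 1 indicates a better fit. Applying the same function to each model's held-out predictions would give one row per model for a side-by-side comparison.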