Combining machine learning models through multiple data division methods for PM2.5 forecasting in Northern Xinjiang, China

Miaomiao Ren, Wei Sun, Shu Chen
DOI: https://doi.org/10.1007/s10661-021-09233-5
IF: 3.307
2021-07-07
Environmental Monitoring and Assessment
Abstract:<p class="a-plus-plus">In this study, daily average PM<sub class="a-plus-plus">2.5</sub> forecasting models were developed and applied in Northern Xinjiang, China, by combining back propagation artificial neural network (BPANN) and multiple linear regression (MLR) member models through another BPANN model. Meteorological data (daily average precipitation, pressure, relative humidity, temperature, and wind speed, plus daily maximum wind speed and sunshine hours, all on the same day) and air pollutant data (daily PM<sub class="a-plus-plus">2.5</sub>, PM<sub class="a-plus-plus">10</sub>, SO<sub class="a-plus-plus">2</sub>, CO, NO<sub class="a-plus-plus">2</sub>, and O<sub class="a-plus-plus">3</sub> concentrations on the previous day) for January and August of each year from 2015 to 2019 were used as candidate inputs. The optimal member and combining models were evaluated using three data division methods: leave-one-out cross-validation (LOOCV), fivefold cross-validation, and hold-out. Twelve member models with optimal or sub-optimal performance were then used to develop the combining models. The performance of the BPANN and MLR member models differed among the three data division methods, and LOOCV evaluated the models more comprehensively. The combining models generally performed better than the member models. For both member and combining models, PM<sub class="a-plus-plus">2.5</sub> forecasting performance was generally better in August than in January. The correlation coefficient (R) on the validation set of the optimal combining model was about 0.87 in January and 0.946 in August. These results show that combining linear and nonlinear models through multiple data division methods can be an effective tool for forecasting PM<sub class="a-plus-plus">2.5</sub> concentrations.</p>
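The three data division methods named in the abstract (LOOCV, fivefold cross-validation, and hold-out) can be illustrated with a minimal Python sketch. It is not the authors' code, and several parts are assumptions: the member model is a hypothetical one-variable least-squares fit (`fit_line`) standing in for the paper's BPANN and MLR models, the 80/20 chronological hold-out ratio and the shuffle seed for the folds are not given in the abstract, the BPANN combining step is omitted, and the data are synthetic. Each method produces (training, validation) index pairs; the out-of-sample predictions are pooled and scored with the Pearson correlation coefficient R.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, validation indices)


def loocv_splits(n: int) -> List[Split]:
    """Leave-one-out: each sample is the validation set exactly once."""
    return [([j for j in range(n) if j != i], [i]) for i in range(n)]


def kfold_splits(n: int, k: int = 5, seed: int = 0) -> List[Split]:
    """k-fold: shuffle indices (assumed seed), then deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    return [
        ([i for g, fold in enumerate(folds) if g != f for i in fold], folds[f])
        for f in range(k)
    ]


def holdout_split(n: int, train_frac: float = 0.8) -> List[Split]:
    """Hold-out: assumed chronological 80/20 cut; only the tail is validated."""
    cut = int(n * train_frac)
    return [(list(range(cut)), list(range(cut, n)))]


def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient R between observed and predicted values."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def fit_line(x: Sequence[float], y: Sequence[float]) -> Callable[[float], float]:
    """Hypothetical stand-in member model: 1-D least squares y = a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    return lambda v: a + b * v


def cross_validated_r(x, y, splits: List[Split]) -> float:
    """Refit on each training set, pool validation predictions, compute R once."""
    obs: List[float] = []
    pred: List[float] = []
    for train, val in splits:
        model = fit_line([x[i] for i in train], [y[i] for i in train])
        obs += [y[i] for i in val]
        pred += [model(x[i]) for i in val]
    return pearson_r(obs, pred)


if __name__ == "__main__":
    rng = random.Random(42)
    x = [rng.uniform(0, 10) for _ in range(40)]  # synthetic, not the paper's data
    y = [3.0 + 2.0 * xi + rng.gauss(0, 1.0) for xi in x]
    n = len(x)
    for name, splits in [
        ("LOOCV", loocv_splits(n)),
        ("5-fold", kfold_splits(n)),
        ("hold-out", holdout_split(n)),
    ]:
        print(f"{name:8s} R = {cross_validated_r(x, y, splits):.3f}")
```

With LOOCV every sample is predicted once by a model trained on all the others, so the pooled R uses all n points, while the hold-out score in this sketch rests only on the final 20% of samples. That difference is one reason the same model can score differently under different division methods.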
environmental sciences
What problem does this paper attempt to address?