Particulate matter concentration from open-cut coal mines: A hybrid machine learning estimation

Chongchong Qi, Wei Zhou, Xiang Lu, Huaiting Luo, Binh Thai Pham, Zaher Mundher Yaseen
DOI: https://doi.org/10.1016/j.envpol.2020.114517
IF: 8.9
2020-08-01
Environmental Pollution
Abstract: Particulate matter (PM) emission is one of the leading environmental pollution issues associated with the coal mining industry. Before any control techniques can be employed, however, an accurate prediction of PM concentration is needed. To this end, this work aimed to provide an accurate estimation of PM concentration using a hybrid machine-learning technique. The proposed predictive model hybridizes a random forest (RF) model with particle swarm optimization (PSO) to estimate PM concentration; the main purpose of the PSO component is to tune the hyper-parameters of the RF model. The hybrid method was applied to PM data collected from the Haerwusu Coal Mine, an open-cut coal mine in northern China. The selected inputs were wind direction, wind speed, temperature, humidity, noise level, and the PM concentration 5 min earlier. The selected outputs were the current concentrations of PM2.5 (particles with an aerodynamic diameter smaller than 2.5 μm), PM10 (particles with an aerodynamic diameter smaller than 10 μm), and total suspended particulate (TSP). A detailed procedure for implementing RF_PSO was presented and its predictive performance was analyzed. The results show that RF_PSO could estimate PM concentration with a high degree of accuracy. The Pearson correlation coefficients between the average estimated and measured PM data were 0.91, 0.84, and 0.86 for the PM2.5, PM10, and TSP datasets, respectively. The relative importance analysis shows that the current PM concentration was mainly influenced by the PM concentration 5 min earlier, followed by humidity > temperature ≈ noise level > wind speed > wind direction. This study presents an efficient and accurate way to estimate PM concentration, which is fundamental to assessing the atmospheric quality risks arising from open-cut mining and to designing dust removal techniques.
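The abstract states that PSO is used to tune the RF hyper-parameters. As a minimal sketch of that idea, the stdlib-only Python below implements a standard global-best PSO that minimizes an arbitrary objective over a box of continuous bounds. The function name `pso_minimize`, the inertia weight and acceleration coefficients (`w=0.7`, `c1=c2=1.5`), and the swarm size are illustrative assumptions, not values reported in the paper. In an RF_PSO setup the objective would be a cross-validated error of a random forest whose hyper-parameters are decoded from the particle position; the commented `rf_cv_rmse` is a hypothetical example using scikit-learn, and the choice of `n_estimators` and `max_depth` as the tuned parameters is likewise an assumption.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    n_iters: int = 100,
    w: float = 0.7,   # inertia weight (illustrative value)
    c1: float = 1.5,  # cognitive coefficient: pull toward each particle's own best
    c2: float = 1.5,  # social coefficient: pull toward the swarm's best
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO that minimizes `objective` inside the box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    best_idx = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best_idx][:], pbest_val[best_idx]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (pbest[i][d] - pos[i][d])
                    + c2 * r2 * (gbest[d] - pos[i][d])
                )
                lo, hi = bounds[d]
                # Move the particle and clip it back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:  # new personal best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:  # new swarm-wide best
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical RF objective (needs scikit-learn and data X, y; not run here):
# from sklearn.ensemble import RandomForestRegressor
# from sklearn.model_selection import cross_val_score
# def rf_cv_rmse(params):
#     model = RandomForestRegressor(n_estimators=int(round(params[0])),
#                                   max_depth=int(round(params[1])),
#                                   random_state=0)
#     scores = cross_val_score(model, X, y, cv=5,
#                              scoring="neg_root_mean_squared_error")
#     return -scores.mean()
# best_params, best_rmse = pso_minimize(rf_cv_rmse, [(10, 500), (2, 30)])

if __name__ == "__main__":
    # Toy check on a bowl-shaped function with its minimum at (3, -1).
    best, val = pso_minimize(lambda p: (p[0] - 3) ** 2 + (p[1] + 1) ** 2,
                             [(-10, 10), (-10, 10)])
    print(best, val)
```

Because RF hyper-parameters such as the number of trees are integers, this sketch searches a continuous space and rounds inside the objective; that is one common convention, not necessarily the one the authors used.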
environmental sciences
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to accurately predict the particulate matter (PM) concentration generated during open-pit coal mining. Specifically, the article proposes a hybrid machine-learning method that combines random forest (RF) with particle swarm optimization (PSO) to improve the prediction accuracy of PM concentration. Accurate predictions help to assess air quality risk and provide a basis for designing effective dust control techniques.

### Problem Background

Particulate matter (PM) emissions are one of the major environmental problems in the coal mining industry. In open-pit coal mines, operations such as blasting and transportation generate particulate matter of various sizes, especially fine particles smaller than 0.5 micrometers in diameter, which pose a serious threat to the environment and human health. Accurate prediction of PM concentration is therefore crucial for assessing air quality in mining areas.

### Limitations of Existing Methods

Current research on PM concentration relies mainly on on-site measurement or computational modeling. On-site measurement offers high precision and real-time monitoring but is costly and time-consuming, while computational modeling is constrained by model assumptions, parameter selection, and verification procedures. In addition, some computational simulations take hours or even days to converge. A method that predicts PM concentration efficiently and accurately is therefore needed to provide real-time warnings of environmental hazards.

### Proposed Solution

The article proposes a hybrid machine-learning method (RF_PSO) that combines random forest (RF) and particle swarm optimization (PSO) to predict the PM concentration in open-pit coal mines.
RF, an ensemble learning method, captures nonlinear relationships by combining many decision trees and is relatively insensitive to outliers in the dataset. PSO is used to tune the hyper-parameters of the RF model to improve its predictive performance.

### Data Sources and Variables

The study used PM concentration data collected at the Haerwusu Coal Mine, an open-pit coal mine in northern China. The input variables are wind direction, wind speed, temperature, humidity, noise level, and the PM concentration 5 minutes earlier; the output variables are the current PM2.5, PM10, and total suspended particulate (TSP) concentrations.

### Model Evaluation

Model performance was evaluated with statistical indicators including the Pearson correlation coefficient (R), root-mean-square error (RMSE), and mean absolute error (MAE). The RF_PSO model predicted PM concentration with high accuracy, with Pearson correlation coefficients of 0.91 (PM2.5), 0.84 (PM10), and 0.86 (TSP). Relative importance analysis shows that the current PM concentration is influenced mainly by the PM concentration 5 minutes earlier, followed by humidity > temperature ≈ noise level > wind speed > wind direction.

### Conclusion

This study proposes an efficient hybrid machine-learning method that accurately predicts PM concentration in open-pit coal mines, which is valuable for assessing environmental quality in mining areas and for designing dust control measures.

---

**Formula Summary:**

1. **Normalization Formula**:
   \[ x_i=\frac{x_i'-\min(x')}{\max(x')-\min(x')} \]
   \[ y_i=\frac{y_i'-\min(y')}{\max(y')-\min(y')} \]
2. **Pearson Correlation Coefficient (R)**:
   \[ R = \frac{\sum_{i=1}^{N}(y_i^*-\bar{y}^*)(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{N}(y_i^*-\bar{y}^*)^2}\sqrt{\sum_{i=1}^{N}(y_i-\bar{y})^2}} \]
3. **Root-Mean-Square Error (RMSE)**:
   \[ \mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i^*-y_i)^2} \]
4. **Mean Absolute Error (MAE)**:
   \[ \mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\lvert y_i^*-y_i\rvert \]

Here $x_i'$ and $y_i'$ are the raw input and output values of sample $i$, and $x_i$ and $y_i$ their normalized counterparts; $y_i^*$ is the estimated value corresponding to the measured value $y_i$; $\bar{y}^*$ and $\bar{y}$ are the means of the estimated and measured values; and $N$ is the number of samples.
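The four formulas in the summary translate directly into code. The stdlib-only Python below is one way to compute them, offered as an illustration rather than the authors' evaluation pipeline. The function names are hypothetical, the inputs are assumed to be plain lists of floats, and the Pearson coefficient centers each series on its own mean, as in the standard definition.

```python
import math
from typing import List, Sequence


def _check(estimated: Sequence[float], measured: Sequence[float]) -> int:
    """Validate paired inputs and return the sample count N."""
    if not measured or len(estimated) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    return len(measured)


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale raw values to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    return [(v - lo) / (hi - lo) for v in values]


def pearson_r(estimated: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation coefficient R between estimates and measurements."""
    n = _check(estimated, measured)
    mean_est = sum(estimated) / n
    mean_meas = sum(measured) / n
    cov = sum((e - mean_est) * (m - mean_meas) for e, m in zip(estimated, measured))
    ss_est = sum((e - mean_est) ** 2 for e in estimated)
    ss_meas = sum((m - mean_meas) ** 2 for m in measured)
    return cov / (math.sqrt(ss_est) * math.sqrt(ss_meas))


def rmse(estimated: Sequence[float], measured: Sequence[float]) -> float:
    """Root-mean-square error between estimates and measurements."""
    n = _check(estimated, measured)
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(estimated, measured)) / n)


def mae(estimated: Sequence[float], measured: Sequence[float]) -> float:
    """Mean absolute error between estimates and measurements."""
    n = _check(estimated, measured)
    return sum(abs(e - m) for e, m in zip(estimated, measured)) / n


if __name__ == "__main__":
    measured = [12.0, 18.0, 25.0, 31.0]   # made-up illustrative values
    estimated = [13.0, 17.0, 26.0, 29.0]  # made-up illustrative values
    print(min_max_normalize(measured))
    print(pearson_r(estimated, measured), rmse(estimated, measured), mae(estimated, measured))
```

RMSE penalizes large errors more heavily than MAE because the errors are squared before averaging, so reporting both, as the paper does, gives a sense of how concentrated the errors are.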