Identification of mine water source based on TPE-LightGBM

Man Wang, Jianguo Zhang, Han Li, Bo Zhang, Zhenwei Yang
DOI: https://doi.org/10.1038/s41598-024-62413-4
IF: 4.6
2024-06-01
Scientific Reports
Abstract: Mine water inrush is a serious threat to safe mine production. Quickly identifying the type of water inrush source is essential for preventing and controlling water damage. In this study, the aqueous chemical components Na⁺ + K⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻ and HCO₃⁻ of different aquifers in the Pingdingshan coalfield were selected as the characteristic values, and surface water, Quaternary pore water, Carboniferous limestone karst water, Permian sandstone water, and Cambrian limestone karst water were used as the labels. An intelligent water source discrimination model is proposed by combining data mining, classification models, and reinforcement learning. Because outliers in the samples may interfere with the model's recognition ability, the data distribution range was analyzed using box plots, and 20 groups of abnormal samples were excluded. The processed water chemistry data were divided into 80% learning samples and 20% test samples, and the learning samples were fed into a light gradient boosting machine (LightGBM) for training. The tree-structured Parzen estimator (TPE) obtains the optimal values of the main parameters of LightGBM in a very short time. Substituting the tuned hyperparameters back into the model yields a 13.9% improvement in accuracy, demonstrating the effectiveness of the TPE algorithm. To further validate the performance of the model, TPE-LightGBM is compared with a Random Search-Multilayer Perceptron (RS-MLP) and a Genetic Algorithm-Support Vector Machine (GA-SVM). The accuracy of TPE-LightGBM, RS-MLP, and GA-SVM is 0.931, 0.759, and 0.724, respectively, and the generalization error (RMSE) is 0.415, 1.05, and 1.313, respectively. The results show that TPE-LightGBM is more advantageous in water source identification and more resistant to overfitting. Comparing the information gain of each variable shows that Ca²⁺ contributes the most, so changes in Ca²⁺ concentration deserve particular attention. With its high accuracy and generalization ability, TPE-LightGBM shows good prospects for identifying water inrush source types.
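The box-plot screening mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact rule used, so the conventional 1.5 × IQR whisker criterion is assumed here, applied per feature across all samples. The function names, the shortened feature keys, and the toy concentrations are all hypothetical and are not taken from the paper's data.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) box-plot whisker limits for one feature.

    Assumption: the usual k = 1.5 interquartile-range rule.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outlier_rows(rows, k=1.5):
    """Keep only samples whose every feature lies inside its whisker limits."""
    features = list(rows[0].keys())
    bounds = {f: iqr_bounds([r[f] for r in rows], k) for f in features}
    return [
        r for r in rows
        if all(bounds[f][0] <= r[f] <= bounds[f][1] for f in features)
    ]


# Toy, made-up concentrations (illustration only, not the paper's samples).
samples = [
    {"Ca": 10.0, "Mg": 5.0},
    {"Ca": 12.0, "Mg": 6.0},
    {"Ca": 11.0, "Mg": 5.0},
    {"Ca": 13.0, "Mg": 6.0},
    {"Ca": 12.0, "Mg": 5.0},
    {"Ca": 11.0, "Mg": 6.0},
    {"Ca": 200.0, "Mg": 5.0},  # deliberately extreme Ca value
]
kept = drop_outlier_rows(samples)
print(len(samples) - len(kept), "sample(s) removed")
```

Dropping a whole sample when any single feature falls outside its whiskers is one possible design choice; screening within each aquifer class separately would be an equally plausible reading of the abstract.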
multidisciplinary sciences
What problem does this paper attempt to address?
The paper primarily addresses the issue of rapid and accurate identification of water sources in coal mine water hazards. Specifically, the paper proposes a Light Gradient Boosting Machine (LightGBM) model optimized by the Tree-structured Parzen Estimator (TPE) to identify the sources of mine water inflow, considering the complex hydrogeological conditions of the Pingdingshan coalfield. By selecting chemical components from different aquifers (Na⁺ + K⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻, and HCO₃⁻) as feature values and using the aquifer types as labels, an intelligent water source discrimination model was constructed. To improve the model's performance, abnormal samples were excluded, and the data were standardized. Additionally, the TPE algorithm was used to optimize the key parameters of LightGBM to reduce the risk of overfitting and enhance the model's accuracy. The optimized TPE-LightGBM model achieved an accuracy of 93.1% on the test samples and demonstrated better generalization ability than the other machine learning methods compared, namely Random Search-Multilayer Perceptron (RS-MLP) and Genetic Algorithm-Support Vector Machine (GA-SVM). In summary, the TPE-LightGBM model proposed in this study not only improves the accuracy of water source identification but also reduces computational complexity, providing effective technical support for the prevention and control of coal mine water hazards.
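To give a feel for how TPE-style tuning proceeds, here is a heavily simplified, one-dimensional sketch of the idea. It is not the paper's implementation: a toy quadratic stands in for the LightGBM validation error, the kernel bandwidth is fixed, and the function names and default settings (`tpe_minimize`, `gamma`, `n_candidates`, and so on) are hypothetical. Past trials are split into a "good" and a "bad" group by loss, a density is fitted to each, and the next trial is the candidate that maximizes the ratio of good density to bad density.

```python
import math
import random


def gaussian_kde(points, bandwidth):
    """Equal-weight mixture of Gaussians centred on the given points."""
    norm = bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        return total / (norm * len(points))

    return density


def tpe_minimize(objective, low, high, n_trials=60, n_startup=10,
                 gamma=0.25, n_candidates=24, seed=0):
    """Minimize a 1-D objective with a simplified TPE-style loop."""
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # fixed bandwidth: a simplifying assumption
    history = []  # list of (x, loss) pairs

    for trial in range(n_trials):
        if trial < n_startup:
            # Warm-up phase: plain random search.
            x = rng.uniform(low, high)
        else:
            # Split past trials into the best gamma fraction and the rest.
            ranked = sorted(history, key=lambda t: t[1])
            n_good = max(1, math.ceil(gamma * len(ranked)))
            good = [x_i for x_i, _ in ranked[:n_good]]
            bad = [x_i for x_i, _ in ranked[n_good:]]
            good_density = gaussian_kde(good, bandwidth)
            bad_density = gaussian_kde(bad, bandwidth)
            # Draw candidates from the "good" density, clipped to the range.
            candidates = [
                min(max(rng.gauss(rng.choice(good), bandwidth), low), high)
                for _ in range(n_candidates)
            ]
            # Pick the candidate with the highest good/bad density ratio.
            x = max(candidates,
                    key=lambda c: good_density(c) / (bad_density(c) + 1e-12))
        history.append((x, objective(x)))

    return min(history, key=lambda t: t[1])


# Toy objective standing in for a model's validation error.
best_x, best_loss = tpe_minimize(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
print(best_x, best_loss)
```

In a real tuning run the objective would train the model with the proposed hyperparameters and return its validation error, and an established optimization library would normally be used rather than a hand-written loop like this one.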