Early Warning System for Seismic Events in Coal Mines Using Machine Learning

Robert Bogucki, Jan Lasek, Jan Kanty Milczek, Michal Tadeusiak
DOI: https://doi.org/10.48550/arXiv.1609.06957
2016-09-21
Abstract: This document describes an approach to the problem of predicting dangerous seismic events in active coal mines up to 8 hours in advance. It was developed as a part of the AAIA'16 Data Mining Challenge: Predicting Dangerous Seismic Events in Active Coal Mines. The solutions presented consist of ensembles of various predictive models trained on different sets of features. The best one achieved a winning score of 0.939 AUC.
Machine Learning
What problem does this paper attempt to address?
The problem this paper attempts to solve is the prediction of potentially dangerous seismic events in coal mines. Specifically, the authors developed a machine-learning-based early-warning system that predicts, up to 8 hours in advance, whether the seismic energy warning level (total seismic energy exceeding 50,000 joules) will be reached in the mine. The problem originated as the task of the AAIA'16 Data Mining Challenge: Predicting Dangerous Seismic Events in Active Coal Mines.

### Problem Background

In 2015, the mining industry in Poland reported 2,158 dangerous accidents, resulting in 19 deaths and 12 serious injuries. Underground mining work faces multiple threats, including fire, methane leakage, as well as earthquakes and shock waves. Monitoring and decision-support systems can play an important role in preventing accidents. These systems are usually based on machine-learning or data-mining techniques, and they can reduce the risk to employees and prevent equipment losses.

### Specific Problem Statement

The goal of the paper is to develop a classification model that predicts, based on the records of the past 24 hours, whether seismic energy will reach the warning level within the next 8 hours. The warning level is defined as total seismic energy exceeding 50,000 joules (50 kJ).
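The labeling rule in the problem statement can be illustrated with a minimal sketch. It assumes, purely for illustration, that seismic energy is available as one aggregated value per hour; the function name `warning_labels`, the constant names, and the hourly layout are hypothetical and are not taken from the challenge data format.

```python
from typing import List

# Threshold and horizon as stated in the text: more than 50,000 J
# of total seismic energy within the next 8 hours.
WARNING_THRESHOLD_J = 50_000
HORIZON_HOURS = 8


def warning_labels(hourly_energy_j: List[float]) -> List[int]:
    """Label each hour t with 1 if the total energy over hours
    t+1 .. t+8 exceeds the warning threshold, else 0.

    Assumption: one aggregated energy reading per hour (hypothetical layout).
    Hours without a full 8-hour future window are left unlabeled.
    """
    labels = []
    for t in range(len(hourly_energy_j) - HORIZON_HOURS):
        future_total = sum(hourly_energy_j[t + 1 : t + 1 + HORIZON_HOURS])
        labels.append(1 if future_total > WARNING_THRESHOLD_J else 0)
    return labels


# A single 60 kJ event at hour 10 in an otherwise quiet 16-hour record.
energy = [0.0] * 10 + [60_000.0] + [0.0] * 5
print(warning_labels(energy))
```

In this toy record, only the hours whose 8-hour look-ahead window contains the event receive a positive label. A total of exactly 50,000 J does not trigger a warning, because the text defines the level as energy exceeding that value.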
The model is evaluated by the area under the receiver operating characteristic curve (AUC), computed as:

\[ \text{AUC}(f,X) = \frac{\sum_{i:y_i = 0} \sum_{j:y_j = 1} 1(f(x_i) < f(x_j))}{|\{y_i : y_i = 0\}| \cdot |\{y_j : y_j = 1\}|} \]

where:

- \( (x_i, y_i) \in X \) represents an instance in the data set \( X \),
- \( x_i \) is the feature vector related to a single measurement,
- \( y_i \in \{0, 1\} \) is its label,
- \( f \) is a model that maps each instance to the probability of belonging to class "1" (or, more generally, a real-valued risk score),
- \( 1(\cdot) \) is an indicator function that returns 1 if the given condition is met, and 0 otherwise,
- \( |S| \) represents the cardinality of the set \( S \).

AUC takes values in [0, 1]: a perfect predictor scores 1, and a random predictor scores approximately 0.5.

### Data Description

The training data set contains 133,151 observations, each described by 541 features. The test data set contains 3,860 unlabeled observations. The data is divided into two sets:

- **Training Data Set**: Contains labels and is used to train the model.
- **Test Data Set**: Does not contain labels and is used to evaluate the model performance.

The features include 13 different types of features and 22 time series, covering measurements from the past 24 hours. They also include four assessments provided by experts, as well as some general features and metadata.

### Challenges

The main challenge lies in developing a prediction model that generalizes to new locations. Warning frequencies vary greatly between locations, and the locations in the training set and the test set are not exactly the same. The model therefore needs to be robust to changes in location and time to remain accurate in different environments.
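The AUC definition given earlier can be checked on small inputs with a direct translation into Python. This is a minimal sketch of the pairwise formula as written, not the challenge's official scoring code; the function name `pairwise_auc` is hypothetical. It is quadratic in the number of instances, so it is suited to illustration rather than to data sets of the size described here.

```python
from typing import Sequence


def pairwise_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (negative, positive) pairs in which the negative
    instance receives a strictly lower score than the positive one."""
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pos = [s for s, y in zip(scores, labels) if y == 1]
    if not neg or not pos:
        raise ValueError("AUC is undefined unless both classes are present")
    # Indicator 1(f(x_i) < f(x_j)) summed over all negative/positive pairs.
    correctly_ordered = sum(1 for n in neg for p in pos if n < p)
    return correctly_ordered / (len(neg) * len(pos))


# Two negatives (0.1, 0.4) and two positives (0.35, 0.8):
# three of the four pairs are ordered correctly.
print(pairwise_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Because the indicator uses a strict inequality, tied scores contribute 0 under this formula. Many library implementations instead credit ties with 0.5, so results can differ slightly when scores are not all distinct.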
From the above description, the core problem of this paper is to use machine-learning methods to predict, ahead of time, whether a dangerous seismic event will occur in a coal mine within the next 8 hours, thereby providing effective early warning and helping to ensure the safety of miners.