SeismoGen: Seismic Waveform Synthesis Using Generative Adversarial Networks

Tiantong Wang, Daniel Trugman, Youzuo Lin
DOI: https://doi.org/10.48550/arXiv.1911.03966
2020-05-03
Abstract: Detecting earthquake events from seismic time series has proved to be a challenging task. Manual detection can be expensive and tedious because of the intensive labor and large-scale data sets involved. In recent years, automatic detection methods based on machine learning have been developed to improve accuracy and efficiency. However, the accuracy of those methods relies on a sufficient amount of high-quality training data, which itself can be expensive to obtain because it requires domain knowledge and subject matter expertise. This paper aims to resolve this dilemma by answering two questions: (1) provided with a limited number of reliable labels, can we use them to generate more synthetic labels; and (2) can we use those synthetic labels to improve detectability? Among existing generative models, the generative adversarial network (GAN) has shown a superior capability for generating high-quality synthetic samples in multiple domains. We designed our model based on GAN. In particular, we studied several different network structures. Comparing the generated results shows that our GAN-based generative model yields the highest quality. We further combine the dataset with synthetic samples generated by our generative model and show that the detectability of our earthquake classification model is significantly improved compared with the one trained without augmenting the training set.
Machine Learning, Geophysics
What problem does this paper attempt to address?
The paper focuses on two questions:

1. **Can additional, realistic synthetic data be generated when only a limited amount of reliably labeled data is available?**
2. **Can this synthetic data be used to improve earthquake detection algorithms?**

Specifically, the paper addresses these questions in the following ways:

- **Develop a Generative Adversarial Network (GAN) model**: The model generates realistic three-dimensional seismic waveform data of both event and noise types. It builds on the standard GAN value function:

  \[ \min_G \max_D V(D, G, x, z)=\mathbb{E}_{x \sim p_{data}}[\log (D(x))]+\mathbb{E}_{z \sim p_z}[\log (1 - D(G(z)))] \]

- **Verify the quality of synthetic waveforms**: The synthetic waveforms are checked visually and quantitatively, and are evaluated with a machine-learning-based earthquake classifier.
- **Augment real earthquake data**: The generated synthetic waveforms are used to expand the real earthquake data set and so improve machine-learning-based earthquake detection methods.

### Background problems

Detecting earthquake events in seismic waveforms is a fundamental task in seismology. Traditional manual detection requires a large amount of labor and is difficult to scale to large data sets. In recent years, automatic detection methods based on machine learning have been developed, but their accuracy depends on a large amount of high-quality labeled training data. Because obtaining such labeled data is costly and time-consuming, researchers have explored using generative models (such as GANs) to produce synthetic data and alleviate this problem.
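
To make the standard GAN value function quoted above more concrete, the following minimal sketch estimates \(V(D, G)\) by Monte Carlo averaging over samples. Everything in it is an illustrative assumption rather than the paper's setup: the data are scalars instead of waveforms, `generator` is an untrained identity map, and the two discriminators are hand-written functions rather than trained networks.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def gan_value(discriminator, generator, real_samples, noise_samples, eps=1e-12):
    """Monte Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    # eps guards against log(0) when the discriminator saturates.
    real_term = sum(math.log(discriminator(x) + eps) for x in real_samples)
    fake_term = sum(
        math.log(1.0 - discriminator(generator(z)) + eps) for z in noise_samples
    )
    return real_term / len(real_samples) + fake_term / len(noise_samples)


# Toy setup (assumed for illustration): "real" data cluster near 2.0,
# while the untrained generator simply passes the latent noise through.
rng = random.Random(0)
real_samples = [rng.gauss(2.0, 0.5) for _ in range(2000)]
noise_samples = [rng.gauss(0.0, 1.0) for _ in range(2000)]


def generator(z):
    return z  # hypothetical untrained generator


def blind_discriminator(x):
    return 0.5  # cannot tell real from fake


def informed_discriminator(x):
    return sigmoid(4.0 * (x - 1.0))  # scores larger values as more likely real


blind_value = gan_value(blind_discriminator, generator, real_samples, noise_samples)
informed_value = gan_value(informed_discriminator, generator, real_samples, noise_samples)

print(f"blind D:    V = {blind_value:.3f}")  # equals 2 * log(0.5)
print(f"informed D: V = {informed_value:.3f}")
```

A discriminator that always outputs 0.5 gives exactly \(2 \log 0.5\). Any discriminator that separates the two sample sets better raises the value, which is the quantity the discriminator maximizes and the generator minimizes.
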
### Solutions

The authors adopt a Conditional Generative Adversarial Network (Conditional GAN), whose value function is:

\[ \min_G \max_D V(D, G, x, z)=\mathbb{E}_{x \sim p_{data}}[\log (D(x|y))]+\mathbb{E}_{z \sim p_z}[\log (1 - D(G(z|\hat{y})|\hat{y}))] \]

where \(y\) is the label of the real sample \(x\), and \(\hat{y}\) is the target label of the synthetic sample. Conditioning on the label lets the generator produce a specific type of seismic waveform (event or noise) for a given category.

### Experimental verification

To verify the effectiveness of the model, the authors designed four experiments:

1. **Visual comparison**: Compare the generated synthetic samples with those of the baseline model.
2. **Classification task evaluation**: Evaluate the quality of synthetic samples through classification tasks.
3. **Robustness test with a small training set**: Study the performance of the model when training data is limited.
4. **Data augmentation application**: Apply the generative model to an actual earthquake detection task to verify its effectiveness.

Through these experiments, the authors show that the generated synthetic data can improve the performance of earthquake detection algorithms and provides an effective means of data augmentation when labeled data is limited.
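
The conditional sampling and data augmentation described above can be sketched as follows. This is only an illustration under stated assumptions: `toy_conditional_generator` is a hypothetical stand-in for a trained generator \(G(z|\hat{y})\) and is not the SeismoGen network. The label encoding (`NOISE = 0`, `EVENT = 1`), the trace length, the set sizes, and the placeholder "real" set are also assumptions made for the example.

```python
import math
import random

NOISE, EVENT = 0, 1  # assumed label encoding, not taken from the paper


def toy_conditional_generator(z, label, n_samples=64):
    """Hypothetical stand-in for a trained G(z | y); returns a 1-D trace."""
    rng = random.Random(z)  # the latent value seeds the trace
    trace = [rng.gauss(0.0, 0.1) for _ in range(n_samples)]
    if label == EVENT:
        # Event traces get a decaying sinusoidal burst after an onset.
        onset = n_samples // 4
        for i in range(onset, n_samples):
            t = i - onset
            trace[i] += math.exp(-t / 10.0) * math.sin(0.8 * t)
    return trace


def augment(real_set, generator, n_synthetic_per_class, seed=0):
    """Return real_set plus n_synthetic_per_class generated pairs per class."""
    rng = random.Random(seed)
    augmented = list(real_set)
    for label in (NOISE, EVENT):
        for _ in range(n_synthetic_per_class):
            z = rng.gauss(0.0, 1.0)  # latent draw z ~ p_z
            # Conditioning on the label fixes the class of the synthetic sample.
            augmented.append((generator(z, label), label))
    rng.shuffle(augmented)
    return augmented


# Placeholder for a small labeled set; real waveforms would go here instead.
real_set = [(toy_conditional_generator(float(i), i % 2), i % 2) for i in range(10)]
augmented_set = augment(real_set, toy_conditional_generator, n_synthetic_per_class=20)

n_event = sum(1 for _, label in augmented_set if label == EVENT)
print(len(real_set), len(augmented_set), n_event)
```

Because the target label is an input to the generator, each synthetic trace arrives with a known label. This is what allows the synthetic samples to be added directly to a classifier's training set.
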