Making Invisible Visible: Data-Driven Seismic Inversion with Spatio-temporally Constrained Data Augmentation

Yuxin Yang, Xitong Zhang, Qiang Guan, Youzuo Lin
DOI: https://doi.org/10.1109/TGRS.2022.3144636
2022-02-07
Abstract: Deep learning and data-driven approaches have shown great potential in scientific domains. The promise of data-driven techniques relies on the availability of a large volume of high-quality training data. Because obtaining data through physical experiments, instruments, and simulations is expensive, data augmentation for scientific applications has recently emerged as a new direction for obtaining scientific data. However, existing data augmentation techniques, which originate from computer vision, yield physically unacceptable data samples that are not helpful for the domain problems we are interested in. In this paper, we develop new data augmentation techniques based on convolutional neural networks. Specifically, our generative models leverage different kinds of physics knowledge (such as governing equations, observable perception, and physical phenomena) to improve the quality of the synthetic data. To validate the effectiveness of our data augmentation techniques, we apply them to subsurface seismic full-waveform inversion using simulated CO$_2$ leakage data. Our interest is to invert for subsurface velocity models associated with very small CO$_2$ leakage. We validate the performance of our methods with comprehensive numerical tests. Through comparison and analysis, we show that data-driven seismic imaging can be significantly enhanced by our data augmentation techniques. In particular, when an augmented training set obtained with our techniques is used, the imaging quality improves by 15% in test scenarios with general-sized leakage and by 17% in those with small-sized leakage.
Machine Learning
### What problems does this paper attempt to solve?

This paper addresses the poor generalization and sub-optimal imaging quality of data-driven seismic full-waveform inversion (FWI) when training data are insufficient. Specifically, the authors focus on using data-driven methods to improve the detection of small-scale subsurface carbon dioxide (CO$_2$) leakage. The core issues are:

1. **Data scarcity**:
   - Data-driven seismic FWI requires a large amount of high-quality training data, but obtaining such data is very costly. In CO$_2$ leakage monitoring in particular, both data collection and simulation face significant practical obstacles.
   - Data for small-scale leakage are even more limited, so deep-learning-based FWI faces a "small-data" challenge during training.
2. **Limitations of existing data augmentation techniques**:
   - Existing data augmentation techniques mainly originate from computer vision. The samples they generate often violate physical laws and therefore transfer poorly to scientific tasks such as seismic imaging, which must account for spatio-temporal characteristics and physical phenomena.
   - Because these techniques ignore the spatial and temporal structure of seismic data and the associated physics, the data they generate may be unsuitable for practical applications.
3. **Improving the accuracy of data-driven seismic imaging**:
   - To improve data-driven seismic imaging, especially for detecting small-scale CO$_2$ leakage, new data augmentation techniques are needed that generate high-quality, physically consistent synthetic data.
   - Improving both the quality and the quantity of training data can significantly increase the precision of seismic imaging, making small-scale CO$_2$ leakage easier to detect and locate.

### Overview of the solution

To address these problems, the authors propose new data augmentation techniques based on convolutional neural networks that incorporate different kinds of physics knowledge (governing equations, observable perception, and physical phenomena) to generate high-quality synthetic data. The specific methods include:

- **Autoencoder**: generates interpolated data at missing time points while keeping the generated data physically reasonable.
- **Variational autoencoder (VAE)**: generates new, highly diverse samples by manipulating the latent representation, and adds a perceptual loss that further constrains the generative model so that the generated data are close to the real data both numerically and visually.
- **Spatio-temporally constrained generative model**: combines spatio-temporal information and physical phenomena so that the generated data obey physical laws, making the training data more representative and improving imaging quality.

Through these methods, the authors show that their data augmentation techniques significantly improve the performance of data-driven seismic imaging, especially for detecting small-scale CO$_2$ leakage. With the augmented training set, imaging quality improves by 15% in the general-sized leakage test scenario and by 17% in the small-sized leakage test scenario.
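The summary gives no implementation details, so the following is only a minimal sketch, on toy vectors rather than convolutional networks, of two ingredients named above. The first is linear interpolation between latent codes, one common way an autoencoder can fill in missing time points; whether the paper interpolates linearly is an assumption. The second is a VAE objective that adds a perceptual term to the usual reconstruction and KL terms. `toy_features`, `beta`, and `lam` are hypothetical stand-ins. In practice, a perceptual loss is usually computed on the features of a pretrained network.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def interpolate_latents(z0: Sequence[float], z1: Sequence[float], steps: int) -> List[Vector]:
    """Linearly interpolate between two latent codes, endpoints included.

    Assumption: decoding the intermediate codes would yield candidate samples
    for the time points between the two encoded snapshots.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    codes = []
    for k in range(steps):
        t = k / (steps - 1)  # t runs from 0.0 to 1.0
        codes.append([(1.0 - t) * a + t * b for a, b in zip(z0, z1)])
    return codes


def reparameterize(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> Vector:
    """VAE reparameterization trick: z = mu + exp(logvar / 2) * eps, eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def toy_features(v: Sequence[float]) -> Vector:
    """HYPOTHETICAL stand-in for a pretrained CNN feature extractor:
    first differences, i.e. local gradients of a flattened map."""
    return [v[i + 1] - v[i] for i in range(len(v) - 1)]


def vae_perceptual_loss(
    x: Sequence[float],
    x_hat: Sequence[float],
    mu: Sequence[float],
    logvar: Sequence[float],
    features: Callable[[Sequence[float]], Vector] = toy_features,
    beta: float = 1.0,  # hypothetical weight of the KL term
    lam: float = 0.1,  # hypothetical weight of the perceptual term
) -> float:
    """Reconstruction MSE + beta * KL + lam * feature-space (perceptual) MSE."""
    recon = mse(x, x_hat)  # pixel-wise closeness
    perceptual = mse(features(x), features(x_hat))  # closeness in feature space
    return recon + beta * kl_to_standard_normal(mu, logvar) + lam * perceptual


if __name__ == "__main__":
    rng = random.Random(0)
    mu, logvar = [0.2, -0.1], [-1.0, -0.5]
    z = reparameterize(mu, logvar, rng)  # one sampled latent code
    print(interpolate_latents(mu, z, steps=3))  # mu, midpoint, z
    print(vae_perceptual_loss([0.0, 1.0], [0.0, 0.0], [0.0], [0.0]))
```

Raising `lam` pushes reconstructions to match in feature space (here, local gradients) and not only value by value. This is the intuition behind using a perceptual term to keep generated samples visually similar to real data.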