Use of Multifidelity Training Data and Transfer Learning for Efficient Construction of Subsurface Flow Surrogate Models

Su Jiang, Louis J. Durlofsky
DOI: https://doi.org/10.1016/j.jcp.2022.111800
2022-04-24
Abstract: Data assimilation presents computational challenges because many high-fidelity models must be simulated. Various deep-learning-based surrogate modeling techniques have been developed to reduce the simulation costs associated with these applications. However, to construct data-driven surrogate models, several thousand high-fidelity simulation runs may be required to provide training samples, and these computations can make training prohibitively expensive. To address this issue, in this work we present a framework where most of the training simulations are performed on coarsened geomodels. These models are constructed using a flow-based upscaling method. The framework entails the use of a transfer-learning procedure, incorporated within an existing recurrent residual U-Net architecture, in which network training is accomplished in three steps. In the first step, where the bulk of the training is performed, only low-fidelity simulation results are used. The second and third steps, in which the output layer is trained and the overall network is fine-tuned, require a relatively small number of high-fidelity simulations. Here we use 2500 low-fidelity runs and 200 high-fidelity runs, which leads to about a 90% reduction in training simulation costs. The method is applied for two-phase subsurface flow in 3D channelized systems, with flow driven by wells. The surrogate model trained with multifidelity data is shown to be nearly as accurate as a reference surrogate trained with only high-fidelity data in predicting dynamic pressure and saturation fields in new geomodels. Importantly, the network provides results that are significantly more accurate than the low-fidelity simulations used for most of the training. The multifidelity surrogate is also applied for history matching using an ensemble-based procedure, where accuracy relative to reference results is again demonstrated.
Machine Learning, Geophysics
What problem does this paper attempt to address?
This paper addresses the computational cost of data assimilation (history matching) in subsurface flow simulation, which requires a large number of high-fidelity model simulations. A variety of deep-learning-based surrogate modeling techniques have been developed to reduce this cost. However, constructing a data-driven surrogate model usually requires several thousand high-fidelity simulation runs to provide training samples, which makes training itself very expensive.

To address this, the paper proposes a framework in which most of the training simulations are performed on coarsened, low-fidelity geomodels constructed with a flow-based upscaling method. The framework incorporates a transfer-learning procedure into an existing recurrent residual U-Net architecture, with network training divided into three steps:

1. **Step 1**: The main training phase, using only low-fidelity simulation results.
2. **Step 2**: Training the output layer, using a small number of high-fidelity simulation results.
3. **Step 3**: Fine-tuning the entire network, again using a small number of high-fidelity simulation results.

This approach significantly reduces the number of high-fidelity simulations required for training while maintaining high accuracy. Using 2,500 low-fidelity runs and 200 high-fidelity runs yields about a 90% reduction in training simulation cost. The method is applied to two-phase subsurface flow in 3D channelized systems, where it demonstrates its accuracy in predicting dynamic pressure and saturation fields in new geomodels.
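
The three-step schedule described above can be illustrated with a deliberately tiny sketch. This is not the authors' recurrent residual U-Net: the "network" here is a hypothetical four-parameter scalar model (a tanh "body" plus a linear "output layer"), the `high_fidelity` and `low_fidelity` functions are synthetic stand-ins for simulator output, and the learning rates and epoch counts are arbitrary choices. It only shows the mechanics of training on many cheap samples, then retraining the output layer, then fine-tuning everything on a few expensive samples.

```python
import math

def predict(p, x):
    """Toy model: tanh 'body' (w, b) followed by a linear 'output layer' (a, c)."""
    return p["a"] * math.tanh(p["w"] * x + p["b"]) + p["c"]

def mse(p, data):
    """Mean squared error of the model over (x, y) pairs."""
    return sum((predict(p, x) - y) ** 2 for x, y in data) / len(data)

def train(p, data, lr, epochs, trainable):
    """Full-batch gradient descent on MSE; only names in `trainable` are updated."""
    p = dict(p)  # work on a copy so earlier stages stay available for comparison
    for _ in range(epochs):
        g = {k: 0.0 for k in p}
        for x, y in data:
            h = math.tanh(p["w"] * x + p["b"])
            err = p["a"] * h + p["c"] - y
            dh = p["a"] * (1.0 - h * h)  # chain rule through the tanh body
            g["a"] += 2.0 * err * h
            g["c"] += 2.0 * err
            g["w"] += 2.0 * err * dh * x
            g["b"] += 2.0 * err * dh
        for k in trainable:
            p[k] -= lr * g[k] / len(data)
    return p

# Synthetic stand-ins for simulation results (assumptions, not the paper's data).
def high_fidelity(x):
    return 2.0 * math.tanh(1.5 * x - 0.3) + 0.5

def low_fidelity(x):
    # Same underlying shape, but with a biased output scale and offset.
    return 1.6 * math.tanh(1.5 * x - 0.3) + 0.9

def grid(n):
    """n evenly spaced points on [-2, 2]."""
    return [-2.0 + 4.0 * i / (n - 1) for i in range(n)]

lf_data = [(x, low_fidelity(x)) for x in grid(50)]   # many cheap samples
hf_data = [(x, high_fidelity(x)) for x in grid(5)]   # few expensive samples
hf_eval = [(x, high_fidelity(x)) for x in grid(41)]  # dense grid for evaluation

ALL, HEAD = ("w", "b", "a", "c"), ("a", "c")
init = {"w": 1.0, "b": 0.0, "a": 1.0, "c": 0.0}

# Step 1: bulk of the training, low-fidelity data only.
step1 = train(init, lf_data, lr=0.05, epochs=3000, trainable=ALL)
# Step 2: body frozen, output layer retrained on high-fidelity data.
step2 = train(step1, hf_data, lr=0.05, epochs=2000, trainable=HEAD)
# Step 3: all parameters fine-tuned on high-fidelity data with a smaller step.
step3 = train(step2, hf_data, lr=0.01, epochs=1000, trainable=ALL)

for name, params in (("step 1", step1), ("step 2", step2), ("step 3", step3)):
    print(name, "high-fidelity MSE:", mse(params, hf_eval))
```

In this toy setting the low-fidelity data teaches the body the right shape, so the few high-fidelity samples only need to correct the output scale and offset. Whether the same split between "body" and "output layer" carries over to a real surrogate depends on how the low-fidelity and high-fidelity responses differ.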
Compared with the reference surrogate model trained only with high-fidelity data, the surrogate trained with multifidelity data is nearly as accurate. More importantly, the network provides results that are more accurate than the low-fidelity simulations used for most of the training. Finally, the multifidelity surrogate is applied to ensemble-based history matching, where its accuracy relative to reference results is again demonstrated.

### Formula Summary

1. **Darcy Velocity Formula**:

   \[ u_j = -\frac{k k_{rj}(S_j)}{\mu_j(p_j)} (\nabla p_j - \rho_j g \nabla z), \quad j = o, w \]

   where the subscript \( j \) denotes the oil (\( o \)) or water (\( w \)) phase, \( k \) is the absolute permeability tensor, \( k_{rj} \) is the relative permeability, \( S_j \) is the saturation, \( \mu_j \) is the viscosity, \( p_j \) is the pressure, \( \rho_j \) is the density, \( g \) is the gravitational acceleration, and \( z \) is the depth.

2. **Well Flow Rate Formula**:

   \[ (q_j^w)_i = W I_i \left( \frac{k_{rj} \rho_j}{\mu_j} \right)_i (p_i - p_{w,i}) \]

   where \( W I_i \) is the well index of well block \( i \), defined as:

   \[ W I_i = \frac{2 \pi k_i \Delta z}{\ln \left( \frac{r_0}{r_w} \right)} \]

3. **Loss Function**:

   \[ \theta^* = \arg \min_{\theta} \left[ \frac{1}{nsmp} \frac{1}{nt} \sum_{i=1}^{nsmp} \sum_{t=1}^{nt} \| \hat{x}_{h,t}^i - x_{h,t}^i \|_2^2 + \lambda_w \frac{1}{nsmp} \frac{1}{nt} \frac{1}{nw} \sum_{i=1}^{nsmp} \sum_{t=1}^{nt} \sum_{w=1}^{nw} \| \hat{x}