Wuyang Chen, Jialin Song, Pu Ren, Shashank Subramanian, Dmitriy Morozov, Michael W. Mahoney
Abstract: Recent years have witnessed the promise of coupling machine learning methods and physical domain-specific insights for solving scientific problems based on partial differential equations (PDEs). However, being data-intensive, these methods still require a large amount of PDE data. This reintroduces the need for expensive numerical PDE solutions, partially undermining the original goal of avoiding these expensive simulations. In this work, seeking data efficiency, we design unsupervised pretraining for PDE operator learning. To reduce the need for training data with heavy simulation costs, we mine unlabeled PDE data without simulated solutions, and we pretrain neural operators with physics-inspired reconstruction-based proxy tasks. To improve out-of-distribution performance, we further assist neural operators in flexibly leveraging a similarity-based method that learns in-context examples, without incurring extra training costs or designs. Extensive empirical evaluations on a diverse set of PDEs demonstrate that our method is highly data-efficient, more generalizable, and even outperforms conventional vision-pretrained models. We provide our code at https://github.com/delta-lab-ai/data_efficient_nopt.
What problem does this paper attempt to address?
This paper addresses data efficiency in scientific machine learning (SciML) for partial differential equations (PDEs). Neural-network-based methods have shown potential for solving PDEs, but they usually need a large amount of training data, and every training sample requires an expensive high-fidelity numerical simulation. The paper therefore proposes a framework that reduces the need for labeled data through unsupervised pre-training and in-context learning (ICL), improving data efficiency and lowering the simulation cost of obtaining PDE solutions.
### Main Contributions
1. **Unsupervised Pre-training**: The paper pre-trains neural operators on unlabeled PDE data, that is, data without simulated solutions. Using two physics-inspired reconstruction tasks (masked autoencoding and super-resolution), the model learns useful feature representations without relying on expensive simulation data. The experiments show that this pre-training significantly reduces the amount of simulation data required while improving model performance.
2. **In-context Learning**: To improve generalization in out-of-distribution (OOD) settings, the paper proposes a similarity-based in-context learning method. At inference time, the method takes a small number of context examples (demos), computes the similarity between the query input and those examples, and aggregates the solutions of the most similar examples to refine the model's prediction. This significantly improves OOD generalization without any additional training cost.
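The similarity-then-aggregation idea in item 2 can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name `in_context_predict`, the use of one scalar feature per location, the absolute-difference distance, and plain averaging over the `k` nearest demos are all hypothetical choices made for the example.

```python
import numpy as np


def in_context_predict(query_feat, demo_feats, demo_solutions, k=3):
    """Aggregate demo solutions that are most similar to the query, per location.

    query_feat:     (P,)   one feature value per location for the query (assumed)
    demo_feats:     (D, P) the same feature for each of D demos
    demo_solutions: (D, P) known solutions paired with each demo
    Returns a (P,) array: the mean solution of the k closest demos at each location.
    """
    query_feat = np.asarray(query_feat, dtype=float)
    demo_feats = np.asarray(demo_feats, dtype=float)
    demo_solutions = np.asarray(demo_solutions, dtype=float)

    # Similarity step: distance between the query and every demo at each location.
    dist = np.abs(demo_feats - query_feat[None, :])  # shape (D, P)

    # Indices of the k closest demos, chosen independently per location.
    nearest = np.argsort(dist, axis=0)[:k]  # shape (k, P)

    # Aggregation step: average the paired solutions of those demos.
    picked = np.take_along_axis(demo_solutions, nearest, axis=0)  # shape (k, P)
    return picked.mean(axis=0)


if __name__ == "__main__":
    demo_feats = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
    demo_solutions = np.array([[0.0, 0.0], [2.0, 2.0], [100.0, 100.0]])
    query = np.array([0.4, 0.9])
    print(in_context_predict(query, demo_feats, demo_solutions, k=2))
```

Because the nearest demos are selected per location, a distant demo (the third one above) never contributes to the prediction, and no gradient step is involved, which is consistent with the claim that the method adds no training cost.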
### Method Overview
1. **Unsupervised Pre-training**:
- **Unlabeled PDE Data**: Unlabeled PDE data contains only the inputs of a PDE, such as physical parameters, coordinates, and forcing functions, and does not contain the PDE solution.
- **Surrogate Tasks**: Two surrogate tasks are used: masked autoencoding and super-resolution. Through these tasks, the model learns invariance to sparse sensing and to changes in resolution, thereby extracting useful feature representations.
2. **In-context Learning**:
- **Similarity Calculation**: Calculates the distance between the query input and the context examples to find similar samples.
- **Aggregation**: For each query location, aggregates the solutions of its similar samples as the final prediction.
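The two surrogate tasks in item 1 can be illustrated by how their inputs are corrupted before reconstruction. The sketch below is a hedged NumPy example, not the paper's pipeline: the helper names (`random_patch_mask`, `downsample_upsample`, `masked_mse`), the patch size, the mask ratio, and the use of average pooling as the resolution-reducing step are all assumptions chosen for illustration.

```python
import numpy as np


def random_patch_mask(shape, patch=4, mask_ratio=0.5, rng=None):
    """Boolean mask (True = hidden) built from randomly chosen square patches.

    Assumes both dimensions of `shape` are divisible by `patch`.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    gh, gw = h // patch, w // patch
    n_patches = gh * gw
    n_masked = int(round(mask_ratio * n_patches))
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    grid = flat.reshape(gh, gw)
    # Expand each patch-level decision to pixel resolution.
    return np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)


def downsample_upsample(field, factor=2):
    """Crude low-resolution input: average-pool by `factor`, then repeat back."""
    h, w = field.shape
    coarse = field.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)


def masked_mse(pred, target, mask):
    """Mean squared reconstruction error over the masked positions only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    field = rng.standard_normal((8, 8))  # stand-in for an unlabeled PDE input field
    mask = random_patch_mask(field.shape, patch=4, mask_ratio=0.5, rng=rng)
    masked_input = np.where(mask, 0.0, field)   # masked-autoencoder input
    low_res_input = downsample_upsample(field)  # super-resolution input
    # A model would be trained to map either corrupted input back to `field`;
    # here the corrupted input itself plays the role of a trivial "prediction".
    print(masked_mse(masked_input, field, mask))
```

In both tasks the reconstruction target is the original unlabeled input field, so no simulated PDE solution is needed at the pre-training stage.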
### Experimental Results
The paper reports extensive experiments on multiple PDE benchmarks and on real-world observational data. The results show that unsupervised pre-training not only significantly reduces the amount of simulation data required but also outperforms models trained from scratch. In addition, in-context learning significantly improves the model's generalization in out-of-distribution settings.
### Formula Examples
- **General Form of PDE**:
\[
\sum_{i,j = 1}^{n}a_{ij}(x)\frac{\partial^{2}u}{\partial x_{i}\partial x_{j}}+\sum_{i = 1}^{n}b_{i}(x)\frac{\partial u}{\partial x_{i}}+c(x)u = f(x)
\]
where \(x\in\mathbb{R}^{n}\) represents the physical space, \(a_{ij}, b_{i}, c\) are known physical parameters, \(u\) is the target solution, and \(f\) is the external forcing function.
- **Loss Function**:
- **Masked Autoencoder**:
\[
\mathcal{L}_{\text{MAE}}=\frac{1}{|M|}\sum_{(i,j)\in M}\|\hat{y}_{ij}-y_{ij}\|^{2}
\]
where \(M\) is the set of masked regions, \(\hat{y}_{ij}\) is the predicted value of the model, and \(y_{ij}\) is the true value.
- **Super - resol