Varun Madhavan, Amal S Sebastian, Bharath Ramsundar, Venkatasubramanian Viswanathan
Abstract: In this work, we describe a novel approach to building a neural PDE solver leveraging recent advances in transformer-based neural network architectures. Our model can provide solutions for different values of PDE parameters without any need for retraining the network. The training is carried out in a self-supervised manner, similar to pretraining approaches applied in language and vision tasks. We hypothesize that the model is in effect learning a family of operators (for multiple parameters) mapping the initial condition to the solution of the PDE at any future time step t. We compare this approach with the Fourier Neural Operator (FNO), and demonstrate that it can generalize over the space of PDE parameters, despite having a higher prediction error for individual parameter values compared to the FNO. We show that performance on a specific parameter can be improved by finetuning the model with very small amounts of data. We also demonstrate that the model scales with data as well as model size.
What problem does this paper attempt to address?
This paper proposes a new approach that uses the Transformer architecture to build a neural network solver for partial differential equations (PDEs). Traditional methods can be computationally expensive in real-time and large-scale applications, depending on factors such as the size of the discretized system, the physical complexity, and the choice of solver. In recent years, machine learning-based PDE solvers have achieved similar accuracy at higher speed, but they are still limited to solving for fixed parameter values.
The main problem the paper addresses is that existing models cannot accurately solve for PDE parameters outside the training distribution unless the network is retrained from scratch. The authors use a self-supervised pre-training method similar to those used in language and vision tasks, and hypothesize that the model is in effect learning a family of operators, covering multiple parameter values, that map the initial condition to the solution of the PDE at any future time step t.
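The operator view described above can be written out as a short sketch. The notation is an assumption made here for illustration and is not taken from the paper: u is the PDE state, λ the PDE parameters, u_0 the initial condition, and f_θ the trained network. How the network receives or infers λ is not specified in this summary.

```latex
% Assumed notation (not from the paper): u is the PDE state, \lambda the PDE
% parameters, u_0 the initial condition, f_\theta the trained network.
\begin{align}
  \partial_t u &= \mathcal{F}_{\lambda}(u), \qquad u(\cdot, 0) = u_0, \\
  \mathcal{G}_{\lambda}^{t} &: u_0 \mapsto u(\cdot, t), \\
  f_{\theta} &\approx \mathcal{G}_{\lambda}^{t}
    \quad \text{for a range of } \lambda \text{ and all } t > 0 .
\end{align}
```

Under this reading, a solver trained for one fixed λ approximates a single operator, whereas the hypothesis is that one network approximates the whole family indexed by λ.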
Compared with the Fourier Neural Operator (FNO), the model has a higher prediction error for individual parameter values but generalizes over the space of PDE parameters. Fine-tuning on a very small amount of data for a specific parameter improves performance on that parameter. The study also finds that performance improves with both data volume and model size, consistent with the general scaling trend of Transformer architectures.
In the experimental section, the authors evaluate the model on several systems, including the one-dimensional transport equation, the one-dimensional viscous Burgers equation, and the two-dimensional compressible Navier-Stokes equations, and compare it with FNO. The results show that, although its average error is higher, the PDE Transformer generalizes across a wide range of parameters, and fine-tuning significantly improves its predictions for new parameters. However, for some systems, such as the one-dimensional transport equation and the two-dimensional compressible Navier-Stokes equations, performance in "out-of-domain" settings is not ideal, possibly because errors accumulate during autoregressive prediction and because the model underfits.
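The error accumulation mentioned above can be illustrated with a minimal sketch. It does not use the paper's model: as a labeled stand-in for an imperfect learned one-step predictor, it applies a first-order upwind update to the one-dimensional transport equation u_t + beta * u_x = 0 with periodic boundaries, feeds each prediction back in as the next input, and compares against the exact solution. All function names and parameter values are hypothetical.

```python
import numpy as np


def exact_solution(x, t, beta):
    # Exact periodic solution for a sine initial condition:
    # u(x, t) = sin(2*pi*(x - beta*t)).
    return np.sin(2 * np.pi * (x - beta * t))


def one_step_surrogate(u, beta, dx, dt):
    # Stand-in for an imperfect learned one-step model (assumption):
    # first-order upwind update, valid for beta > 0 on a periodic grid.
    return u - beta * dt / dx * (u - np.roll(u, 1))


def rollout_errors(beta=1.0, n_points=128, n_steps=200, cfl=0.5):
    # Autoregressive rollout: each prediction becomes the next input,
    # so the small per-step error is carried forward and compounds.
    x = np.linspace(0.0, 1.0, n_points, endpoint=False)
    dx = 1.0 / n_points
    dt = cfl * dx / beta
    u = exact_solution(x, 0.0, beta)
    errors = []
    for step in range(1, n_steps + 1):
        u = one_step_surrogate(u, beta, dx, dt)
        reference = exact_solution(x, step * dt, beta)
        errors.append(np.sqrt(np.mean((u - reference) ** 2)))
    return np.array(errors)


errors = rollout_errors()
print("RMSE after 1 step:   ", errors[0])
print("RMSE after 100 steps:", errors[99])
print("RMSE after 200 steps:", errors[-1])
```

In this toy setting the root-mean-square error grows with the number of rollout steps even though each individual step is only slightly wrong, which is the kind of behavior the summary offers as one possible cause of the weaker "out-of-domain" results.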
In conclusion, the paper aims to build a neural PDE solver that generalizes across PDE parameters. With pre-training followed by fine-tuning, the model can adapt quickly to new parameters, reducing the need for extensive retraining data.