Scalable Transformer for PDE Surrogate Modeling

Zijie Li, Dule Shu, Amir Barati Farimani
2023-11-03
Abstract: Transformer has shown state-of-the-art performance on various applications and has recently emerged as a promising tool for surrogate modeling of partial differential equations (PDEs). Despite the introduction of linear-complexity attention, applying Transformer to problems with a large number of grid points can be numerically unstable and computationally expensive. In this work, we propose Factorized Transformer (FactFormer), which is based on an axial factorized kernel integral. Concretely, we introduce a learnable projection operator that decomposes the input function into multiple sub-functions with one-dimensional domain. These sub-functions are then evaluated and used to compute the instance-based kernel with an axial factorized scheme. We showcase that the proposed model is able to simulate 2D Kolmogorov flow on a $256\times 256$ grid and 3D smoke buoyancy on a $64\times64\times64$ grid with good accuracy and efficiency. The proposed factorized scheme can serve as a computationally efficient low-rank surrogate for the full attention scheme when dealing with multi-dimensional problems.
Machine Learning
What problem does this paper attempt to address?
This paper addresses a problem that arises when Transformer models are used for surrogate modeling of partial differential equations (PDEs): on problems with a large number of grid points, a standard Transformer can become numerically unstable and excessively expensive to compute. Specifically:

1. **Numerical instability**: As the number of grid points increases, especially on high-resolution grids, stacking multiple attention layers can lead to numerical instability.
2. **High computational cost**: For multi-dimensional problems, the number of grid points grows exponentially with the dimension, resulting in a very large attention matrix and high computational complexity.

To solve these problems, the authors propose the Factorized Transformer (FactFormer), whose attention mechanism is based on an axial factorized kernel integral. The input function is decomposed into multiple one-dimensional sub-functions, and these sub-functions are used to compute an instance-based kernel. In this way FactFormer can significantly reduce the computational cost and improve numerical stability while maintaining good accuracy, which lets the model handle large numbers of grid points in multi-dimensional problems.

### Specific improvement measures

- **Axial factorized kernel integral**: A learnable projection operator decomposes the input function into multiple one-dimensional sub-functions. These sub-functions are used to compute a kernel along each axis, which avoids forming the large full attention matrix directly.
- **Low-rank approximation**: The factorized scheme acts as a low-rank surrogate for full attention, which reduces the computational complexity and makes it suitable for multi-dimensional problems.
- **Numerical stability**: Factorizing the attention mechanism avoids the numerical instability that may occur when multiple attention layers are stacked on high-resolution grids.

### Experimental verification

The authors verify the effectiveness of FactFormer on several benchmark problems, including:

- 2D Kolmogorov flow (256×256 grid)
- 3D smoke buoyancy (64×64×64 grid)

The experimental results show that FactFormer achieves good accuracy and efficiency on these problems and has clear advantages over existing methods.

### Summary

The main contribution of this paper is a new attention mechanism, FactFormer, which addresses the numerical instability and excessively high computational cost that existing Transformer models encounter in PDE surrogate modeling, and which provides a new method for efficient and stable PDE simulation.
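
To make the axial factorized scheme described above more concrete, the following is a minimal NumPy sketch for a 2D grid. It is an illustration under stated assumptions, not the authors' implementation: mean pooling followed by a linear map stands in for the learnable projection operator, random matrices stand in for learned weights, and the function and parameter names (`init_params`, `axial_factorized_attention`, `kernel_entries`, `proj_x`, `wq_x`, and so on) are hypothetical. The sketch builds one small kernel per axis and applies them one axis at a time, so the full $N \times N$ kernel over all grid points is never formed.

```python
import math
import numpy as np


def init_params(c, rng):
    """Random c x c matrices standing in for learned weights (hypothetical names)."""
    names = ["proj_x", "proj_y", "wq_x", "wk_x", "wq_y", "wk_y", "wv"]
    return {name: rng.standard_normal((c, c)) / np.sqrt(c) for name in names}


def axial_factorized_attention(u, params):
    """Apply one kernel per axis to a grid function u of shape (nx, ny, c).

    Returns the output of shape (nx, ny, c) and the two axis kernels.
    """
    nx, ny, _ = u.shape

    # Projection to 1D sub-functions. ASSUMPTION: mean over the other axis,
    # then a linear map, as a stand-in for the learnable projection operator.
    ux = u.mean(axis=1) @ params["proj_x"]  # (nx, c), function of x only
    uy = u.mean(axis=0) @ params["proj_y"]  # (ny, c), function of y only

    # Instance-based kernel along each axis, built from the sub-functions.
    kx = (ux @ params["wq_x"]) @ (ux @ params["wk_x"]).T / nx  # (nx, nx)
    ky = (uy @ params["wq_y"]) @ (uy @ params["wk_y"]).T / ny  # (ny, ny)

    # Values live on the full grid.
    v = u @ params["wv"]  # (nx, ny, c)

    # Apply the kernels axis by axis instead of forming an (nx*ny, nx*ny) matrix.
    out = np.einsum("ai,ijc->ajc", kx, v)    # integrate along x
    out = np.einsum("bj,ajc->abc", ky, out)  # integrate along y
    return out, kx, ky


def kernel_entries(grid_shape):
    """Entries in a full attention matrix vs. the per-axis kernels."""
    n = math.prod(grid_shape)
    full = n * n
    axial = sum(s * s for s in grid_shape)
    return full, axial


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.standard_normal((8, 6, 4))
    out, kx, ky = axial_factorized_attention(u, init_params(4, rng))
    print(out.shape)                      # (8, 6, 4)
    print(kernel_entries((256, 256)))     # (4294967296, 131072)
    print(kernel_entries((64, 64, 64)))   # (68719476736, 12288)
```

Applying the two axis kernels in sequence is equivalent to applying their Kronecker product to the flattened grid, which is one way to read the "low-rank surrogate" claim. The entry counts show why this matters at the grid sizes used in the experiments: a full attention matrix on a 256×256 grid has about 4.3 billion entries, while the two axis kernels together have 131,072.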