Abstract: Fourier neural operators (FNOs) are a recently introduced neural network architecture for learning solution operators of partial differential equations (PDEs), which have been shown to perform significantly better than comparable deep learning approaches. Once trained, FNOs can achieve speed-ups of multiple orders of magnitude over conventional numerical PDE solvers. However, due to the high dimensionality of their input data and network weights, FNOs have so far only been applied to two-dimensional or small three-dimensional problems. To remove this limited problem-size barrier, we propose a model-parallel version of FNOs based on domain-decomposition of both the input data and network weights. We demonstrate that our model-parallel FNO is able to predict time-varying PDE solutions of over 2.6 billion variables on Perlmutter using up to 512 A100 GPUs and show an example of training a distributed FNO on the Azure cloud for simulating multiphase CO$_2$ dynamics in the Earth's subsurface.

What problem does this paper attempt to address?

This paper addresses the computational cost and limited scalability of conventional numerical simulators for partial differential equations (PDEs). Traditional methods such as finite-difference, finite-volume, or finite-element schemes are highly accurate, but they require large amounts of compute and time on large-scale, multi-dimensional problems. This limits their practicality in applications that need many simulations, such as uncertainty quantification, inverse problems, or numerical optimization.

To address this, the paper proposes a model-parallel method based on Fourier neural operators (FNOs) for learning and predicting solutions of large-scale parametric PDEs. FNOs are a recently introduced neural network architecture for learning PDE solution operators and have been shown to perform significantly better than comparable deep learning approaches. Once trained, FNOs can be several orders of magnitude faster at inference than conventional numerical solvers. However, because of the high dimensionality of their input data and network weights, FNOs have so far been applied only to two-dimensional or small three-dimensional problems.

To overcome this limitation, the paper proposes model-parallel FNOs based on domain decomposition, which distribute the input data and network weights across multiple GPUs and predict time-varying PDE solutions of more than 2.6 billion variables. The paper also shows an example of training a distributed FNO on the Azure cloud to simulate multiphase CO₂ dynamics in the Earth's subsurface.

#### Main challenges

1. **Scalability**: Scaling deep surrogate models such as FNOs to the problem sizes required in practical applications, beyond small 2D or 3D time-dependent scenarios.
2. **Memory limitations**: A single modern GPU (for example, an NVIDIA Ampere GPU) may not have enough memory to process even one training sample, especially for medium-sized 3D problems (more than 64³ grid points).
3. **Parallel computing**: The network must be distributed across multiple GPUs to support training on large-scale 2D and 3D time-dependent problems.

#### Solutions

The paper proposes domain decomposition to achieve model parallelism: all tensors (input, output, weight, and gradient tensors) are partitioned along the feature dimensions (space and time). This allows the input data, weights, and hidden states to be distributed across multiple workers, making it possible to scale to arbitrary network and data sizes. Compared with existing 3D parallelism approaches, this method provides finer-grained control over tensor partitioning and is particularly suited to architectures such as FNOs, which differ from natural language processing (NLP) models. Using this approach, the paper reports experiments on the Perlmutter supercomputer with up to 512 A100 GPUs, demonstrating the potential of model-parallel FNOs for extremely large-scale PDE problems.
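
The domain-decomposition idea can be illustrated with a small NumPy sketch. This is not the paper's implementation: workers are simulated as entries of a Python list, the all-to-all exchange between GPUs is simulated by list indexing, and the names `partition`, `repartition`, and `distributed_fft2` are hypothetical. It shows one standard way to compute a 2D FFT (the core operation of an FNO layer) when the data is split across workers along a spatial axis: transform the locally complete axis, repartition so the other axis becomes local, then transform that axis.

```python
import numpy as np


def partition(x, n_workers, axis):
    """Split x into contiguous slabs along `axis`, one slab per simulated worker."""
    return np.array_split(x, n_workers, axis=axis)


def repartition(slabs, old_axis, new_axis):
    """Simulated all-to-all: data split along old_axis becomes split along new_axis.

    Worker i cuts its slab into n pieces along new_axis and "sends" piece j
    to worker j, which stacks the received pieces along old_axis.
    """
    n = len(slabs)
    pieces = [np.array_split(s, n, axis=new_axis) for s in slabs]
    return [
        np.concatenate([pieces[i][j] for i in range(n)], axis=old_axis)
        for j in range(n)
    ]


def distributed_fft2(x, n_workers):
    """2D FFT on slab-decomposed data; the result is partitioned along axis 1."""
    slabs = partition(x, n_workers, axis=0)
    # Axis 1 is fully local to every worker, so this step needs no communication.
    slabs = [np.fft.fft(s, axis=1) for s in slabs]
    # Exchange data so that axis 0 becomes fully local instead.
    slabs = repartition(slabs, old_axis=0, new_axis=1)
    slabs = [np.fft.fft(s, axis=0) for s in slabs]
    return slabs


# Small demonstration: 4 simulated workers on an 8 x 12 grid.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 12))
result_slabs = distributed_fft2(x, n_workers=4)
reassembled = np.concatenate(result_slabs, axis=1)
```

Here each simulated worker only ever holds a slab of the grid, and `reassembled` can be compared against `np.fft.fft2(x)` to check the decomposition. In a real multi-GPU setting the list indexing in `repartition` would be replaced by a collective communication call, and the same partitioning would also apply to weights and gradients.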

Model-Parallel Fourier Neural Operators as Learned Surrogates for Large-Scale Parametric PDEs

Fourier Neural Operator with Learned Deformations for PDEs on General Geometries

MgFNO: Multi-grid Architecture Fourier Neural Operator for Parametric Partial Differential Equations

Incremental Spatial and Spectral Learning of Neural Operators for Solving Large-Scale PDEs

Multi-Grid Tensorized Fourier Neural Operator for High-Resolution PDEs

Non-equispaced Fourier Neural Solvers for PDEs

A Born Fourier Neural Operator for Solving Poisson’s Equation with Limited Data and Arbitrary Domain Deformation

Augmenting Deep Residual Surrogates with Fourier Neural Operators for Rapid Two-Phase Flow and Transport Simulations

Enhancing Solutions for Complex PDEs: Introducing Complementary Convolution and Equivariant Attention in Fourier Neural Operators

Quantum Fourier Networks for Solving Parametric PDEs

Fourier Neural Operator for Solving Subsurface Oil/Water Two-Phase Flow Partial Differential Equation

Transfer Learning Fourier Neural Operator for Solving Parametric Frequency-Domain Wave Equations

Toward a Better Understanding of Fourier Neural Operators from a Spectral Perspective

Enhancing subsurface multiphase flow simulation with Fourier neural operator

Component Fourier Neural Operator for Singularly Perturbed Differential Equations

Beyond Regular Grids: Fourier-Based Neural Operators on Arbitrary Domains

Learning the boundary-to-domain mapping using Lifting Product Fourier Neural Operators for partial differential equations

Gabor-Filtered Fourier Neural Operator for Solving Partial Differential Equations

Fourier Neural Operator Surrogate Model to Predict 3D Seismic Waves Propagation

Fourier neural operators for spatiotemporal dynamics in two-dimensional turbulence