Exploring Efficient Partial Differential Equation Solution Using Speed Galerkin Transformer

Xun Wang, Zeyang Zhu, Xiangyu Meng, Tao Song
DOI: https://doi.org/10.1109/sc41406.2024.00084
2024-01-01
Abstract: The Fourier Neural Operator (FNO) has proven to be a universal and effective deep learning framework that achieves remarkable accuracy in solving partial differential equations (PDEs). However, certain key components of emerging FNO-based models cannot fully exploit the underlying hardware, which makes these models difficult to apply in high-resolution scenarios with strict real-time requirements. This paper presents a highly optimized model called Speed Galerkin Transformer. Its optimizations comprise three strategies: (1) a multilevel parallel SliceK-SplitK-ReduceK strategy for batched skinny matrix multiplication; (2) memory layout optimization for the QKV matrices and positional encodings, together with multi-head layer normalization fusion; and (3) batched transposition optimization with strided scattering and gathering in 2D FNO. Under specific configurations, these strategies achieve speedups of 10.29x, 4.41x and 2.38x, respectively. When solving the Darcy Flow equation at 512x512 resolution, the Speed Galerkin Transformer model achieves a speedup of about 1.72x and more than 90% parallel efficiency on 8 GPUs.
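The abstract does not spell out how the SliceK-SplitK-ReduceK strategy works, so the following is only a minimal sketch of the generic split-K idea that the name suggests, not the authors' GPU implementation. It assumes a "skinny" product in which the reduction dimension K is much larger than the output dimensions, as is typically the case for products such as K^T V in Galerkin-type attention. The function names `matmul_range` and `split_k_matmul` and the parameter `num_splits` are hypothetical, and plain Python lists stand in for GPU buffers.

```python
from typing import List

Matrix = List[List[float]]


def matmul_range(a: Matrix, b: Matrix, k_start: int, k_end: int) -> Matrix:
    """Partial product of a (M x K) and b (K x N), summing only k in [k_start, k_end)."""
    m, n = len(a), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for k in range(k_start, k_end):
            a_ik = a[i][k]
            for j in range(n):
                out[i][j] += a_ik * b[k][j]
    return out


def split_k_matmul(a: Matrix, b: Matrix, num_splits: int) -> Matrix:
    """Illustrative split-K product (hypothetical helper, not the paper's kernel).

    The K dimension is cut into chunks; each chunk yields an independent
    partial product (the part that could run in parallel), and the partial
    products are then summed element-wise in a final reduction step.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    k_total = len(a[0])
    if len(b) != k_total:
        raise ValueError("inner dimensions of a and b do not match")
    m, n = len(a), len(b[0])

    chunk = -(-k_total // num_splits)  # ceiling division
    # Split: one independent partial product per chunk of K.
    partials = [
        matmul_range(a, b, start, min(start + chunk, k_total))
        for start in range(0, k_total, chunk)
    ]

    # Reduce: element-wise sum of all partial products.
    result = [[0.0] * n for _ in range(m)]
    for partial in partials:
        for i in range(m):
            for j in range(n):
                result[i][j] += partial[i][j]
    return result
```

When the output is small and K is large, a conventional tiling over the output alone exposes little parallelism; splitting K gives more independent work at the cost of one extra reduction pass. How the paper layers its slicing, splitting and reduction levels on top of this idea is not stated in the abstract.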