SPIKANs: Separable Physics-Informed Kolmogorov-Arnold Networks

Bruno Jacob, Amanda A. Howard, Panos Stinis
2024-11-10
Abstract: Physics-Informed Neural Networks (PINNs) have emerged as a promising method for solving partial differential equations (PDEs) in scientific computing. While PINNs typically use multilayer perceptrons (MLPs) as their underlying architecture, recent advancements have explored alternative neural network structures. One such innovation is the Kolmogorov-Arnold Network (KAN), which has demonstrated benefits over traditional MLPs, including faster neural scaling and better interpretability. The application of KANs to physics-informed learning has led to the development of Physics-Informed KANs (PIKANs), enabling the use of KANs to solve PDEs. However, despite their advantages, KANs often suffer from slower training speeds, particularly in higher-dimensional problems where the number of collocation points grows exponentially with the dimensionality of the system. To address this challenge, we introduce Separable Physics-Informed Kolmogorov-Arnold Networks (SPIKANs). This novel architecture applies the principle of separation of variables to PIKANs, decomposing the problem such that each dimension is handled by an individual KAN. This approach drastically reduces the computational complexity of training without sacrificing accuracy, facilitating their application to higher-dimensional PDEs. Through a series of benchmark problems, we demonstrate the effectiveness of SPIKANs, showcasing their superior scalability and performance compared to PIKANs and highlighting their potential for solving complex, high-dimensional PDEs in scientific computing.
Machine Learning, Numerical Analysis
What problem does this paper attempt to address?
The paper addresses the computational cost of physics-informed learning with Kolmogorov-Arnold Networks in high dimensions. Physics-Informed Neural Networks (PINNs) usually use Multi-Layer Perceptrons (MLPs) as their backbone; replacing the MLP with a Kolmogorov-Arnold Network (KAN) gives Physics-Informed KANs (PIKANs), which inherit the KAN's faster neural scaling and better interpretability. However, KANs train more slowly than MLPs, and the problem is compounded in higher dimensions because the number of collocation points grows exponentially with the dimension: with \(N\) points per axis, a \(d\)-dimensional grid has \(N^d\) points, and the network must be evaluated and differentiated at every one of them. To address this, the authors introduce Separable Physics-Informed Kolmogorov-Arnold Networks (SPIKANs), which represent the multivariate solution through products of univariate functions (summed over a small number of terms), one function per coordinate, significantly reducing computational cost and memory usage. The main features of SPIKANs are:

1. **Separation of variables**: each coordinate is handled by its own one-dimensional KAN, and the solution is assembled from the outputs of these networks instead of by a single network that takes all coordinates as input.
2. **Reduced computational complexity**: because each KAN only sees the points along its own axis, training requires far fewer network evaluations than a PIKAN on the same grid, substantially cutting training time and resources.
3. **Improved scalability**: the savings grow with the dimension, so SPIKANs are better suited than PIKANs to higher-dimensional PDEs.

Through a series of benchmark problems, the paper shows that SPIKANs scale better and train faster than PIKANs without sacrificing accuracy, making them a promising tool for complex, high-dimensional PDEs.
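In the separable PINN (SPINN) construction that SPIKANs adapt, the solution is usually written as a rank-\(r\) sum of products of univariate functions,

\[ \hat{u}(x_1,\dots,x_d)=\sum_{j=1}^{r}\prod_{i=1}^{d} f_j^{(i)}(x_i), \]

where \(f^{(i)}:\mathbb{R}\to\mathbb{R}^r\) is the network for coordinate \(x_i\). The NumPy sketch below assumes this form. On a tensor grid with \(N_i\) points along axis \(i\), each network is evaluated only at its own \(N_i\) one-dimensional points, and the grid values are assembled by an outer product (here `np.einsum`). The names `axis_features`, `separable_field`, and `evaluation_counts` are hypothetical, and the per-axis model is a single tanh layer standing in for the per-axis KAN used in SPIKANs.

```python
import numpy as np

def axis_features(x, weights, biases):
    """Hypothetical stand-in for the per-axis network f^(i): R -> R^r.

    Maps N coordinates of shape (N,) to features of shape (N, r). SPIKANs use a
    KAN per axis; a single tanh layer is used here only to keep the sketch short.
    """
    return np.tanh(np.outer(x, weights) + biases)

def separable_field(axes, params):
    """Evaluate u[a_1, ..., a_d] = sum_j prod_i F_i[a_i, j] on the full tensor grid."""
    feats = [axis_features(x, w, b) for x, (w, b) in zip(axes, params)]
    letters = "abcdefghijklmnopqrstuvwxy"  # one grid index per axis; "z" is the rank index
    inputs = ",".join(letters[i] + "z" for i in range(len(feats)))
    return np.einsum(inputs + "->" + letters[: len(feats)], *feats)

def evaluation_counts(points_per_axis, d):
    """Network evaluations: separable (one 1-D network per axis) vs. one network on the full grid."""
    return d * points_per_axis, points_per_axis ** d

rng = np.random.default_rng(0)
r = 8  # rank: number of separable terms
axes = [np.linspace(0, 1, 50), np.linspace(0, 1, 40), np.linspace(0, 1, 30)]
params = [(rng.normal(size=r), rng.normal(size=r)) for _ in axes]
u = separable_field(axes, params)
print(u.shape)                    # (50, 40, 30)
print(evaluation_counts(100, 3))  # (300, 1000000)
```

The assembly step still touches every grid point, but only through cheap multiply-adds. Under this form, a derivative with respect to \(x_i\) only differentiates the factor \(f^{(i)}\), so the expensive network passes scale with \(\sum_i N_i\) rather than \(\prod_i N_i\).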
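### Formula Summary

- **Approximate form of a KAN** (an \(L\)-layer network with \(n_0 = n\) inputs and a scalar output, so \(i_L = 1\)):

  \[ \hat{u}(x_1,\dots,x_n)=\sum_{i_{L-1}=1}^{n_{L-1}}\phi_{L-1,i_L,i_{L-1}}\left(\sum_{i_{L-2}=1}^{n_{L-2}}\cdots\left(\sum_{i_1=1}^{n_1}\phi_{1,i_2,i_1}\left(\sum_{i_0=1}^{n_0}\phi_{0,i_1,i_0}(x_{i_0})\right)\right)\cdots\right) \]

  Here \(\phi_{l,j,i}\) is the learnable univariate function on the edge from node \(i\) of layer \(l\) to node \(j\) of layer \(l+1\).

- **Physics-informed loss function**:

  \[ L(\theta)=\lambda_{\text{pde}}L_{\text{pde}}(\theta)+\lambda_{\text{ic}}L_{\text{ic}}(\theta)+\lambda_{\text{bc}}L_{\text{bc}}(\theta) \]

  where \(D\) is the differential operator of the PDE, \(f\) its source term, \(u_0\) the initial condition, \(u(x^{\text{bc}},t^{\text{bc}})\) the prescribed boundary values, and \(\lambda_{\text{pde}},\lambda_{\text{ic}},\lambda_{\text{bc}}\) the weights of the three terms:
  - \(L_{\text{pde}}(\theta)=\frac{1}{N_{\text{pde}}}\sum_{i=1}^{N_{\text{pde}}}\left\|D[\hat{u}(x_i^{\text{pde}},t_i^{\text{pde}};\theta)]-f(x_i^{\text{pde}},t_i^{\text{pde}})\right\|^2\)
  - \(L_{\text{ic}}(\theta)=\frac{1}{N_{\text{ic}}}\sum_{i=1}^{N_{\text{ic}}}\left\|\hat{u}(x_i^{\text{ic}},0;\theta)-u_0(x_i^{\text{ic}})\right\|^2\)
  - \(L_{\text{bc}}(\theta)=\frac{1}{N_{\text{bc}}}\sum_{i=1}^{N_{\text{bc}}}\left\|\hat{u}(x_i^{\text{bc}},t_i^{\text{bc}};\theta)-u(x_i^{\text{bc}},t_i^{\text{bc}})\right\|^2\)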
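The two formulas can be illustrated with a small NumPy sketch. Each edge function \(\phi_{l,j,i}\) is written as a learnable combination of fixed basis functions, and stacking layers reproduces the nested sums of the KAN formula. This is a simplification: the original KAN parameterizes \(\phi\) with B-splines plus a residual base function, whereas Gaussian radial basis functions are swapped in here for brevity. The loss is then assembled for a hypothetical 1-D heat equation \(u_t=\nu u_{xx}\) with \(u_0(x)=\sin(\pi x)\) and zero boundary values, using central finite differences as a stand-in for the automatic differentiation that PINN and PIKAN implementations normally use. All function names (`rbf_basis`, `kan_layer`, `kan_forward`, `physics_informed_loss`, `u_hat`) are illustrative, not from the paper.

```python
import numpy as np

def rbf_basis(t, centers, width):
    """Gaussian basis functions B_k(t); maps (..., n_in) -> (..., n_in, n_basis)."""
    return np.exp(-(((t[..., None] - centers) / width) ** 2))

def kan_layer(x, coeffs, centers, width):
    """One KAN layer: y_j = sum_i phi_{j,i}(x_i), with phi_{j,i}(t) = sum_k c[j,i,k] B_k(t).

    x: (batch, n_in); coeffs: (n_out, n_in, n_basis); returns (batch, n_out).
    """
    basis = rbf_basis(x, centers, width)            # (batch, n_in, n_basis)
    return np.einsum("bik,jik->bj", basis, coeffs)  # sum over inputs i and basis k

def kan_forward(x, layers, centers, width):
    """Compose KAN layers, matching the nested sums in the KAN formula."""
    for coeffs in layers:
        x = kan_layer(x, coeffs, centers, width)
    return x

def mse(values):
    return float(np.mean(np.square(values)))

def physics_informed_loss(residual, u_ic, u0, u_bc, g, lam=(1.0, 1.0, 1.0)):
    """L = lam_pde * L_pde + lam_ic * L_ic + lam_bc * L_bc, each a mean squared error."""
    lam_pde, lam_ic, lam_bc = lam
    return lam_pde * mse(residual) + lam_ic * mse(u_ic - u0) + lam_bc * mse(u_bc - g)

# Hypothetical setup: heat equation u_t = nu * u_xx on x in [0, 1], t in [0, 1].
rng = np.random.default_rng(0)
centers, width = np.linspace(-2.0, 2.0, 8), 0.5
layers = [0.1 * rng.normal(size=(5, 2, 8)), 0.1 * rng.normal(size=(1, 5, 8))]  # 2 -> 5 -> 1
nu, h = 0.1, 1e-3

def u_hat(x, t):
    """Network output u(x, t) for 1-D arrays x and t of equal length."""
    return kan_forward(np.stack([x, t], axis=1), layers, centers, width)[:, 0]

# Central finite differences stand in for automatic differentiation here.
x_pde, t_pde = rng.uniform(0, 1, 64), rng.uniform(0, 1, 64)
u_t = (u_hat(x_pde, t_pde + h) - u_hat(x_pde, t_pde - h)) / (2 * h)
u_xx = (u_hat(x_pde + h, t_pde) - 2 * u_hat(x_pde, t_pde) + u_hat(x_pde - h, t_pde)) / h**2
residual = u_t - nu * u_xx  # D[u] - f with f = 0

x_ic = np.linspace(0, 1, 32)
u_ic, u0 = u_hat(x_ic, np.zeros_like(x_ic)), np.sin(np.pi * x_ic)
t_bc = np.linspace(0, 1, 32)
x_bc = np.concatenate([np.zeros_like(t_bc), np.ones_like(t_bc)])
u_bc, g = u_hat(x_bc, np.concatenate([t_bc, t_bc])), np.zeros(2 * t_bc.size)
print(physics_informed_loss(residual, u_ic, u0, u_bc, g))
```

In an actual PIKAN or SPIKAN, this scalar would be minimized with respect to the basis coefficients in `layers` by a gradient-based optimizer.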