SineKAN: Kolmogorov-Arnold Networks Using Sinusoidal Activation Functions

Eric A. F. Reinhardt, P. R. Dinesh, Sergei Gleyzer
2024-07-23
Abstract: Recent work has established an alternative to traditional multi-layer perceptron neural networks in the form of Kolmogorov-Arnold Networks (KAN). The general KAN framework uses learnable activation functions on the edges of the computational graph followed by summation on nodes. The learnable edge activation functions in the original implementation are basis spline functions (B-Spline). Here, we present a model in which learnable grids of B-Spline activation functions are replaced by grids of re-weighted sine functions. We show that this leads to better or comparable numerical performance to B-Spline KAN models on the MNIST benchmark, while also providing a substantial speed increase on the order of 4-8 times.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper primarily aims to address the following issues:

1. **Improving the Performance of Kolmogorov-Arnold Networks (KAN)**: KAN is a neural network architecture based on the Kolmogorov-Arnold representation theorem, designed as an alternative to traditional multilayer perceptrons (MLP). Existing KAN implementations such as B-SplineKAN, which uses B-spline activation functions, perform well on certain tasks but still lag behind MLPs in speed and performance. The paper therefore proposes SineKAN, a KAN implementation that uses sine activation functions in place of B-spline activation functions.
2. **Enhancing Model Speed and Accuracy**: With sine activation functions, SineKAN reaches better or comparable accuracy to B-SplineKAN on the MNIST benchmark while running 4 to 8 times faster. This suggests that SineKAN has greater practical utility.
3. **Exploring the Advantages of Sine Activation Functions**: Sine activation functions can provide stronger numerical performance and keep output values more stable as depth increases, which avoids the collapse of values in deep models. They also allow the model to scale better across multiple layers.
4. **Optimizing Weight Initialization Strategies**: The paper also explores a new weight initialization strategy intended to keep the model stable and consistent across different grid sizes. This strategy helps the model maintain good performance at various depths.

In summary, the paper aims to improve the KAN architecture by introducing sine activation functions and an adapted weight initialization strategy, with the goal of outperforming the existing B-SplineKAN model in speed, accuracy, and stability.
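
To make the core idea concrete, here is a minimal sketch of a layer in which every edge carries a re-weighted grid of sine functions and every output node sums its incoming edges. It is an illustration only, not the authors' implementation: the class name `SineKANLayer`, the integer frequency grid, the random phase offsets, and the initialization scale are all assumptions chosen for this example, and plain Python lists are used instead of a tensor library to keep it self-contained.

```python
import math
import random


class SineKANLayer:
    """Illustrative sine-based KAN layer (hypothetical, not the paper's code).

    Each edge (input j -> output i) is a learnable function made of
    `grid_size` sine terms; each output node sums its incoming edges.
    """

    def __init__(self, in_dim, out_dim, grid_size=4, seed=0):
        rng = random.Random(seed)
        self.in_dim = in_dim
        self.out_dim = out_dim
        self.grid_size = grid_size
        # Assumption: a fixed frequency grid 1, 2, ..., grid_size.
        self.freqs = [float(k + 1) for k in range(grid_size)]
        # Assumption: one phase offset per (input, grid point).
        self.phases = [
            [rng.uniform(0.0, math.pi) for _ in range(grid_size)]
            for _ in range(in_dim)
        ]
        # Amplitudes amps[i][j][k] re-weight the sine terms on each edge.
        # Assumption: scaling by 1 / (in_dim * grid_size) keeps |output| <= 1
        # while the bias is zero.
        scale = 1.0 / (in_dim * grid_size)
        self.amps = [
            [
                [rng.uniform(-1.0, 1.0) * scale for _ in range(grid_size)]
                for _ in range(in_dim)
            ]
            for _ in range(out_dim)
        ]
        self.bias = [0.0] * out_dim

    def forward(self, x):
        if len(x) != self.in_dim:
            raise ValueError("expected %d inputs, got %d" % (self.in_dim, len(x)))
        # Sine features depend only on the input index j and grid index k.
        feats = [
            [math.sin(w * xj + p) for w, p in zip(self.freqs, self.phases[j])]
            for j, xj in enumerate(x)
        ]
        out = []
        for i in range(self.out_dim):
            total = self.bias[i]
            for j in range(self.in_dim):
                for k in range(self.grid_size):
                    total += self.amps[i][j][k] * feats[j][k]
            out.append(total)
        return out


# Usage: stack two layers, as one would stack layers in an MLP.
hidden = SineKANLayer(in_dim=3, out_dim=5, grid_size=4, seed=0)
head = SineKANLayer(in_dim=5, out_dim=2, grid_size=4, seed=1)
prediction = head.forward(hidden.forward([0.1, -0.4, 0.7]))
```

Because every sine term lies in [-1, 1], the output magnitude is bounded by the sum of the absolute amplitudes on the incoming edges. That bound is one intuition for why an initialization that depends on the grid size, such as the scaling assumed here, can help keep values stable as layers are stacked.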