Chebyshev Polynomial-Based Kolmogorov-Arnold Networks: An Efficient Architecture for Nonlinear Function Approximation

Sidharth SS, Keerthana AR, Gokul R, Anas KP
2024-06-14
Abstract: Accurate approximation of complex nonlinear functions is a fundamental challenge across many scientific and engineering domains. Traditional neural network architectures, such as Multi-Layer Perceptrons (MLPs), often struggle to efficiently capture intricate patterns and irregularities present in high-dimensional functions. This paper presents the Chebyshev Kolmogorov-Arnold Network (Chebyshev KAN), a new neural network architecture inspired by the Kolmogorov-Arnold representation theorem, incorporating the powerful approximation capabilities of Chebyshev polynomials. By utilizing learnable functions parametrized by Chebyshev polynomials on the network's edges, Chebyshev KANs enhance flexibility, efficiency, and interpretability in function approximation tasks. We demonstrate the efficacy of Chebyshev KANs through experiments on digit classification, synthetic function approximation, and fractal function generation, highlighting their superiority over traditional MLPs in terms of parameter efficiency and interpretability. Our comprehensive evaluation, including ablation studies, confirms the potential of Chebyshev KANs to address longstanding challenges in nonlinear function approximation, paving the way for further advancements in various scientific and engineering applications.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the challenge of accurately approximating complex nonlinear functions, especially in high-dimensional spaces. Traditional multilayer perceptrons (MLPs) often need a large number of parameters to reach high precision on such functions, which makes them parameter-inefficient and hard to interpret. To tackle these problems, the paper introduces the Chebyshev Kolmogorov-Arnold Network (Chebyshev KAN), a neural network architecture that combines the Kolmogorov-Arnold representation theorem with the approximation capabilities of Chebyshev polynomials. Its main innovation is to parameterize the learnable functions on the network's edges with Chebyshev polynomials, so that each scalar weight of an MLP is replaced by a learnable univariate function. Each edge applies its learnable nonlinear function to the incoming signal, and each node simply sums the transformed signals it receives; the authors credit this design with the network's flexibility, efficiency, and interpretability. The paper reports that, compared to traditional MLPs, Chebyshev KAN achieves higher approximation accuracy with fewer parameters, particularly on complex high-dimensional functions. It supports this with experiments on handwritten digit classification (the MNIST dataset), synthetic function approximation, and fractal function generation, highlighting advantages in parameter efficiency, dynamic (learnable) activation functions, interpretability, numerical stability, and approximation accuracy.
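To make the edge/node structure concrete, here is a minimal forward-pass sketch of a single Chebyshev KAN layer in NumPy. It is an illustration under stated assumptions, not the authors' implementation: the names `chebyshev_basis` and `ChebyKANLayer`, the use of `tanh` to squash inputs into the Chebyshev domain [-1, 1], and the scaled-normal coefficient initialization are all hypothetical choices. Each edge from input i to output j carries the function phi_ij(x) = sum_k c_ijk T_k(x), and output node j sums phi_ij over all inputs i.

```python
import numpy as np


def chebyshev_basis(x: np.ndarray, degree: int) -> np.ndarray:
    """Evaluate T_0..T_degree at every entry of x via the three-term recurrence.

    Returns an array of shape x.shape + (degree + 1,).
    """
    basis = [np.ones_like(x)]          # T_0(x) = 1
    if degree >= 1:
        basis.append(x)                # T_1(x) = x
    for _ in range(2, degree + 1):
        # T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x)
        basis.append(2 * x * basis[-1] - basis[-2])
    return np.stack(basis, axis=-1)


class ChebyKANLayer:
    """Hypothetical minimal Chebyshev KAN layer (forward pass only)."""

    def __init__(self, in_dim: int, out_dim: int, degree: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.degree = degree
        # One coefficient vector of length degree + 1 per edge (input i -> output j).
        # The scaled-normal initialization is an assumption, not taken from the paper.
        std = 1.0 / (in_dim * (degree + 1))
        self.coeffs = rng.normal(0.0, std, size=(in_dim, out_dim, degree + 1))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, in_dim). Squash into [-1, 1], where Chebyshev polynomials are defined.
        x = np.tanh(x)
        t = chebyshev_basis(x, self.degree)  # (batch, in_dim, degree + 1)
        # Edge function phi_ij(x_i) = sum_k c_ijk T_k(x_i); node j sums over inputs i.
        return np.einsum("bik,iok->bo", t, self.coeffs)


layer = ChebyKANLayer(in_dim=4, out_dim=3, degree=5)
y = layer(np.random.default_rng(1).normal(size=(8, 4)))
print(y.shape)  # (8, 3)
```

The only trainable parameters are the per-edge coefficients `coeffs`, so there is no separate fixed activation function: the nonlinearity on every edge is learned through its Chebyshev expansion. Stacking several such layers would give a deeper network.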
Additionally, the paper conducts ablation studies examining how different initialization methods, normalization techniques, and types of Chebyshev polynomials affect model performance, further supporting the potential and practicality of Chebyshev KAN for nonlinear function approximation.
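Two of the ablation axes, the normalization that maps inputs into [-1, 1] and the kind of Chebyshev polynomial, can be illustrated with a small feature-generation sketch. The summary does not say which options the paper actually compares, so the two normalizations here (`tanh` and per-feature min-max rescaling) and the helper names `normalize` and `chebyshev_features` are hypothetical. The first kind T_k and second kind U_k share the recurrence P_{k+1}(x) = 2x P_k(x) - P_{k-1}(x) and differ only in the degree-1 term (T_1 = x, U_1 = 2x).

```python
import numpy as np


def normalize(x: np.ndarray, method: str) -> np.ndarray:
    """Map inputs into [-1, 1]; both options are illustrative assumptions."""
    if method == "tanh":
        return np.tanh(x)
    if method == "minmax":
        # Per-feature rescaling over the batch axis; the epsilon guards constant columns.
        lo, hi = x.min(axis=0), x.max(axis=0)
        return 2.0 * (x - lo) / (hi - lo + 1e-12) - 1.0
    raise ValueError(f"unknown normalization: {method}")


def chebyshev_features(x: np.ndarray, degree: int, kind: int = 1) -> np.ndarray:
    """T_k (kind=1) or U_k (kind=2) for k = 0..degree, stacked on the last axis."""
    if kind not in (1, 2):
        raise ValueError("kind must be 1 or 2")
    basis = [np.ones_like(x)]  # T_0 = U_0 = 1
    if degree >= 1:
        # The two kinds differ only here: T_1 = x, U_1 = 2x.
        basis.append(x if kind == 1 else 2 * x)
    for _ in range(2, degree + 1):
        # Shared recurrence: P_{k+1}(x) = 2 x P_k(x) - P_{k-1}(x)
        basis.append(2 * x * basis[-1] - basis[-2])
    return np.stack(basis, axis=-1)


x = np.random.default_rng(0).normal(size=(16, 3))
for method in ("tanh", "minmax"):
    for kind in (1, 2):
        feats = chebyshev_features(normalize(x, method), degree=4, kind=kind)
        print(method, kind, feats.shape, float(np.abs(feats).max()))
```

On [-1, 1] the first-kind polynomials satisfy |T_k(x)| <= 1, while the second-kind polynomials can reach |U_k(x)| = k + 1 at the endpoints. Normalization and polynomial kind therefore interact: a normalization that pushes many inputs toward ±1 yields larger second-kind features, which is one plausible reason to ablate the two choices together.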