On the computation of the gradient in implicit neural networks

Béla J. Szekeres, Ferenc Izsák
DOI: https://doi.org/10.1007/s11227-024-06117-6
IF: 3.3
2024-04-25
The Journal of Supercomputing
Abstract: Implicit neural networks and the related deep equilibrium models are investigated. To train these networks, the gradient of the corresponding loss function should be computed. Bypassing the implicit function theorem, we develop an explicit representation of this quantity, which leads to an easily accessible computational algorithm. The theoretical findings are also supported by numerical simulations.
computer science, theory & methods; engineering, electrical & electronic; hardware & architecture
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses gradient computation in Implicit Neural Networks (INNs) and the related Deep Equilibrium Models (DEQs). Training these networks requires the gradient of the loss function. The traditional approach relies on the Implicit Function Theorem, which can be complex and inefficient in practice. The paper proposes a theoretical approach that bypasses the Implicit Function Theorem and yields an easy-to-implement algorithm for computing the gradient. The method is justified theoretically and checked in numerical simulations.

### Key issues

1. **Gradient calculation**: How can the gradients in implicit neural networks be computed efficiently enough to support gradient-based optimization?
2. **Theoretical basis**: How can the Implicit Function Theorem be bypassed in favor of a more direct gradient formula?
3. **Numerical verification**: Does the new method hold up in numerical experiments?

### Background and motivation

Implicit neural networks and deep equilibrium models are recent developments in deep learning. Under suitable stability conditions, the output of such a model converges to an equilibrium state as the number of layers grows. Computing this equilibrium state and the corresponding gradients is a complex task. The traditional gradient computation relies on the Implicit Function Theorem, which may be inefficient and difficult to implement in practice. The paper therefore proposes a method intended to simplify the gradient computation and make training more efficient.

### Main contributions

1. **New theoretical method**: A method for computing gradients in implicit neural networks that does not rely on the Implicit Function Theorem.
2. **Easy-to-implement algorithm**: An efficient algorithm that can be applied conveniently to practical problems.
3. **Numerical verification**: Numerical simulations that provide empirical support for the method.

### Numerical experiments

To verify the method, the authors ran experiments on two datasets:

1. **HTRU2 dataset**: Used for pulsar classification. An auto-encoder compresses and reconstructs the multi-dimensional data in order to separate normal from abnormal signals.
2. **NSL-KDD dataset**: Used for network intrusion detection, with the data likewise processed by auto-encoders.

The experimental results show that the method performs well on several evaluation metrics, particularly recall, which is especially important for pulsar detection.

### Conclusion

The proposed gradient computation is new in theory and also performs well in practice. By bypassing the Implicit Function Theorem, it provides a simple and efficient algorithm for computing gradients, which supports the training of implicit neural networks and deep equilibrium models.
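
For context, the conventional baseline that the summary contrasts against can be illustrated on a toy problem. The sketch below is not the paper's explicit representation; it shows the standard implicit-differentiation gradient for a hypothetical scalar model whose equilibrium satisfies `z = tanh(w*z + x)`, with a squared-error loss, and checks it against a central finite difference. The model, the loss, and all function names are assumptions chosen for illustration, and `|w| < 1` is assumed so that the fixed-point iteration is a contraction.

```python
import math


def solve_equilibrium(w, x, tol=1e-12, max_iter=10_000):
    """Apply the weight-tied layer z <- tanh(w*z + x) until it stops changing."""
    z = 0.0
    for _ in range(max_iter):
        z_next = math.tanh(w * z + x)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")


def loss(w, x, y):
    """Squared-error loss evaluated at the equilibrium state (illustrative choice)."""
    z = solve_equilibrium(w, x)
    return 0.5 * (z - y) ** 2


def grad_loss_ift(w, x, y):
    """Gradient dL/dw via implicit differentiation of z = tanh(w*z + x)."""
    z = solve_equilibrium(w, x)
    s = 1.0 - z * z                  # tanh'(u) at equilibrium, because z = tanh(u)
    # Differentiating the fixed-point equation: dz/dw = s*z / (1 - s*w)
    dz_dw = s * z / (1.0 - s * w)
    return (z - y) * dz_dw


def grad_loss_fd(w, x, y, h=1e-6):
    """Central finite-difference approximation of dL/dw, used only as a check."""
    return (loss(w + h, x, y) - loss(w - h, x, y)) / (2.0 * h)


# Hypothetical parameter, input, and target values.
w, x, y = 0.5, 0.3, 1.0
g_ift = grad_loss_ift(w, x, y)
g_fd = grad_loss_fd(w, x, y)
print("implicit-differentiation gradient:", g_ift)
print("finite-difference gradient:      ", g_fd)
```

In the scalar case the linear solve required by the Implicit Function Theorem reduces to a single division by `1 - s*w`. For vector-valued states it becomes a linear system involving the Jacobian of the layer, and that is the cost an explicit gradient representation aims to avoid.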