Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients

Katharina Flügel, Daniel Coquelin, Marie Weiel, Achim Streit, Markus Götz
2024-10-23
Abstract: The gradients used to train neural networks are typically computed using backpropagation. While an efficient way to obtain exact gradients, backpropagation is computationally expensive, hinders parallelization, and is biologically implausible. Forward gradients are an approach to approximate the gradients from directional derivatives along random tangents computed by forward-mode automatic differentiation. So far, research has focused on using a single tangent per step. This paper provides an in-depth analysis of multi-tangent forward gradients and introduces an improved approach to combining the forward gradients from multiple tangents based on orthogonal projections. We demonstrate that increasing the number of tangents improves both approximation quality and optimization performance across various tasks.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The problem this paper attempts to solve is: **How can neural network training be optimized with multi-tangent forward gradients, so as to overcome the disadvantages of backpropagation such as high computational cost, difficulty in parallelization, and biological implausibility?**

Specifically, although backpropagation efficiently yields exact gradients, it has the following problems:

1. **High computational cost**: The backward pass takes approximately twice as long as the forward pass, so it accounts for a large share of training time and energy consumption.
2. **Difficult to parallelize**: The dependencies in backpropagation lead to sub-optimal memory access patterns, which hinder parallelization.
3. **Biologically implausible**: Biological neural networks have no comparable reverse path for transmitting update information.

To address these problems, the paper proposes a forward-gradient method based on multiple tangents. Forward gradients approximate the gradient from directional derivatives along random tangent vectors, computed with forward-mode automatic differentiation, and therefore need no backward pass. Existing research has mainly used a single tangent vector per step. This paper analyzes multi-tangent forward gradients in depth and introduces an improved method, based on orthogonal projection, for combining the forward gradients from multiple tangents. The results show that increasing the number of tangent vectors improves both approximation quality and optimization performance.

### Main research questions

The paper aims to answer the following research questions:

1. **RQ1**: Can using multiple tangent vectors improve forward gradients?
2. **RQ2**: How should forward-gradient information from multiple tangent vectors be combined?
3. **RQ3**: Can multi-tangent forward gradients be extended to state-of-the-art architectures?
4. **RQ4**: What are the trade-offs of using multiple tangent vectors?

### Solutions

The methods proposed in the paper include:

- Using multiple random tangent vectors to approximate the gradient, thereby improving the approximation quality.
- Introducing an orthogonal projection method to combine multiple forward gradients, reducing the approximation error.
- Evaluating multi-tangent forward gradients experimentally on different tasks, including optimizing closed-form functions and training neural networks.

With these methods, the paper demonstrates the potential of multi-tangent forward gradients to improve approximation quality and optimization performance, and it points to a new direction for further research.
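
To make the idea of combining several tangents concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It computes directional derivatives with a tiny dual-number forward-mode AD and then combines them in two ways: a simple average of the single-tangent forward gradients, and an orthogonal projection of the gradient onto the span of the tangents, obtained by solving the Gram system (VᵀV)a = d. All names (`Dual`, `jvp`, `f`, `solve`, `forward_gradients`) and the quadratic test objective are assumptions introduced for illustration; the paper's exact formulations may differ in detail.

```python
import random

class Dual:
    """Number carrying a value and its derivative along one tangent direction."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)
    __radd__ = __add__
    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val - other.val, self.tan - other.tan)
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)
    __rmul__ = __mul__

def f(x):
    """Hypothetical objective: sum_i (i + 1) * (x_i - 1)^2."""
    total = Dual(0.0)
    for i, xi in enumerate(x):
        d = xi - 1.0
        total = total + (i + 1) * d * d
    return total

def jvp(func, x, v):
    """Directional derivative of func at x along v, in one forward pass."""
    return func([Dual(xi, vi) for xi, vi in zip(x, v)]).tan

def solve(A, b):
    """Solve A a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    a = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * a[j] for j in range(i + 1, n))
        a[i] = (M[i][n] - s) / M[i][i]
    return a

def forward_gradients(func, x, k, rng):
    """Return (averaged, projected) gradient estimates from k random tangents."""
    n = len(x)
    tangents = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    dds = [jvp(func, x, v) for v in tangents]  # one forward pass per tangent
    # Baseline: average the single-tangent forward gradients d_i * v_i.
    g_mean = [sum(d * v[j] for d, v in zip(dds, tangents)) / k for j in range(n)]
    # Orthogonal projection onto span(tangents): solve (V^T V) a = d, g = V a.
    gram = [[sum(p * q for p, q in zip(u, v)) for v in tangents] for u in tangents]
    coeffs = solve(gram, dds)
    g_proj = [sum(a * v[j] for a, v in zip(coeffs, tangents)) for j in range(n)]
    return g_mean, g_proj

if __name__ == "__main__":
    x = [0.5, -1.0, 2.0, 0.0]
    g_mean, g_proj = forward_gradients(f, x, k=4, rng=random.Random(0))
    print("averaged :", g_mean)
    print("projected:", g_proj)  # true gradient is [-1, -8, 6, -8]
```

In this sketch, when the number of tangents equals the input dimension and the tangents are linearly independent, the projected estimate coincides with the true gradient up to floating-point error, while the averaged estimate generally does not. With fewer tangents, the projection returns the component of the gradient that lies in the subspace spanned by the tangents. The cost grows with one forward pass per tangent, which relates to the trade-offs raised in RQ4.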