Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training

Zeliang Zhang, Jinyang Jiang, Zhuo Liu, Susan Liang, Yijie Peng, Chenliang Xu
2024-03-19
Abstract: Efficient and biologically plausible alternatives to backpropagation in neural network training remain a challenge due to issues such as high computational complexity and additional assumptions about neural networks, which limit scalability to deeper networks. The likelihood ratio method offers a promising gradient estimation strategy but is constrained by significant memory consumption, especially when deploying multiple copies of data to reduce estimation variance. In this paper, we introduce an approximation technique for the likelihood ratio (LR) method to alleviate computational and memory demands in gradient estimation. By exploiting the natural parallelism during the backward pass using LR, we further provide a high-performance training strategy, which pipelines both the forward and backward pass, to make it more suitable for the computation on specialized hardware. Extensive experiments demonstrate the effectiveness of the approximation technique in neural network training. This work underscores the potential of the likelihood ratio method in achieving high-performance neural network training, suggesting avenues for further exploration.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses three shortcomings of backpropagation (BP) in neural network training: high computational and memory cost, biological implausibility, and bottlenecks in hardware acceleration. Specifically:

1. **Computational Complexity and Memory Consumption**: BP requires substantial computational resources, and both its computation and its memory consumption grow as the network gets deeper.
2. **Biological Implausibility**: BP assumes that neurons can transmit precise gradient information, which is biologically unrealistic because actual neurons cannot implement such a complex signal-transmission mechanism.
3. **Hardware Acceleration Bottleneck**: Implementing BP on dedicated hardware is limited by problems such as update locking and weight transport, which reduce training efficiency.

To address these problems, the authors propose an approximation technique based on the Likelihood Ratio (LR) method that reduces computational and memory requirements and improves the efficiency of neural network training. The method has three components:

- **Approximate LR Method**: Sign encoding is used to approximate the gradient estimation in the LR method. This significantly reduces memory consumption, so more data copies can be used to obtain a more accurate gradient estimate.
- **Parallel Strategy**: Training is further accelerated by exploiting both data-level and layer-level parallelism.
- **Pipeline Strategy**: The forward and backward passes are pipelined to optimize the computational flow, which is especially suitable for efficient computation on dedicated hardware.

Together, these improvements give the LR method higher performance and better scalability in large-scale neural network training while maintaining training results comparable to those of BP.
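To make the LR idea and the role of sign encoding more concrete, the following is a minimal sketch on a single linear layer with a squared-error loss. It is an illustration under stated assumptions, not the paper's implementation: the function names (`exact_gradient`, `lr_gradient`, `cosine`), the toy data, the noise scale `sigma`, the noise-free baseline, and the reading of "sign encoding" as keeping only the sign of the injected noise are all hypothetical choices made here. The paper's actual scheme may differ.

```python
import math
import random


def exact_gradient(W, x, y):
    """Analytic gradient of ||W x - y||^2 with respect to W: 2 (W x - y) x^T."""
    residual = [sum(w * xj for w, xj in zip(row, x)) - t for row, t in zip(W, y)]
    return [[2.0 * r * xj for xj in x] for r in residual]


def lr_gradient(W, x, y, sigma=0.1, copies=5000, sign_noise=False, seed=0):
    """Forward-only likelihood-ratio estimate of the same gradient.

    Each copy perturbs the layer output with Gaussian noise and weights the
    noise by the resulting loss. With sign_noise=True only sign(eps) is used
    (an assumed, simplified form of sign encoding).
    """
    rng = random.Random(seed)
    mean = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    # Noise-free loss as a baseline: it leaves the expectation unchanged
    # (the noise has zero mean) but reduces variance.
    baseline = sum((m - t) ** 2 for m, t in zip(mean, y))
    grad = [[0.0] * len(x) for _ in W]
    for _ in range(copies):
        eps = [rng.gauss(0.0, 1.0) for _ in W]
        loss = sum((m + sigma * e - t) ** 2 for m, e, t in zip(mean, eps, y))
        for i, e in enumerate(eps):
            score = math.copysign(1.0, e) if sign_noise else e
            weight = (loss - baseline) * score / (sigma * copies)
            for j, xj in enumerate(x):
                grad[i][j] += weight * xj
    return grad


def cosine(A, B):
    """Cosine similarity between two matrices viewed as flat vectors."""
    a = [v for row in A for v in row]
    b = [v for row in B for v in row]
    dot = sum(p * q for p, q in zip(a, b))
    return dot / math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))


# Hypothetical toy problem: 3 inputs, 2 outputs.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
x = [1.0, 2.0, -1.0]
y = [1.0, -1.0]
g_exact = exact_gradient(W, x, y)
g_lr = lr_gradient(W, x, y)
g_sign = lr_gradient(W, x, y, sign_noise=True)
```

Comparing `cosine(g_exact, g_lr)` with `cosine(g_exact, g_sign)` shows how closely each forward-only estimate tracks the true gradient direction. In this quadratic toy case, replacing standard Gaussian noise by its sign scales the expected estimate by E|ε| = sqrt(2/π) ≈ 0.80 without changing its direction, while a sign needs only one bit of storage per value instead of a full floating-point number. That saving is what would allow more copies to fit in the same memory budget.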