Abstract: Efficient and biologically plausible alternatives to backpropagation in neural network training remain a challenge due to issues such as high computational complexity and additional assumptions about neural networks, which limit scalability to deeper networks. The likelihood ratio method offers a promising gradient estimation strategy but is constrained by significant memory consumption, especially when deploying multiple copies of data to reduce estimation variance. In this paper, we introduce an approximation technique for the likelihood ratio (LR) method to alleviate computational and memory demands in gradient estimation. By exploiting the natural parallelism during the backward pass using LR, we further provide a high-performance training strategy, which pipelines both the forward and backward pass, to make it more suitable for the computation on specialized hardware. Extensive experiments demonstrate the effectiveness of the approximation technique in neural network training. This work underscores the potential of the likelihood ratio method in achieving high-performance neural network training, suggesting avenues for further exploration.

What problem does this paper attempt to address?

The paper targets three problems with the Backpropagation (BP) method in neural network training: computational cost, biological implausibility, and bottlenecks in hardware acceleration. Specifically:

1. **Computational Complexity and Memory Consumption**: BP requires substantial computational resources, and its compute and memory costs grow with network depth because intermediate activations must be kept for the backward pass.
2. **Biological Implausibility**: BP requires precise gradient signals to be sent backward through the network, a signal-transmission mechanism that biological neurons are not known to implement.
3. **Hardware Acceleration Bottleneck**: Implementing BP on dedicated hardware is limited by update locking and weight transport, both of which reduce training efficiency.

To address these problems, the authors propose an approximation technique based on the Likelihood Ratio (LR) method that reduces computational and memory requirements and improves training efficiency. It has three parts:

- **Approximate LR Method**: Use sign encoding to approximate the gradient estimate in the LR method. This significantly reduces memory consumption, so more data copies can be used to obtain a more accurate gradient estimate.
- **Parallel Strategy**: Accelerate training further by exploiting data-level and layer-level parallelism.
- **Pipeline Strategy**: Pipeline the forward and backward passes to streamline the computational flow, which is especially suited to dedicated hardware.

Together, these changes give the LR method better performance and scalability in large-scale neural network training while keeping training results comparable to those of BP.
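The sign-encoding idea can be illustrated on a single noisy linear layer. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names `lr_gradients` and `cosine`, the squared-error loss, the noise scale `sigma`, and the mean-loss baseline used for variance reduction are all choices made here for the example. It compares a plain LR estimate, which keeps the full Gaussian noise, with a variant that keeps only the sign of the noise, and reports how much memory each stored noise array takes.

```python
import numpy as np

def lr_gradients(W, x, t, sigma=0.5, copies=20000, seed=0):
    """Estimate dE[loss]/dW for y = W x + sigma * eps without backpropagation.

    Hypothetical helper for illustration. Returns the plain LR estimate, the
    sign-encoded estimate, and the byte sizes of the two stored noise arrays.
    """
    rng = np.random.default_rng(seed)
    # One independent noise vector per copy of the same input.
    eps = rng.standard_normal((copies, W.shape[0]))
    y = W @ x + sigma * eps                    # shape (copies, d_out)
    loss = np.sum((y - t) ** 2, axis=1)        # one scalar loss per copy
    # Assumption: subtract the mean loss as a baseline to reduce variance.
    centered = loss - loss.mean()

    # Plain LR estimate: average of loss * noise / sigma, then outer with x.
    g_full = (centered[:, None] * eps).mean(axis=0) / sigma

    # Sign-encoded variant: keep only the sign of the noise (int8 here).
    signs = np.sign(eps).astype(np.int8)
    g_sign = (centered[:, None] * signs).mean(axis=0) / sigma

    return np.outer(g_full, x), np.outer(g_sign, x), eps.nbytes, signs.nbytes

def cosine(a, b):
    """Cosine similarity between two arrays, flattened."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

if __name__ == "__main__":
    W = np.arange(12, dtype=float).reshape(3, 4) / 10.0
    x = np.array([1.0, -1.0, 0.5, 2.0])
    t = np.array([1.0, 0.0, -1.0])
    exact = 2.0 * np.outer(W @ x - t, x)       # analytic gradient of the expected loss
    full, sign, full_bytes, sign_bytes = lr_gradients(W, x, t)
    print("cosine(full LR, exact):", cosine(full, exact))
    print("cosine(sign LR, exact):", cosine(sign, exact))
    print("noise bytes (float64 vs int8):", full_bytes, sign_bytes)
```

For this quadratic loss the sign-encoded estimate is a rescaled version of the exact gradient, so its direction matches even though its magnitude does not. Storing signs as `int8` is already 8 times smaller than `float64` noise; packing them to one bit each would shrink the footprint further, which is what makes a larger number of copies affordable.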

Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training
