Reducing Bias in Deep Learning Optimization: The RSGDM Approach

Honglin Qin, Hongye Zheng, Bingxing Wang, Zhizhong Wu, Bingyao Liu, Yuanfang Yang

2024-09-06

Abstract: Widely used first-order deep learning optimizers fall into two groups: non-adaptive learning rate optimizers and adaptive learning rate optimizers. The former is represented by SGDM (Stochastic Gradient Descent with Momentum), while the latter is represented by Adam. Both methods use exponential moving averages to estimate the overall gradient. However, this estimate is biased and lags behind the gradient it tracks. This paper proposes an RSGDM algorithm based on differential correction. Our contributions are threefold: 1) We analyze the bias and lag introduced by the exponential moving average in the SGDM algorithm. 2) We use a differential estimation term to correct the bias and lag in the SGDM algorithm, yielding the RSGDM algorithm. 3) Experiments on the CIFAR datasets show that our RSGDM algorithm is superior to the SGDM algorithm in terms of convergence accuracy.
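The bias and lag of an exponential moving average can be seen on synthetic data. The following is a minimal sketch, not taken from the paper: the ramp signal, `beta = 0.9`, and all names are assumptions chosen for illustration, and the abstract does not say which notion of bias the authors analyze. It shows two well-known effects: a zero-initialized average underestimates a constant signal early on, and it trails a linearly drifting signal by roughly `slope * beta / (1 - beta)` once the start-up transient has decayed.

```python
def ema(values, beta):
    """Exponential moving average m_t = beta * m_{t-1} + (1 - beta) * g_t, with m_0 = 0."""
    m = 0.0
    out = []
    for g in values:
        m = beta * m + (1.0 - beta) * g
        out.append(m)
    return out


beta = 0.9    # assumed momentum coefficient
slope = 0.5   # assumed drift of the synthetic "gradient" signal

# Lag: a signal that drifts linearly, g_t = slope * t.
ramp = [slope * t for t in range(1, 201)]
averaged = ema(ramp, beta)
steady_state_gap = ramp[-1] - averaged[-1]
expected_gap = slope * beta / (1.0 - beta)

# Start-up bias: on a constant signal of 1.0 the average equals 1 - beta**t,
# so it underestimates the signal during the first steps.
early = ema([1.0] * 5, beta)

print(f"gap on ramp: {steady_state_gap:.4f} (closed form: {expected_gap:.4f})")
print("first averages on a constant signal:", [round(v, 4) for v in early])
```

Larger values of `beta` smooth more aggressively but increase both effects, since the gap on the ramp scales with `beta / (1 - beta)`.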

Machine Learning

What problem does this paper attempt to address?

The paper addresses the bias and lag that arise in deep learning optimization. Widely used optimizers such as SGDM (Stochastic Gradient Descent with Momentum) and Adam estimate the overall gradient with an exponential moving average. This estimate is biased and lags behind the gradient it tracks, which affects both convergence speed and accuracy during optimization. To address this, the paper proposes RSGDM (Reduced Bias Stochastic Gradient Descent with Momentum), a new algorithm based on differential correction.

### Main Contributions:

1. **Analysis of Bias and Lag**: A detailed analysis of the bias and lag caused by the use of exponential moving averages in the SGDM algorithm.
2. **Proposing the RSGDM Algorithm**: A differential estimation term is introduced to correct the bias and lag in the SGDM algorithm, yielding the RSGDM algorithm.
3. **Experimental Validation**: Experiments on the CIFAR-10 and CIFAR-100 datasets show that the RSGDM algorithm outperforms the traditional SGDM algorithm in terms of convergence accuracy.

### Experimental Results:

- On the CIFAR-10 dataset, the test accuracy of RSGDM is 0.14% higher than that of SGDM.
- On the CIFAR-100 dataset, the test accuracy of RSGDM is 0.57% higher than that of SGDM.

These results suggest that reducing bias and lag improves test accuracy, and hence generalization, over SGDM on both datasets, although the gains are modest.
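The idea of correcting a moving average with a difference term can be sketched on synthetic data. The code below is an illustration under stated assumptions, not the RSGDM update rule, which this summary does not give: the function `ema_with_difference`, the `gain` parameter, and the ramp signal are all hypothetical. It adds a multiple of the latest first difference to a plain exponential moving average and compares how far each estimate trails a linearly drifting signal.

```python
def ema_with_difference(values, beta, gain):
    """Return (plain EMA, EMA plus gain * first difference) for a sequence.

    Illustrative only: `gain` and this exact form are assumptions, not the
    update rule from the paper.
    """
    m = 0.0
    prev = None
    plain, corrected = [], []
    for g in values:
        m = beta * m + (1.0 - beta) * g          # standard moving average
        diff = 0.0 if prev is None else g - prev  # first difference of the signal
        plain.append(m)
        corrected.append(m + gain * diff)
        prev = g
    return plain, corrected


beta = 0.9    # assumed momentum coefficient
slope = 0.5   # assumed drift of the synthetic signal
ramp = [slope * t for t in range(1, 201)]

# For a pure ramp, gain = beta / (1 - beta) cancels the steady-state lag.
plain, corrected = ema_with_difference(ramp, beta, gain=beta / (1.0 - beta))

plain_gap = abs(ramp[-1] - plain[-1])
corrected_gap = abs(ramp[-1] - corrected[-1])
print(f"plain EMA gap: {plain_gap:.4f}, corrected gap: {corrected_gap:.6f}")
```

On noisy stochastic gradients the first difference is itself noisy, so a practical method would likely need to smooth or down-weight this term; the clean ramp here only isolates the lag.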

