Abstract: This paper introduces BGE-Adam, an enhanced variant of the Adam optimizer that integrates three techniques to improve the adaptability, convergence, and robustness of the original algorithm under various training conditions. First, BGE-Adam incorporates a dynamic β adjustment mechanism that uses the rate of gradient change to adjust the exponential decay rates of the first and second moment estimates (β1 and β2). The adjustment is symmetrical: the algorithm applies the same rules when adjusting β1 and β2. This design keeps the algorithm consistent and balanced and lets it adaptively track gradient trends. Second, a simple gradient prediction model estimates the direction of future gradients by combining historical gradient information with the current gradient. Third, entropy weighting is integrated into the gradient update step. It introduces a certain amount of noise, which makes the optimizer more exploratory and improves its adaptability to complex loss surfaces. Experiments on the classical MNIST and CIFAR10 datasets and on gastrointestinal disease medical image datasets indicate that BGE-Adam improves convergence and generalization. On the gastrointestinal disease test dataset, BGE-Adam reached an accuracy of 69.36%, compared with 67.66% for standard Adam (a gain of 1.70 percentage points); on the CIFAR10 test dataset, 71.4% compared with 70.65% (0.75 points); and on the MNIST dataset, 99.34% compared with 99.23% (0.11 points).
These results suggest that BGE-Adam converges better and is more robust than Adam. The work demonstrates the effectiveness of combining the three techniques and offers new perspectives for the future development of deep learning optimization algorithms.
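
For reference, the standard Adam update that the abstract builds on can be written as follows. The symbols $g_t$ (gradient at step $t$), $\theta_t$ (parameters), $\eta$ (learning rate), and $\epsilon$ (small stabilizing constant) are conventional notation assumed here; they do not appear in the abstract. In standard Adam, β1 and β2 are fixed constants, whereas BGE-Adam is described as adjusting them during training.

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, &
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2, \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^{\,t}}, &
\hat{v}_t &= \frac{v_t}{1 - \beta_2^{\,t}}, \\
\theta_t &= \theta_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}.
\end{aligned}
```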

What problem does this paper attempt to address?

The paper focuses on improving the Adam optimization algorithm for training deep learning models. The researchers propose BGE-Adam, an enhanced version of Adam that improves the adaptability, convergence, and robustness of the original algorithm through three techniques:

1. **Dynamic β Parameter Adjustment Mechanism**: This mechanism adjusts the exponential decay rates of the first and second moment estimates (β1 and β2) based on the rate of change of the gradient, so the algorithm can track gradient trends more flexibly.
2. **Gradient Prediction Model**: A simple model predicts the direction of future gradients by combining historical gradient information with the current gradient. This lets the parameter update strategy be adjusted in advance, reduces the possibility of over-updating, and makes training more stable.
3. **Entropy Weighting**: Entropy weighting is introduced in the gradient update step. Adding a certain amount of random noise to the parameter updates improves the model's ability to explore complex loss surfaces.

Experimental results show that BGE-Adam achieves better convergence and generalization than standard Adam on classic datasets such as MNIST and CIFAR10, as well as on medical image datasets. On a gastrointestinal disease medical image test dataset, BGE-Adam reached an accuracy of 69.36%, compared with 67.66% for standard Adam; on the CIFAR10 test dataset, 71.4% compared with 70.65%; and on the MNIST dataset, 99.34% compared with 99.23%.
In summary, the paper aims to improve the Adam optimization algorithm through these three techniques and to enhance its performance under different training conditions. The reported results support the effectiveness of the proposed techniques and offer new perspectives for the future development of deep learning optimization algorithms.
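
The three mechanisms summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the text does not give the exact formulas, so the mapping from gradient change to β, the prediction weight `alpha`, the multiplicative noise standing in for entropy weighting, and all names (`init_state`, `bge_adam_like_step`, `noise_scale`, the β ranges) are hypothetical choices made for this example.

```python
import numpy as np


def init_state(shape):
    """Create the optimizer state for one parameter array."""
    return {
        "m": np.zeros(shape),          # first-moment estimate
        "v": np.zeros(shape),          # second-moment estimate
        "prev_grad": np.zeros(shape),  # gradient seen at the previous step
        "beta1_prod": 1.0,             # running product of the beta1 values used
        "beta2_prod": 1.0,             # running product of the beta2 values used
    }


def bge_adam_like_step(param, grad, state, rng, lr=1e-3,
                       beta1_range=(0.5, 0.9), beta2_range=(0.9, 0.999),
                       alpha=0.5, noise_scale=0.01, eps=1e-8):
    """One update step; every formula below is an assumed stand-in."""
    # 1. Dynamic beta (assumed rule): squash the relative gradient change
    #    into [0, 1) and apply the same mapping to beta1 and beta2.
    change = np.linalg.norm(grad - state["prev_grad"])
    rate = change / (np.linalg.norm(state["prev_grad"]) + eps)
    r = rate / (1.0 + rate)
    beta1 = beta1_range[1] - (beta1_range[1] - beta1_range[0]) * r
    beta2 = beta2_range[1] - (beta2_range[1] - beta2_range[0]) * r

    # 2. Gradient prediction (assumed rule): blend previous and current gradient.
    pred = alpha * state["prev_grad"] + (1.0 - alpha) * grad

    # Adam-style moment estimates computed on the predicted gradient.
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * pred
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * pred ** 2
    # With time-varying betas, the bias correction uses the product of the
    # betas applied so far instead of beta ** t.
    state["beta1_prod"] *= beta1
    state["beta2_prod"] *= beta2
    m_hat = state["m"] / (1.0 - state["beta1_prod"])
    v_hat = state["v"] / (1.0 - state["beta2_prod"])

    # 3. Entropy weighting (assumed form): small multiplicative Gaussian noise.
    noise = 1.0 + noise_scale * rng.standard_normal(param.shape)

    state["prev_grad"] = grad.copy()
    state["beta1"], state["beta2"] = beta1, beta2  # kept for inspection
    return param - lr * noise * m_hat / (np.sqrt(v_hat) + eps)
```

To try the sketch, create a generator with `np.random.default_rng(0)`, build the state with `init_state(x.shape)`, and call `bge_adam_like_step(x, grad, state, rng)` in a loop with the gradient of the loss at `x`. In this sketch, faster gradient change lowers both decay rates so that recent gradients weigh more; whether the paper uses this direction or functional form is not stated in the summary.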

An Improved BGE-Adam Optimization Algorithm Based on Entropy Weighting and Adaptive Gradient Strategy
