Abstract: Stochastic Gradient Descent (SGD) is one of the core techniques behind the success of deep neural networks. The gradient indicates the direction in which a function changes most steeply. The main problem with basic SGD is that it updates all parameters with equal-sized steps, irrespective of gradient behavior. Hence, an efficient way to optimize deep networks is to use an adaptive step size for each parameter. Recently, several attempts have been made to improve gradient descent, such as AdaGrad, AdaDelta, RMSProp, and Adam. These methods rely on the square roots of exponential moving averages of squared past gradients, and thus do not take advantage of local changes in the gradients. In this paper, a novel optimizer, diffGrad, is proposed based on the difference between the present and the immediate past gradient. In diffGrad, the step size is adjusted for each parameter so that parameters whose gradients change quickly take larger steps and parameters whose gradients change slowly take smaller steps. The convergence analysis is done using the regret bound approach of the online learning framework. A rigorous analysis is carried out on three synthetic complex non-convex functions. Image categorization experiments are also conducted on the CIFAR10 and CIFAR100 datasets to compare the performance of diffGrad with state-of-the-art optimizers such as SGDM, AdaGrad, AdaDelta, RMSProp, AMSGrad, and Adam. A residual unit (ResNet) based Convolutional Neural Network (CNN) architecture is used in the experiments. The experiments show that diffGrad outperforms the other optimizers. We also show that diffGrad performs uniformly well when training CNNs with different activation functions.
The source code is publicly available at https://github.com/shivram1987/diffGrad.
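
As a rough illustration of the idea in the abstract, the following minimal sketch applies an Adam-style update whose step is scaled per parameter by a factor derived from the difference between the previous and current gradient. The abstract does not state the scaling function; a sigmoid of the absolute gradient difference is assumed here, and the names `init_state`, `diffgrad_step`, and `xi`, as well as the default hyperparameters, are hypothetical choices for this example rather than the authors' reference implementation.

```python
import math


def init_state(n):
    """Create optimizer state for n scalar parameters (hypothetical helper)."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n, "prev_grad": [0.0] * n}


def diffgrad_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One diffGrad-style update on a list of floats (illustrative sketch)."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Adam-style exponential moving averages of the gradient and its square.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialized averages.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        # Assumption: the scaling factor is a sigmoid of |g_prev - g|, so it
        # lies in [0.5, 1). A fast-changing gradient gives a factor near 1
        # (larger step); an unchanged gradient gives 0.5 (smaller step).
        xi = 1.0 / (1.0 + math.exp(-abs(state["prev_grad"][i] - g)))
        state["prev_grad"][i] = g
        new_params.append(p - lr * xi * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


if __name__ == "__main__":
    # Minimize f(x) = x^2, whose gradient is 2x, starting from x = 3.
    x = [3.0]
    state = init_state(1)
    for _ in range(500):
        x = diffgrad_step(x, [2.0 * x[0]], state, lr=0.1)
    print(x[0])
```

With this assumed scaling, a parameter whose gradient is identical on two consecutive steps takes half of the corresponding Adam step, while a parameter whose gradient changes sharply takes nearly the full step.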

Gradient Descent Optimization in Deep Learning Model Training Based on Multistage and Method Combination Strategy

Gradient Descent based Optimization Algorithms for Deep Learning Models Training

Enhancing Deep Learning with Optimized Gradient Descent: Bridging Numerical Methods and Neural Network Training

Accelerated Gradient-free Neural Network Training by Multi-convex Alternating Optimization

Learning Gradient Descent: Better Generalization and Longer Horizons

Optimal Adaptive and Accelerated Stochastic Gradient Descent

Effective Neural Network Training with a New Weighting Mechanism-Based Optimization Algorithm

Gradient Descent: The Ultimate Optimizer

An automatic learning rate decay strategy for stochastic gradient descent optimization methods in neural networks

Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent

An optimization Strategy for Deep Neural Networks Training

Reconstructing Deep Neural Networks: Unleashing the Optimization Potential of Natural Gradient Descent

Optimization Methods in Deep Learning: A Comprehensive Overview

Optimization for deep learning: theory and algorithms

Stagewise Accelerated Stochastic Gradient Methods for Nonconvex Optimization

When Gradient Descent Meets Derivative-Free Optimization: A Match Made in Black-Box Scenario

Gradient Descent, Stochastic Optimization, and Other Tales

A Comprehensive Study on Optimization Strategies for Gradient Descent In Deep Learning

diffGrad: An Optimization Method for Convolutional Neural Networks

Towards Differentiable Multilevel Optimization: A Gradient-Based Approach

Efficient and stable SAV-based methods for gradient flows arising from deep learning