New logarithmic step size for stochastic gradient descent

M. Soheil Shamaee, S. Fathi Hafshejani, Z. Saeidian
DOI: https://doi.org/10.1007/s11704-023-3245-z
2024-04-02
Abstract: In this paper, we propose a novel warm restart technique using a new logarithmic step size for the stochastic gradient descent (SGD) approach. For smooth and non-convex functions, we establish an $O(\frac{1}{\sqrt{T}})$ convergence rate for SGD. We conduct a comprehensive implementation to demonstrate the efficiency of the newly proposed step size on the FashionMNIST, CIFAR10, and CIFAR100 datasets. Moreover, we compare our results with nine other existing approaches and demonstrate that the new logarithmic step size improves test accuracy by $0.9\%$ for the CIFAR100 dataset when we utilize a convolutional neural network (CNN) model.
Machine Learning, Optimization and Control
What problem does this paper attempt to address?
The paper addresses the problem of step size selection in stochastic gradient descent (SGD), with the goal of improving convergence and final performance on smooth non-convex objectives. It proposes a new logarithmic step size and combines it with a warm restart technique. For smooth non-convex functions, the authors establish an $O(\frac{1}{\sqrt{T}})$ convergence rate for SGD with this step size. They validate the method experimentally on the FashionMNIST, CIFAR10, and CIFAR100 datasets, comparing it with nine existing approaches. With a convolutional neural network (CNN) model, the new step size improves test accuracy on CIFAR100 by 0.9%.
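To make the idea of a logarithmic step size with warm restarts concrete, here is a minimal sketch. The summary above does not state the exact formula, so the decay rule used here, $\eta_t = \eta_0 (1 - \ln t / \ln T)$, is an assumption chosen for illustration, as are the function names `log_step_size` and `warm_restart_schedule` and the default `eta0`. It should not be read as the authors' implementation.

```python
import math


def log_step_size(t, T, eta0=0.1):
    """Assumed logarithmic decay: eta0 * (1 - ln(t) / ln(T)) for t in 1..T.

    Starts at eta0 when t == 1 and reaches 0 when t == T.
    This formula is an illustrative assumption, not taken from the summary.
    """
    if T < 2 or not 1 <= t <= T:
        raise ValueError("require T >= 2 and 1 <= t <= T")
    return eta0 * (1.0 - math.log(t) / math.log(T))


def warm_restart_schedule(total_steps, cycle_length, eta0=0.1):
    """Return one step size per iteration, restarting the decay every cycle.

    At the start of each cycle the step size jumps back to eta0
    (the "warm restart") and then decays logarithmically again.
    """
    schedule = []
    for k in range(total_steps):
        t = (k % cycle_length) + 1  # position inside the current cycle, 1-based
        schedule.append(log_step_size(t, cycle_length, eta0))
    return schedule


if __name__ == "__main__":
    # Two cycles of 10 iterations each; print the step size at each iteration.
    for k, eta in enumerate(warm_restart_schedule(20, 10)):
        print(k, round(eta, 4))
```

In a training loop, the value at index `k` would be used as the learning rate for iteration `k`. The cycle length is a tuning choice that this sketch leaves to the caller.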