Accelerating Minibatch Stochastic Gradient Descent Using Typicality Sampling

Xinyu Peng, Li Li, Fei-Yue Wang
DOI: https://doi.org/10.1109/tnnls.2019.2957003
IF: 14.255
2020-11-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: Machine learning, especially deep neural networks, has developed rapidly in fields including computer vision, speech recognition, and reinforcement learning. Although minibatch stochastic gradient descent (SGD) is one of the most popular stochastic optimization methods for training deep networks, it converges slowly because of the large noise in its gradient approximation. In this article, we attempt to remedy this problem by building a more efficient batch selection method based on typicality sampling, which reduces the error of gradient estimation in conventional minibatch SGD. We analyze the convergence rate of the resulting typical batch SGD algorithm and compare its convergence properties with those of conventional minibatch SGD. Experimental results demonstrate that our batch selection scheme works well and that more complex minibatch SGD variants can benefit from the proposed batch selection strategy.
computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, hardware & architecture
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses the slow convergence of minibatch stochastic gradient descent (minibatch SGD) when training deep neural networks. Specifically:

1. **Background and Motivation**:
   - Minibatch SGD is one of the most commonly used methods for training deep networks.
   - Because its gradient approximation is noisy, it converges slowly.
2. **Proposed Method**:
   - The authors propose a minibatch selection method based on typicality sampling to improve the accuracy of gradient estimation.
   - This method reduces the gradient estimation error, thereby accelerating convergence.
3. **Theoretical Analysis**:
   - The analysis shows that the proposed typical batch SGD algorithm has a faster linear convergence rate, and its convergence properties are compared with those of traditional minibatch SGD.
4. **Experimental Validation**:
   - Experiments show that the new sampling strategy outperforms traditional simple random sampling (SRS) on both synthetic and natural datasets, with the clearest gains in convergence speed during the early stages of training.
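
The claim that batch selection can reduce gradient estimation error can be illustrated with a small experiment. The sketch below is not the paper's typicality sampling, whose selection criterion this summary does not specify. It swaps in plain stratified sampling with proportional allocation as a stand-in, on a hypothetical 1-D least-squares problem with two synthetic clusters, and compares the mean squared error of the minibatch gradient estimate against simple random sampling (SRS). All names, the data, and the cluster-based strata are assumptions made for illustration.

```python
import random


def per_sample_grads(w, data):
    """Gradient of 0.5 * (w*x - y)**2 with respect to w, one value per sample."""
    return [(w * x - y) * x for x, y in data]


def srs_estimate(grads, batch_size, rng):
    """Minibatch gradient estimate via simple random sampling without replacement."""
    batch = rng.sample(grads, batch_size)
    return sum(batch) / batch_size


def stratified_estimate(strata, batch_size, rng):
    """Minibatch gradient estimate with proportional allocation across strata.

    Stand-in for a smarter batch selection rule; NOT the paper's method.
    """
    total = sum(len(s) for s in strata)
    estimate = 0.0
    for s in strata:
        k = max(1, round(batch_size * len(s) / total))  # samples drawn from this stratum
        estimate += (len(s) / total) * sum(rng.sample(s, k)) / k
    return estimate


def mse(estimator, target, trials):
    """Mean squared error of repeated estimates against the full gradient."""
    return sum((estimator() - target) ** 2 for _ in range(trials)) / trials


rng = random.Random(0)  # fixed seed so the run is repeatable


def make_cluster(center, n):
    """Hypothetical synthetic data: x near `center`, y = 2x plus small noise."""
    xs = [rng.gauss(center, 0.1) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.1)) for x in xs]


cluster_a = make_cluster(1.0, 500)
cluster_b = make_cluster(-3.0, 500)

w = 0.0  # evaluate gradients at an arbitrary starting point
grads_a = per_sample_grads(w, cluster_a)
grads_b = per_sample_grads(w, cluster_b)
all_grads = grads_a + grads_b
full_grad = sum(all_grads) / len(all_grads)  # exact full-batch gradient

batch_size, trials = 10, 500
mse_srs = mse(lambda: srs_estimate(all_grads, batch_size, rng), full_grad, trials)
mse_strat = mse(
    lambda: stratified_estimate([grads_a, grads_b], batch_size, rng),
    full_grad,
    trials,
)
print(f"SRS MSE: {mse_srs:.4f}, stratified MSE: {mse_strat:.4f}")
```

Because the two clusters produce very different per-sample gradients, a batch that always covers both in proportion avoids most of the between-cluster noise that SRS incurs. A lower-error gradient estimate is the mechanism the summary credits for faster convergence, although the paper's own selection rule may differ substantially from this stand-in.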