Accelerating Minibatch Stochastic Gradient Descent Using Typicality Sampling
Xinyu Peng, Li Li, Fei-Yue Wang
DOI: https://doi.org/10.1109/tnnls.2019.2957003
IF: 14.255
2020-11-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: Machine learning, especially deep neural networks, has developed rapidly in fields including computer vision, speech recognition, and reinforcement learning. Although minibatch stochastic gradient descent (SGD) is one of the most popular stochastic optimization methods for training deep networks, it converges slowly because of the large noise in its gradient approximation. In this article, we attempt to remedy this problem by building a more efficient batch selection method based on typicality sampling, which reduces the error of gradient estimation in conventional minibatch SGD. We analyze the convergence rate of the resulting typical batch SGD algorithm and compare its convergence properties with those of conventional minibatch SGD. Experimental results demonstrate that our batch selection scheme works well and that more complex minibatch SGD variants can also benefit from the proposed batch selection strategy.
computer science, artificial intelligence, theory & methods,engineering, electrical & electronic, hardware & architecture
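
Illustrative note: the abstract attributes the slow convergence of minibatch SGD to noise in the gradient approximation. The sketch below is not the authors' typicality sampling method, which this record does not describe in detail. It only shows the baseline problem on a toy one-parameter least-squares model: a uniformly sampled minibatch gradient deviates from the full-data gradient, and the deviation shrinks as the batch grows. The data generator, the model, and the names `make_data`, `grad`, and `mean_sq_error_of_estimate` are all assumptions made for illustration.

```python
import random


def make_data(n, seed=0):
    """Toy 1-D regression data (hypothetical): y = 3x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def grad(w, xs, ys, idx):
    """Gradient of the loss 0.5 * (w*x - y)^2, averaged over the samples in idx."""
    idx = list(idx)
    return sum((w * xs[i] - ys[i]) * xs[i] for i in idx) / len(idx)


def mean_sq_error_of_estimate(w, xs, ys, batch_size, trials=2000, seed=1):
    """Average squared gap between a uniform minibatch gradient and the full gradient."""
    rng = random.Random(seed)
    n = len(xs)
    full = grad(w, xs, ys, range(n))
    total = 0.0
    for _ in range(trials):
        # Conventional minibatch selection: uniform sampling without replacement.
        idx = rng.sample(range(n), batch_size)
        total += (grad(w, xs, ys, idx) - full) ** 2
    return total / trials


xs, ys = make_data(200)
errors = {b: mean_sq_error_of_estimate(0.0, xs, ys, b) for b in (4, 32, 200)}
for b, e in errors.items():
    print(f"batch size {b:>3}: mean squared gradient error = {e:.3e}")
```

In this toy setting the estimation error falls as the batch size rises and vanishes (up to floating-point rounding) when the batch covers the whole dataset. A batch selection scheme of the kind the abstract proposes aims to lower this error without enlarging the batch.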