Randomized Block-Coordinate Adaptive Algorithms for Nonconvex Optimization Problems
Yangfan Zhou, Kaizhu Huang, Jiang Li, Cheng, Xuguang Wang, Amir Hussian, Xin Liu
DOI: https://doi.org/10.1016/j.engappai.2023.105968
IF: 8
2023-01-01
Engineering Applications of Artificial Intelligence
Abstract: Nonconvex optimization problems have long been a focus in deep learning, where many fast adaptive algorithms based on momentum are applied. However, computing the full gradient of a high-dimensional feature vector in these tasks becomes prohibitive. To reduce the computational cost of optimizers on the nonconvex optimization problems typically seen in deep learning, this work proposes a randomized block-coordinate adaptive optimization algorithm, named RAda, which randomly picks a block from the full coordinates of the parameter vector and then sparsely computes its gradient. We prove that, in the nonconvex case, RAda converges to a δ-accurate solution with a stochastic first-order complexity of O(1/δ^2), where δ is the upper bound of the gradient's square. Experiments on public datasets, including CIFAR-10, CIFAR-100, and Penn TreeBank, verify that RAda outperforms the compared algorithms in terms of computational cost.
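The abstract does not state RAda's exact update rule, so the following is only a minimal sketch of the general idea it describes: at each step, pick one block of coordinates at random, compute the gradient for that block only, and apply a momentum-based adaptive step to it. The Adam-style moment updates, the toy objective, the contiguous block partition, and all names and hyperparameters (`block_adaptive_descent`, `num_blocks`, `lr`, `beta1`, `beta2`) are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def objective(x):
    # Toy nonconvex objective (illustrative assumption): sum_i x_i^2 / (1 + x_i^2).
    return sum(v * v / (1.0 + v * v) for v in x)


def partial_gradient(x, block):
    # Gradient restricted to the coordinates in `block`.
    # d/dx [x^2 / (1 + x^2)] = 2x / (1 + x^2)^2
    return {i: 2.0 * x[i] / (1.0 + x[i] * x[i]) ** 2 for i in block}


def block_adaptive_descent(x0, num_blocks=4, steps=2000, lr=0.01,
                           beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Hypothetical Adam-style update applied to one random block per step."""
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    # Fixed partition of the coordinates into contiguous blocks.
    size = math.ceil(n / num_blocks)
    blocks = [list(range(s, min(s + size, n))) for s in range(0, n, size)]
    m = [0.0] * n  # first-moment (momentum) estimate per coordinate
    v = [0.0] * n  # second-moment estimate per coordinate
    for _ in range(steps):
        block = rng.choice(blocks)      # pick one block uniformly at random
        g = partial_gradient(x, block)  # gradient for this block only
        for i in block:                 # coordinates outside the block are untouched
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] * g[i]
            x[i] -= lr * m[i] / (math.sqrt(v[i]) + eps)
    return x


if __name__ == "__main__":
    start = [1.5, -2.0, 0.5, 1.0, -0.7, 0.3, 2.0, -1.2]
    end = block_adaptive_descent(start)
    print(objective(start), objective(end))
```

Each iteration evaluates only one block of partial derivatives rather than the full gradient, which is the source of the per-step savings the abstract refers to. The trade-off is that each coordinate is updated less often, so more iterations may be needed.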