A Hybrid Stochastic-Deterministic Minibatch Proximal Gradient Method for Efficient Optimization and Generalization.
Pan Zhou,Xiao-Tong Yuan,Zhouchen Lin,Steven C. H. Hoi
DOI: https://doi.org/10.1109/tpami.2021.3087328
IF: 23.6
2022-01-01
IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract:Despite the success of stochastic variance-reduced gradient (SVRG) algorithms in solving large-scale problems, their stochastic gradient complexity often scales linearly with data size and is expensive for huge data. Accordingly, we propose a hybrid stochastic-deterministic minibatch proximal gradient ( HSDMPG ) algorithm for strongly convex problems with linear prediction structure, e.g., least squares and logistic/softmax regression. HSDMPG enjoys improved computational complexity that is data-size-independent for large-scale problems. It iteratively samples an evolving minibatch of individual losses to estimate the original problem, and can efficiently minimize the sampled subproblems. For strongly convex loss of $n$ components, HSDMPG attains an $\epsilon$ -optimization-error within $\mathcal {O} \left(\kappa \log ^{\zeta +1}\left(\frac{1}{\epsilon }\right)\frac{1}{\epsilon }\bigwedge n\log ^{\zeta }\left(\frac{1}{\epsilon }\right)\right)$ stochastic gradient evaluations, where $\kappa$ is condition number, $\zeta =1$ for quadratic loss and $\zeta =2$ for generic loss. For large-scale problems, our complexity outperforms those of SVRG-type algorithms with/without dependence on data size. Particularly, when $\epsilon =\mathcal {O}(1/\sqrt{n})$ which matches the intrinsic excess error of a learning model and is sufficient for generalization, our complexity for quadratic and generic losses is respectively $\mathcal {O} (n^{0.5}\log ^{2}(n))$ and $\mathcal {O} (n^{0.5}\log ^{3}(n))$ , which for the first time achieves optimal generalization in less than a single pass over data. Besides, we extend HSDMPG to online strongly convex problems and prove its higher efficiency over the prior algorithms. Numerical results demonstrate the computational advantages of HSDMPG .