Accelerated Variance Reduction Stochastic ADMM for Large-Scale Machine Learning
Yuanyuan Liu, Fanhua Shang, Hongying Liu, Lin Kong, Licheng Jiao, Zhouchen Lin
DOI: https://doi.org/10.1109/tpami.2020.3000512
IF: 23.6
2021-01-01
IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract: Recently, many stochastic variance reduced alternating direction methods of multipliers (ADMMs) (e.g., SAG-ADMM and SVRG-ADMM) have made exciting progress such as linear convergence rate for strongly convex (SC) problems. However, their best-known convergence rate for non-strongly convex (non-SC) problems is $\mathcal{O}(1/T)$ as opposed to $\mathcal{O}(1/T^2)$ of accelerated deterministic algorithms, where $T$ is the number of iterations. Thus, there remains a gap in the convergence rates of existing stochastic ADMM and deterministic algorithms. To bridge this gap, we introduce a new momentum acceleration trick into stochastic variance reduced ADMM, and propose a novel accelerated SVRG-ADMM method (called ASVRG-ADMM) for the machine learning problems with the constraint $Ax + By = c$. Then we design a linearized proximal update rule and a simple proximal one for the two classes of ADMM-style problems with $B = \tau I$ and $B \ne \tau I$, respectively, where $I$ is an identity matrix and $\tau$ is an arbitrary bounded constant. Note that our linearized proximal update rule can avoid solving sub-problems iteratively. Moreover, we prove that ASVRG-ADMM converges linearly for SC problems. In particular, ASVRG-ADMM improves the convergence rate from $\mathcal{O}(1/T)$ to $\mathcal{O}(1/T^2)$ for non-SC problems. Finally, we apply ASVRG-ADMM to various machine learning problems, e.g., graph-guided fused Lasso, graph-guided logistic regression, graph-guided SVM, generalized graph-guided fused Lasso and multi-task learning, and show that ASVRG-ADMM consistently converges faster than the state-of-the-art methods.
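
Illustrative note: the variance reduction that SVRG-ADMM builds on can be sketched with the standard SVRG gradient estimator. The snippet below is a minimal sketch under stated assumptions, not the authors' ASVRG-ADMM: it omits the momentum acceleration, the ADMM constraint $Ax + By = c$, and the proximal update rules described in the abstract. The toy least-squares loss and the names `A_data`, `b_data`, `grad_i`, `full_grad`, and `svrg_gradient` are hypothetical and chosen only for illustration (`A_data` is a data matrix, unrelated to the constraint matrix $A$).

```python
import numpy as np

def svrg_gradient(grad_i, x, x_snapshot, full_grad_snapshot, i):
    """Standard SVRG estimator: grad_i(x) - grad_i(x_snapshot) + full gradient at the snapshot."""
    return grad_i(x, i) - grad_i(x_snapshot, i) + full_grad_snapshot

# Hypothetical toy problem (an assumption, not from the paper):
# f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2
rng = np.random.default_rng(0)
n, d = 50, 5
A_data = rng.normal(size=(n, d))
b_data = rng.normal(size=n)

def grad_i(x, i):
    # Gradient of the i-th component loss.
    return A_data[i] * (A_data[i] @ x - b_data[i])

def full_grad(x):
    # Gradient of the averaged loss over all n components.
    return A_data.T @ (A_data @ x - b_data) / n

# Snapshot point and its full gradient, computed once per outer loop in SVRG.
x_snapshot = rng.normal(size=d)
mu = full_grad(x_snapshot)

# A nearby iterate, as would arise in the inner loop.
x = x_snapshot + 0.1 * rng.normal(size=d)

# Averaging the estimator over all indices recovers the full gradient (unbiasedness).
estimates = np.array([svrg_gradient(grad_i, x, x_snapshot, mu, i) for i in range(n)])
mean_estimate = estimates.mean(axis=0)
```

The estimator is unbiased for the full gradient at `x`, and at the snapshot itself it equals the full gradient for every index `i`, so its variance shrinks as the iterate approaches the snapshot. This is the property that lets variance reduced methods use a constant step size.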