Fully Projection-Free Proximal Stochastic Gradient Method with Optimal Convergence Rates
Yan Li, Xiaofeng Cao, Honghui Chen
DOI: https://doi.org/10.1109/access.2020.3019885
IF: 3.9
2020-01-01
IEEE Access
Abstract: Proximal stochastic gradient methods play an important role in large-scale machine learning and big data analysis. They iteratively update a model within a feasible set until convergence, and the computational cost is usually high because each update requires a projection onto the feasible set. To reduce this cost, many projection-free methods, such as Frank-Wolfe methods, have been proposed. However, those methods must solve a linear programming problem at every model update, which still incurs a high computational cost for a complex feasible set and can be prohibitive in practical scenarios. Motivated by this problem, we propose a fully projection-free proximal stochastic gradient method, which has two advantages over previous methods. First, it is highly efficient: the proposed method does not compute the projection directly but finds an approximately correct projection point at very low computational cost. Second, it achieves tight and optimal convergence rates. Our theoretical analysis shows that the proposed method achieves convergence rates of O(1/√T) and O(log T/T) for convex and strongly convex functions, respectively, which match the known lower bounds. This paper therefore provides the insight that some loss of projection accuracy can improve efficiency significantly without impairing convergence rates. Finally, empirical studies show that the proposed method achieves more than a 5× speedup over previous methods.
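For orientation, the two baselines the abstract contrasts can be sketched on a toy problem. The following is a minimal illustration only, and it is not the method proposed in the paper: it assumes a Euclidean ball as the feasible set (where exact projection happens to be cheap), a simple quadratic objective, and a 1/√t step size with iterate averaging. All function names and parameters (`project_l2_ball`, `lmo_l2_ball`, `projected_sgd`, `radius`, `noise`) are hypothetical and chosen for this example.

```python
import math
import random


def project_l2_ball(x, radius=1.0):
    """Exact Euclidean projection onto {x : ||x||_2 <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    scale = radius / norm
    return [v * scale for v in x]


def lmo_l2_ball(grad, radius=1.0):
    """Linear minimization step of the kind Frank-Wolfe solves per update:
    argmin over ||s||_2 <= radius of <grad, s>."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [-radius * g / norm for g in grad]


def projected_sgd(target, steps=2000, radius=1.0, noise=0.1, seed=0):
    """Baseline projected stochastic gradient (assumed toy setup).

    Minimizes f(x) = 0.5 * ||x - target||^2 over the ball using noisy
    gradients, step size 1/sqrt(t), and returns the running average of
    the iterates.
    """
    rng = random.Random(seed)
    dim = len(target)
    x = [0.0] * dim
    avg = [0.0] * dim
    for t in range(1, steps + 1):
        # Stochastic gradient: true gradient plus Gaussian noise.
        grad = [x[i] - target[i] + rng.gauss(0.0, noise) for i in range(dim)]
        eta = 1.0 / math.sqrt(t)
        # Gradient step followed by the projection the abstract calls costly.
        x = project_l2_ball([x[i] - eta * grad[i] for i in range(dim)], radius)
        # Incremental running average of the feasible iterates.
        avg = [avg[i] + (x[i] - avg[i]) / t for i in range(dim)]
    return avg
```

With `target = [3.0, 4.0]` and `radius = 1.0`, the constrained minimizer is the point on the unit sphere in the direction of the target, and the averaged iterate should land close to it. For a ball both the projection and the linear minimization step are closed-form; the abstract's concern is feasible sets where neither is cheap, which is where an approximate projection would matter.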