A Comparison of Algorithms for Learning with Nonconvex Regularization
Quanming Yao, Yongqi Zhang
2016-01-01
Abstract: Convex regularizers are popular for sparse and low-rank learning, mainly because of their good statistical and optimization guarantees. However, they often lead to biased estimation, so the resulting sparsity and accuracy are not as good as desired. This motivates replacing convex regularizers with nonconvex ones. Many nonconvex regularizers have recently been proposed, and they can indeed achieve better performance than convex ones. In this survey, we summarize and compare algorithms that can be applied to learning with nonconvex regularizers.

1 General Non-Convex Optimization

Consider the general non-convex optimization problem¹

$$\min_x g(x) \tag{1}$$

Assumption 1 (Difference of Convex [1, 18, 14]). $g(x)$ can be decomposed as the difference of two convex (DoC) functions, i.e., $g(x) = \hat{g}(x) - \bar{g}(x)$.

We are mainly interested in non-smooth $g(x)$; otherwise, (1) can be solved simply with gradient descent. Under Assumption 1, (1) can be expressed as

$$\min_x \{ g(x) \equiv \hat{g}(x) - \bar{g}(x) \}$$

where $\hat{g}(x)$ and $\bar{g}(x)$ are two convex functions.

Definition 1 (Critical point [1]). Under Assumption 1, a point $x^*$ is a critical point of $g(x)$ if it satisfies $0 \in \partial \hat{g}(x^*) - \partial \bar{g}(x^*)$.

Remark. For a smooth function, a critical point is a point where the gradient is zero.

Difference of Convex programming (DCP) [1] is a general technique for optimizing (1). It approximates $\bar{g}(x)$ by its first-order expansion at the current iterate and generates the sequences $\{x_t\}$ and $\{y_t\}$ by

$$x_{t+1} = \arg\min_x \; \hat{g}(x) - \langle x - x_t, y_t \rangle, \quad y_t \in \partial \bar{g}(x_t) \tag{2}$$

Although $g(x)$ is not convex, the sub-problem in (2) is convex, because $\hat{g}(x)$ is convex and the remaining term is linear in $x$. Therefore, algorithms for convex optimization can be applied. However, the sub-problem may not be easy to solve, which makes DCP slow in real-world applications. This method is used in [4, 19] for sparse coding with non-convex regularizers.

Theorem 1.1 (Convergence of DCP). Let Assumption 1 hold. Then the sequence $\{x_t\}$ generated by (2) converges to a critical point of (1).

¹ All functions mentioned in this paper are assumed to be bounded from below.
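As an illustration of iteration (2), the following is a minimal sketch on a toy problem that is not taken from the text above. It assumes the objective $g(x) = \frac{1}{2}\|x - b\|^2 + \lambda \sum_i \min(|x_i|, \theta)$ (a capped-$\ell_1$ penalty) and the decomposition $\hat{g}(x) = \frac{1}{2}\|x - b\|^2 + \lambda \|x\|_1$, $\bar{g}(x) = \lambda \sum_i \max(|x_i| - \theta, 0)$. Under this assumed split, the sub-problem in (2) has a closed-form solution by soft-thresholding. The names `dcp_capped_l1`, `soft_threshold`, `objective`, `b`, `lam`, and `theta` are hypothetical and chosen only for this example.

```python
def soft_threshold(v, lam):
    """Proximal operator of lam * |.| applied to the scalar v."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def objective(x, b, lam, theta):
    """Assumed toy objective: 0.5*||x - b||^2 + lam * sum_i min(|x_i|, theta)."""
    fit = 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))
    penalty = lam * sum(min(abs(xi), theta) for xi in x)
    return fit + penalty


def dcp_capped_l1(b, lam, theta, max_iter=100, tol=1e-10):
    """Run the DCP iteration (2) on the assumed capped-l1 toy problem.

    Returns the final iterate and the objective value at every iterate.
    """
    x = [0.0] * len(b)
    history = [objective(x, b, lam, theta)]
    for _ in range(max_iter):
        # y_t: a subgradient of g_bar(x) = lam * sum_i max(|x_i| - theta, 0)
        y = [lam * (1.0 if xi > 0 else -1.0) if abs(xi) > theta else 0.0
             for xi in x]
        # Convex sub-problem: argmin_x 0.5*||x - b||^2 + lam*||x||_1 - <x, y>,
        # solved coordinate-wise by soft-thresholding b + y.
        x_new = [soft_threshold(bi + yi, lam) for bi, yi in zip(b, y)]
        history.append(objective(x_new, b, lam, theta))
        step = max(abs(u - v) for u, v in zip(x_new, x))
        x = x_new
        if step <= tol:
            break
    return x, history


if __name__ == "__main__":
    x, history = dcp_capped_l1([3.0, 0.2, -2.0], lam=0.5, theta=1.0)
    print(x)
    print(history)
```

In this toy run the large entries of `b` are recovered without shrinkage while the small entry is set to zero, and the recorded objective values do not increase from one iterate to the next. The closed-form sub-problem is a property of this particular assumed decomposition; in general the sub-problem in (2) requires an inner convex solver.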