Does $\ell_{p}$-Minimization Outperform $\ell_{1}$-Minimization?
Le Zheng, Arian Maleki, Haolei Weng, Xiaodong Wang, Teng Long
DOI: https://doi.org/10.1109/tit.2017.2717585
IF: 2.5
2017-01-01
IEEE Transactions on Information Theory
Abstract: In many application areas ranging from bioinformatics to imaging, we are faced with the following question: can we recover a sparse vector $x_o \in \mathbb{R}^N$ from its undersampled set of noisy observations $y \in \mathbb{R}^n$, $y = Ax_o + w$? The last decade has witnessed a surge of algorithms and theoretical results addressing this question. One of the most popular schemes is $\ell_p$-regularized least squares, given by the formulation $\hat{x}(\gamma, p) \in \arg\min_x \frac{1}{2}\|y - Ax\|_2^2 + \gamma\|x\|_p^p$, where $p \in [0, 1]$. Among these optimization problems, the case $p = 1$, also known as LASSO, is the most widely accepted in practice, for two reasons. First, thanks to extensive studies in high-dimensional statistics and compressed sensing, we have a clear picture of LASSO's performance. Second, it is convex, and efficient algorithms exist for finding its global minimum. Unfortunately, neither of these properties holds for $0 \le p < 1$. Nevertheless, these problems remain appealing because of two folklore theorems of high-dimensional statistics. First, $\hat{x}(\gamma, p)$ is closer to $x_o$ than $\hat{x}(\gamma, 1)$. Second, if we employ iterative methods that aim to converge to a local minimum of $\frac{1}{2}\|y - Ax\|_2^2 + \gamma\|x\|_p^p$, then under good initialization these algorithms converge to a solution that is still closer to $x_o$ than $\hat{x}(\gamma, 1)$. Although plenty of empirical results support these folklore theorems, theoretical progress in establishing them has been very limited. This paper studies these folklore theorems and establishes their scope of validity. Starting with the approximate message passing (AMP) algorithm as a heuristic method for solving $\ell_p$-regularized least squares, we study the following questions.
First, what is the impact of initialization on the performance of the algorithm? Second, when does the algorithm recover the sparse signal $x_o$ under a "good" initialization? Third, when does the algorithm converge to the sparse signal regardless of the initialization? Studying these questions not only sheds light on the second folklore theorem but also leads us to the answer to the first one, i.e., the performance of the global optimum $\hat{x}(\gamma, p)$. For that purpose, we employ replica analysis to show the connection between the solution of AMP and $\hat{x}(\gamma, p)$ in the asymptotic setting. This enables us to compare the accuracy of $\hat{x}(\gamma, p)$ and $\hat{x}(\gamma, 1)$. In particular, we present an accurate characterization of the phase transition and noise sensitivity of $\ell_p$-regularized least squares for every $0 \le p < 1$. Our results in the noiseless setting confirm that $\ell_p$-regularized least squares (if $\gamma$ is tuned optimally) exhibits the same phase transition for every $0 \le p < 1$, and that this phase transition is much better than that of LASSO. Furthermore, we show that in the noisy setting there is a major difference between the performance of $\ell_p$-regularized least squares with different values of $p$. For instance, for very small and very large measurement noise, $p = 0$ and $p = 1$, respectively, outperform the other values of $p$.
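Both $\ell_p$-regularized least squares and AMP-style solvers for it are built from the scalar proximal map of $\lambda|x|^p$. The worked derivation below states standard facts for the two endpoint cases (they are not results of this paper); $\eta_p$ and $\lambda$ are our own notation, and we use the convention $|x|^0 = \mathbf{1}\{x \neq 0\}$:

$$
\eta_p(u;\lambda) = \arg\min_{x \in \mathbb{R}} \; \tfrac{1}{2}(u - x)^2 + \lambda |x|^p,
\qquad
\eta_1(u;\lambda) = \operatorname{sign}(u)\,\big(|u| - \lambda\big)_+,
\qquad
\eta_0(u;\lambda) = u\,\mathbf{1}\{|u| > \sqrt{2\lambda}\}.
$$

For $p = 0$, the only candidates are $x = 0$ (cost $u^2/2$) and $x = u$ (cost $\lambda$), which gives the cutoff $\sqrt{2\lambda}$. For $0 < p < 1$ the map likewise sets small $|u|$ to zero, but in general it has to be computed numerically.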
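The abstract takes AMP as a heuristic solver for $\ell_p$-regularized least squares. The Python sketch below illustrates the standard AMP recursion for compressed sensing with the two closed-form thresholding denoisers. It rests on several assumptions that do not come from the paper: $A$ has i.i.d. $\mathcal{N}(0, 1/n)$ entries, only $p \in \{0, 1\}$ is implemented, and the threshold is `alpha` times an estimate of the effective noise level rather than being calibrated to $\gamma$. The names `amp`, `soft_threshold`, `hard_threshold`, `lp_objective`, `alpha`, and `x_init` are all hypothetical.

```python
import numpy as np


def soft_threshold(u, tau):
    """Proximal map of tau*|x| (the p = 1 / LASSO case), applied elementwise."""
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)


def hard_threshold(u, tau):
    """Keep u where |u| > tau, else 0.

    This is the proximal map of (tau**2 / 2) * 1{x != 0}, i.e. the p = 0 case.
    """
    return np.where(np.abs(u) > tau, u, 0.0)


def lp_objective(x, y, A, gamma, p):
    """(1/2)||y - A x||_2^2 + gamma * ||x||_p^p, with ||x||_0^0 = number of nonzeros."""
    residual = y - A @ x
    if p == 0:
        penalty = np.count_nonzero(x)
    else:
        penalty = np.sum(np.abs(x) ** p)
    return 0.5 * residual @ residual + gamma * penalty


def amp(y, A, p, alpha=1.5, iters=100, x_init=None):
    """Hypothetical AMP sketch with a thresholding denoiser (p=1: soft, p=0: hard).

    Assumes A has i.i.d. N(0, 1/n) entries. The threshold rule
    tau = alpha * ||z||_2 / sqrt(n) is an assumed tuning rule, not the
    calibration to gamma used in the paper.
    """
    if p not in (0, 1):
        raise ValueError("this sketch only implements p = 0 and p = 1")
    denoise = soft_threshold if p == 1 else hard_threshold
    n, N = A.shape
    # Starting point; the abstract studies how this choice affects the outcome.
    x = np.zeros(N) if x_init is None else np.asarray(x_init, dtype=float).copy()
    z = y - A @ x
    for _ in range(iters):
        tau = alpha * np.linalg.norm(z) / np.sqrt(n)  # threshold from estimated noise level
        u = x + A.T @ z                               # pseudo-data: roughly x_o plus noise
        x = denoise(u, tau)
        # Onsager correction (N/n) * mean(eta'(u)) * z. The almost-everywhere
        # derivative eta' is 1 where |u| > tau and 0 elsewhere, so the term
        # reduces to (number of surviving entries / n) * z.
        z = y - A @ x + z * np.count_nonzero(x) / n
    return x
```

Passing a different `x_init` reruns the same iteration from another starting point, which is one way to explore the initialization question raised in the abstract. The sketch makes no claim about when the $p = 0$ iteration converges.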