Abstract: We consider a class of structured, nonconvex, nonsmooth optimization problems under orthogonality constraints, where the objectives combine a smooth function, a nonsmooth concave function, and a nonsmooth weakly convex function. This class of problems finds diverse applications in statistical learning and data science. Existing ADMMs for addressing these problems often fail to exploit the specific structure of orthogonality constraints, struggle with nonsmooth functions and nonconvex constraint sets, or result in suboptimal oracle complexity. We propose OADMM, an Alternating Direction Method of Multipliers (ADMM) designed to solve this class of problems using efficient proximal linearized strategies. Two specific variants of OADMM are explored: one based on Euclidean Projection (OADMM-EP) and the other on Riemannian retraction (OADMM-RR). We integrate a Nesterov extrapolation strategy into OADMM-EP and a monotone Barzilai-Borwein strategy into OADMM-RR to potentially accelerate primal convergence. Additionally, we adopt an over-relaxation strategy in both OADMM-EP and OADMM-RR for rapid dual convergence. Under mild assumptions, we prove that OADMM converges to the critical point of the problem with a provable convergence rate of $\mathcal{O}(1/\epsilon^{3})$. We also establish the convergence rate of OADMM under the Kurdyka-Łojasiewicz (KL) inequality. Numerical experiments are conducted to demonstrate the advantages of the proposed method.
What problem does this paper attempt to address?
This paper addresses a class of nonconvex, nonsmooth composite optimization problems under orthogonality constraints. The objective combines a smooth function, a nonsmooth concave function, and a nonsmooth weakly convex function. Specifically, the paper considers problems of the following form:
\[
\min_{X \in \mathbb{R}^{n \times r}} F(X) \triangleq f(X) - g(X) + h(A(X)), \quad \text{s.t.} \quad X^T X = I_r
\]
where:
- \(n \geq r\)
- \(A(X) \in \mathbb{R}^{m \times 1}\)
- \(I_r\) is the \(r \times r\) identity matrix
- \(X^T X = I_r\) is the orthogonality constraint (the columns of \(X\) are orthonormal)
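
To make the problem format concrete, the following minimal sketch builds a feasible point and evaluates one possible instance of \(F\). The instance is an assumption chosen for illustration, loosely modeled on sparse PCA: \(f(X) = -\tfrac{1}{2}\operatorname{tr}(X^T C X)\), \(g = 0\), \(h = \lambda \|\cdot\|_1\), and \(A(X)\) taken as the vectorization of \(X\). The function names, the matrix `C`, and the weight `lam` are hypothetical and do not come from the paper.

```python
import numpy as np

def random_feasible_point(n, r, seed=0):
    """Return an n-by-r matrix with orthonormal columns (X^T X = I_r)."""
    rng = np.random.default_rng(seed)
    # Reduced QR of a random matrix: Q is n-by-r with orthonormal columns.
    Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return Q

def feasibility_gap(X):
    """Frobenius norm of X^T X - I_r; it is zero exactly on the constraint set."""
    r = X.shape[1]
    return np.linalg.norm(X.T @ X - np.eye(r))

def toy_objective(X, C, lam):
    """Assumed instance: f(X) = -0.5 * tr(X^T C X), g = 0, h = lam * ||.||_1."""
    smooth = -0.5 * np.trace(X.T @ C @ X)   # smooth part f
    nonsmooth = lam * np.abs(X).sum()       # convex, hence weakly convex, part h
    return smooth + nonsmooth
```

With `C` equal to the identity and `lam = 0`, every feasible `X` gives the value \(-r/2\), which is a quick sanity check on the constraint.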
### Problem Background
This type of optimization problem has a wide range of applications in statistical learning and data science, including sparse principal component analysis (PCA), deep neural networks, orthogonal nonnegative matrix factorization, range-based independent component analysis, and dictionary learning.
### Shortcomings of Existing Methods
Existing alternating direction methods of multipliers (ADMMs) have the following shortcomings when applied to these problems:
1. **Failure to exploit the structure of the orthogonality constraint**: Many methods treat the constraint generically and do not use its specific structure, which makes them inefficient.
2. **Difficulty in handling nonsmooth functions and nonconvex constraint sets**: Existing methods often struggle when the objective is nonsmooth and the constraint set is nonconvex.
3. **Suboptimal oracle complexity**: Some methods need more oracle calls than necessary to reach a given accuracy, which results in slow convergence.
### Contributions of the Paper
To overcome the above problems, the paper proposes two variants of the OADMM (Orthogonal ADMM) algorithm:
1. **OADMM-EP**: A method based on Euclidean projection.
2. **OADMM-RR**: A method based on Riemannian retraction.
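
The two variants differ in how an iterate is kept on the set \(\{X : X^T X = I_r\}\). As a minimal sketch of these two building blocks, and not of the full OADMM updates, the code below shows the standard Euclidean projection via a thin SVD and a QR-based retraction. The QR retraction is one common choice, assumed here for illustration; the summary does not say which retraction the paper uses, and the function names are hypothetical.

```python
import numpy as np

def project_stiefel(Y):
    """Closest matrix with orthonormal columns to Y in Frobenius norm.

    Uses the thin SVD Y = U S V^T and returns U V^T.
    """
    U, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ Vt

def qr_retraction(X, V):
    """Map X + V back onto the constraint set via the Q factor of a reduced QR.

    Column signs are fixed so that the diagonal of R is positive, which makes
    the result unique and gives qr_retraction(X, 0) == X for feasible X.
    """
    Q, R = np.linalg.qr(X + V)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs  # scales column j of Q by signs[j]
```

Both maps return a feasible point; the projection accepts any input matrix, while the retraction is normally applied to a feasible `X` and a small step `V`.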
### Main Features
- **Acceleration strategy**: A Nesterov extrapolation strategy is integrated into OADMM-EP, and a monotone Barzilai-Borwein step-size strategy into OADMM-RR, to potentially accelerate convergence of the primal variables.
- **Fast dual convergence**: An over-relaxation strategy is used in both variants to accelerate convergence of the dual variables.
- **Convergence proof**: Under mild assumptions, OADMM is proved to converge to a critical point of the problem with an oracle complexity of \(O(1/\epsilon^3)\), and a convergence rate is also established under the Kurdyka-Łojasiewicz (KL) inequality.
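
The two primal acceleration ideas named above can be sketched in their generic textbook form. This is an illustration under stated assumptions, not the paper's update rules: the extrapolation weight `beta`, the `fallback` step, and both function names are hypothetical, and the monotone safeguard that the paper attaches to the Barzilai-Borwein step is not shown.

```python
import numpy as np

def extrapolate(x_curr, x_prev, beta):
    """Nesterov-style extrapolated point: x_curr + beta * (x_curr - x_prev)."""
    return x_curr + beta * (x_curr - x_prev)

def bb_step(x_curr, x_prev, grad_curr, grad_prev, fallback=1.0):
    """Barzilai-Borwein step size <s, s> / <s, y>.

    s is the change in the iterate and y the change in the gradient.
    Falls back to a fixed step when <s, y> is not positive.
    """
    s = x_curr - x_prev
    y = grad_curr - grad_prev
    sy = np.sum(s * y)
    return np.sum(s * s) / sy if sy > 0 else fallback
```

For a quadratic \(\tfrac{a}{2}\|x\|^2\) with gradient \(a x\), the Barzilai-Borwein step equals \(1/a\), the inverse curvature, which is the motivation for using it as a step-size estimate.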
### Applications and Experiments
The paper evaluates OADMM-EP and OADMM-RR through numerical experiments on the sparse PCA problem. The results show that both variants generally attain better objective function values than other state-of-the-art optimization algorithms, such as RADMM, SPGM-EP, SPGM-RR, and Sub-Grad.
### Conclusion
The paper proposes OADMM, an alternating direction method of multipliers designed specifically for nonsmooth composite optimization problems under orthogonality constraints, and supports it with both theoretical convergence analysis and numerical experiments.