Abstract: With the growing interest in and applications of machine learning and data science, finding an efficient method for sparse analysis of high-dimensional data and optimizing a dimension reduction model to extract lower-dimensional features has become increasingly important. Orthogonality constraints (the Stiefel manifold) are commonly encountered in these applications, and sparsity is usually enforced through the element-wise L1 norm. Many applications of optimization over the Stiefel manifold can be found in physics and machine learning. In this paper, we propose a novel approach that tackles the Stiefel manifold through a nonlinear eigen-approach: we first use ADMM to split the problem into a smooth optimization over the manifold and a convex non-smooth optimization, and then transform the former into a nonlinear eigenvalue problem with eigenvector dependency (NEPv), which is solved by self-consistent field (SCF) iteration; the latter has a closed-form solution through proximal gradient. Compared with existing methods, our proposed algorithm takes advantage of the specific structure of the objective function and has efficient convergence results under mild assumptions.
What problem does this paper attempt to address?
This paper addresses sparse optimization on the Stiefel manifold (i.e., under orthogonality constraints), particularly for optimizing dimensionality reduction models on high-dimensional data. Specifically, it proposes a new method for non-smooth composite minimization problems with orthogonality constraints. The goal is an efficient method for sparse analysis of high-dimensional data that extracts low-dimensional features.
### Problem Description
The main optimization problem discussed in the paper is:
\[
\min_{X} f(X)+r(X) \quad \text{s.t.} \quad X \in S_{n,p}
\]
where:
- \( S_{n,p}=\{X \in \mathbb{R}^{n \times p} \mid X^{\top}X = I_p\} \) represents the Stiefel manifold.
- \( f:\mathbb{R}^{n \times p}\to\mathbb{R} \) is a differentiable but possibly non-convex function whose gradient \( \nabla f(X) \) is Lipschitz continuous.
- \( r:\mathbb{R}^{n \times p}\to\mathbb{R} \) is a convex but possibly non-smooth function, usually the element-wise \( \ell_1 \) norm, which enforces sparsity.
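To make the problem statement concrete, the following minimal sketch checks membership in \( S_{n,p} \) and evaluates \( f(X)+r(X) \) for a sparse PCA instance. The choice \( f(X) = -\operatorname{tr}(X^{\top}AX) \), \( r(X) = \mu\|X\|_1 \), the helper names `on_stiefel` and `spca_objective`, and the random data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def on_stiefel(X, tol=1e-10):
    """Return True if X^T X equals the identity up to tol (Frobenius norm)."""
    p = X.shape[1]
    return np.linalg.norm(X.T @ X - np.eye(p)) <= tol

def spca_objective(X, A, mu):
    """Evaluate f(X) + r(X) for an assumed sparse PCA model.

    f(X) = -tr(X^T A X)  (smooth, non-convex once restricted to the manifold)
    r(X) = mu * ||X||_1  (convex, non-smooth, element-wise l1 norm)
    """
    smooth = -np.trace(X.T @ A @ X)
    nonsmooth = mu * np.abs(X).sum()
    return smooth + nonsmooth

# Hypothetical data: a sample covariance matrix and a random feasible point.
rng = np.random.default_rng(0)
n, p = 8, 2
B = rng.standard_normal((20, n))
A = B.T @ B / 20
X0, _ = np.linalg.qr(rng.standard_normal((n, p)))  # orthonormal columns

value = spca_objective(X0, A, mu=0.1)
```

Scaling a feasible point, for example `2 * X0`, leaves the manifold, which is why plain gradient or proximal steps cannot be applied without a retraction or a splitting scheme.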
### Limitations of Existing Methods
Existing methods are mainly divided into two categories:
1. **Riemannian algorithms**: such as the Riemannian sub-gradient method and MADMM (Manifold Alternating Direction Method of Multipliers). According to the summary, these methods are designed mainly for smooth objective functions and are less effective for non-smooth ones.
2. **Lagrange multiplier methods**: such as PAMAL (Proximal Alternating Minimized Augmented Lagrangian). By introducing an indicator function, the minimization problem with orthogonality constraints is transformed into an unconstrained minimization problem. However, these methods are sensitive to parameter settings and have high computational complexity.
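As a generic illustration of the indicator-function idea (written in the notation of the problem above, and not necessarily the exact splitting PAMAL uses), the constraint can be absorbed into the objective:

\[
\min_{X \in \mathbb{R}^{n \times p}} f(X)+r(X)+\delta_{S_{n,p}}(X), \qquad
\delta_{S_{n,p}}(X)=
\begin{cases}
0, & X^{\top}X = I_p,\\
+\infty, & \text{otherwise.}
\end{cases}
\]

The resulting problem has no explicit constraint, but \( \delta_{S_{n,p}} \) is non-convex and non-smooth, which is one reason such schemes tend to need careful penalty-parameter tuning.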
### Proposed New Method
The paper proposes a new method, called NEPv ADMM, that combines the nonlinear eigenvalue problem with eigenvector dependency (NEPv) and the alternating direction method of multipliers (ADMM). The main steps of this method are as follows:
1. **Variable splitting**: Decompose the original problem into two sub-problems: a smooth optimization problem on the manifold and a simple convex optimization problem.
2. **ADMM framework**: Solve the two sub-problems alternately within the ADMM framework.
3. **NEPv transformation**: Transform the smooth optimization problem on the manifold into an NEPv and solve it using the self-consistent field (SCF) iteration.
4. **Proximal gradient method**: For the convex sub-problem, use the proximal gradient method, which yields a closed-form solution.
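The four steps above can be sketched for a sparse PCA instance as follows. This is a rough illustration under stated assumptions, not the paper's algorithm: it assumes \( f(X) = -\operatorname{tr}(X^{\top}AX) \) and \( r = \mu\|\cdot\|_1 \), the splitting \( X = Y \) with a scaled dual variable, and the matrix \( H(X) = 2A + DX^{\top} + XD^{\top} \), which is one NEPv consistent with the first-order conditions of maximizing \( \operatorname{tr}(X^{\top}AX) + \operatorname{tr}(X^{\top}D) \) over \( S_{n,p} \). All function names and parameter values are hypothetical, and no convergence guarantee is claimed.

```python
import numpy as np

def soft_threshold(Z, tau):
    """Closed-form proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def scf_x_update(A, D, X, iters=50, tol=1e-10):
    """SCF sketch for max tr(X^T A X) + tr(X^T D) subject to X^T X = I.

    Assumed NEPv: H(X) X = X Omega with H(X) = 2A + D X^T + X D^T.
    """
    p = X.shape[1]
    for _ in range(iters):
        H = 2.0 * A + D @ X.T + X @ D.T          # symmetric by construction
        _, V = np.linalg.eigh(H)                 # eigenvalues in ascending order
        X_new = V[:, -p:]                        # eigenvectors of the p largest
        # Rotate within the subspace so that X^T D is symmetric PSD.
        U, _, Wt = np.linalg.svd(X_new.T @ D)
        X_new = X_new @ (U @ Wt)
        done = np.linalg.norm(X_new - X) < tol
        X = X_new
        if done:
            break
    return X

def nepv_admm_spca(A, p, mu=0.1, rho=5.0, outer=100, seed=0):
    """ADMM sketch: X stays on the Stiefel manifold, Y carries the sparsity."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X, _ = np.linalg.qr(rng.standard_normal((n, p)))
    Y = X.copy()
    Lam = np.zeros((n, p))                       # scaled dual variable
    for _ in range(outer):
        # X-step: min -tr(X^T A X) + (rho/2)||X - Y + Lam||_F^2 on the manifold,
        # i.e. max tr(X^T A X) + tr(X^T D) with D = rho * (Y - Lam).
        X = scf_x_update(A, rho * (Y - Lam), X)
        # Y-step: min mu*||Y||_1 + (rho/2)||X - Y + Lam||_F^2 (closed form).
        Y = soft_threshold(X + Lam, mu / rho)
        # Dual ascent on the constraint X = Y.
        Lam = Lam + X - Y
    return X, Y

# Hypothetical usage on a random sample covariance matrix.
rng = np.random.default_rng(1)
B = rng.standard_normal((50, 10))
A = B.T @ B / 50
X, Y = nepv_admm_spca(A, p=2)
```

In this sketch `X` is orthonormal by construction after every SCF step, while `Y` is the sparse copy; the two agree only as the ADMM residual `X - Y` shrinks, so which one to report depends on whether feasibility or sparsity matters more.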
### Main Contributions
1. **Efficiency**: The new method exploits the specific structure of the objective function and, according to the abstract, has efficient convergence results under mild assumptions.
2. **Applicability**: It is applicable to various applications in machine learning and data science, such as sparse principal component analysis (sPCA), orthogonal dictionary learning (ODL), compression patterns in physics, etc.
3. **Innovation**: For the first time, NEPv and ADMM are combined and applied to non-smooth optimization problems with orthogonality constraints.
### Conclusion
The NEPv ADMM method proposed in the paper shows significant advantages in dealing with non-smooth optimization problems with orthogonality constraints, especially in terms of sparsity and dimensionality reduction. Future work will further study the convergence properties of this method and extend it to a wider range of objective function types.