Abstract: Coordinate descent algorithms are widely used in machine learning and large-scale data analysis due to their strong optimality guarantees and impressive empirical performance in solving non-convex problems. In this work, we introduce a Block Coordinate Descent (BCD) method for structured nonconvex optimization with nonseparable constraints. Unlike traditional large-scale Coordinate Descent (CD) approaches, we do not assume the constraints are separable. Instead, we account for the possibility of nonlinear coupling among them. By leveraging the inherent problem structure, we propose new CD methods to tackle this specific challenge. Under the relatively mild condition of locally bounded non-convexity, we demonstrate that coordinate-wise stationary points offer a stronger optimality criterion than standard critical points. Furthermore, under the Luo-Tseng error bound conditions, our BCD methods exhibit Q-linear convergence to coordinate-wise stationary points or critical points. To demonstrate the practical utility of our methods, we apply them to various machine learning and signal processing models. We also provide a geometric analysis of the models. Experiments on real-world data consistently demonstrate the superior objective values of our approaches compared to existing methods.

What problem does this paper attempt to address?

This paper addresses two major challenges in optimization: non-convexity and non-separable constraints. Specifically, the authors introduce a new Block Coordinate Descent (BCD) method for structured non-convex optimization problems with non-separable constraints. Unlike traditional large-scale coordinate descent methods, it does not assume that the constraints are separable and instead accounts for possible nonlinear coupling among them.

### Core Problems of the Paper

1. **Non-convexity**: Non-convex optimization is a crucial part of training machine learning models because non-convex formulations can capture complex prediction problems more accurately. However, such problems are generally NP-hard and notoriously difficult to solve.
2. **Non-separable constraints**: Traditional methods usually assume that the constraints are separable, which simplifies the solution process. In many practical applications the constraints are non-separable, and a new method is required to handle them.

### Specific Objectives

- Propose a BCD algorithm for non-convex composite optimization problems with non-separable constraints that provides better solutions than existing methods.
- Prove the optimality properties of the proposed method and show that every coordinate-wise stationary point is also a critical point.
- Study the convergence rate for this class of problems for the first time and establish a Q-linear convergence rate.
- Provide a geometric analysis and experimentally verify the superior performance of the new method on real-world data.

### Main Contributions

1. **New BCD algorithms**: Two new BCD methods are proposed for non-convex optimization problems with non-separable constraints.
2. **Theoretical analysis**: The paper proves that every coordinate-wise stationary point is also a critical point and studies the convergence rate for this class of problems for the first time.
3. **Acceleration strategy**: A breakpoint search strategy and two semi-greedy index selection strategies are introduced to accelerate the BCD method and improve computational efficiency.
4. **Geometric analysis**: A geometric analysis is provided for three application scenarios.
5. **Experimental verification**: Extensive experiments show that the new method outperforms existing full-gradient algorithms.

### Application Examples

The paper discusses four examples of optimization frameworks:

1. **Sparse Index Tracking (SIT)**: Used for asset selection and capital allocation.
2. **Non-negative Sparse PCA (NNSPCA)**: Extends traditional PCA by adding non-negativity and sparsity constraints.
3. **DC Penalized Binary Optimization (DCPB)**: Used to handle optimization problems with binary structure.
4. **Other variants of binary optimization problems**: Transform binary constraints into continuous-domain problems through different variational reformulations.

Through these examples, the paper demonstrates the wide applicability and effectiveness of the proposed BCD method in practice.
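To make the idea of coordinate updates under a coupling constraint concrete, here is a minimal Python sketch. It is not the paper's algorithm: the objective (least squares minus a concave quadratic term), the simplex constraint `sum(x) = 1, x >= 0`, the random pair selection, and the names `objective`, `pair_update`, and `bcd` are all illustrative assumptions. Because the constraint couples all coordinates, a single coordinate cannot move on its own, so each step moves a pair `(i, j)` along `e_i - e_j` and solves the resulting one-dimensional, possibly non-convex, quadratic exactly over its feasible interval.

```python
import numpy as np


def objective(x, A, b, lam):
    """Assumed toy objective: 0.5*||Ax - b||^2 - 0.5*lam*||x||^2 (non-convex for large lam)."""
    r = A @ x - b
    return 0.5 * r @ r - 0.5 * lam * x @ x


def pair_update(x, i, j, A, b, lam):
    """Minimize the objective along x + t*(e_i - e_j), which keeps sum(x) fixed.

    Along this direction the objective is phi(t) = f(x) + g*t + 0.5*h*t^2,
    and feasibility (x >= 0) restricts t to the interval [-x[i], x[j]].
    """
    d = A[:, i] - A[:, j]
    r = A @ x - b
    g = d @ r - lam * (x[i] - x[j])   # slope of phi at t = 0
    h = d @ d - 2.0 * lam             # curvature of phi (may be negative)
    lo, hi = -x[i], x[j]
    # Candidates: both endpoints, the current point, and the interior
    # stationary point when the 1-D quadratic is strictly convex.
    candidates = [lo, hi, 0.0]
    if h > 0:
        candidates.append(float(np.clip(-g / h, lo, hi)))
    t = min(candidates, key=lambda s: g * s + 0.5 * h * s * s)
    x[i] += t
    x[j] -= t
    return x


def bcd(A, b, lam, sweeps=50, seed=0):
    """Randomized two-coordinate descent on the simplex (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.full(n, 1.0 / n)           # feasible starting point
    history = [objective(x, A, b, lam)]
    for _ in range(sweeps):
        for _ in range(n):
            i, j = rng.choice(n, size=2, replace=False)
            pair_update(x, i, j, A, b, lam)
        history.append(objective(x, A, b, lam))
    return x, history
```

Since `t = 0` is always among the candidates, each pair update cannot increase the objective, and every iterate stays feasible. The uniform random pair selection here is a simple stand-in and does not reproduce the semi-greedy index selection or breakpoint search strategies mentioned in the summary.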

Block Coordinate Descent Methods for Structured Nonconvex Optimization with Nonseparable Constraints: Optimality Conditions and Global Convergence

On the Flexibility of Block Coordinate Descent for Large-Scale Optimization

Efficiency of Coordinate Descent Methods For Structured Nonconvex Optimization

A Block Coordinate Descent Method for Nonsmooth Composite Optimization under Orthogonality Constraints

Block Coordinate Descent Methods for Optimization under J-Orthogonality Constraints with Applications

Randomized block coordinate descent method for linear ill-posed problems

Robust Block Coordinate Descent

Extended ADMM and BCD for Nonseparable Convex Minimization Models with Quadratic Coupling Terms: Convergence Analysis and Insights

A Flexible Coordinate Descent Method

Stochastic Coordinate Descent Methods for Regularized Smooth and Nonsmooth Losses

Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks

Stochastic Parallel Block Coordinate Descent for Large-scale Saddle Point Problems

Block Coordinate Descent Only Converge to Minimizers

Coordinate Descent for MCP/SCAD Penalized Least Squares Converges Linearly

A Cyclic Coordinate Descent Method for Convex Optimization on Polytopes

Greedy Block Coordinate Descent (GBCD) Method for High Dimensional Quadratic Programs

Complexity of Block Coordinate Descent with Proximal Regularization and Applications to Wasserstein CP-dictionary Learning

A Hybrid Method of Combinatorial Search and Coordinate Descent for Discrete Optimization

On convergence of the block Lanczos method for the CDT subproblem

A Proximal Block Coordinate Descent Algorithm for Deep Neural Network Training

On the Efficiency of Random Permutation for ADMM and Coordinate Descent