Abstract: We propose an algorithm for optimizing the parameters of single hidden layer neural networks. Specifically, we derive a blockwise difference-of-convex (DC) functions representation of the objective function. Based on the latter, we propose a block coordinate descent (BCD) approach that we combine with a tailored difference-of-convex functions algorithm (DCA). We prove global convergence of the proposed algorithm. Furthermore, we mathematically analyze the convergence rate of parameters and the convergence rate in value (i.e., the training loss). We give conditions under which our algorithm converges linearly or even faster depending on the local shape of the loss function. We confirm our theoretical derivations numerically and compare our algorithm against state-of-the-art gradient-based solvers in terms of both training loss and test loss.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper aims to address the parameter optimization problem of single hidden layer feedforward neural networks (SLFN). Specifically, the paper proposes a globally convergent algorithm (DCON) based on the Difference-of-Convex (DC) representation for optimizing parameters in SLFN.

#### Main Contributions:

1. **Global Convergence**: The proposed algorithm is proven to have global convergence in terms of value and limit critical points.
2. **Convergence Speed**: Conditions for the convergence speed of the training loss are provided, indicating that very fast convergence can be achieved under specific conditions.
3. **Performance Comparison**: DCON is compared with state-of-the-art gradient descent optimizers like Adam, showing superior predictive performance on multiple datasets.
4. **No Hyperparameters**: DCON does not require any hyperparameters during training, eliminating the need for common hyperparameters such as learning rate and number of training epochs found in traditional gradient descent methods.

#### Method Overview:

- **Formulation of the Optimization Problem**: The SLFN parameter optimization problem is defined and described through a regularized loss function.
- **Necessary Optimality Conditions**: The necessary optimality conditions of the optimization problem are analyzed, and the concept of limiting subdifferentials is introduced.
- **DCON Algorithm**: A block coordinate descent (BCD) method is proposed, combined with an optimization algorithm for DC functions (DCA). The optimization problem for each block can be transformed into a series of convex problems, enabling efficient solutions.

### Conclusion

Through theoretical analysis and experimental validation, the paper demonstrates the effectiveness of the proposed DCON algorithm in the parameter optimization task of single hidden layer neural networks. Compared to existing methods, the algorithm not only has global convergence but also shows better performance in practical applications.
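
To make the idea of combining block coordinate descent with DCA more concrete, here is a minimal Python sketch on a toy two-variable problem. It is not the paper's DCON algorithm and does not involve a neural network: the objective, the function names (`objective`, `dca_block_step`, `bcd_dca`), the starting point, and the choice of a single DCA step per block per sweep are all assumptions made for illustration. Each block's subproblem is written as a convex quadratic minus the convex function `|x|`; DCA linearizes the subtracted part at the current iterate, which leaves a convex problem with a closed-form solution.

```python
def objective(u, v):
    # Toy objective (an assumption for illustration, not the paper's loss):
    # f(u, v) = 0.5*u^2 + 0.5*v^2 + 0.5*(u - v)^2 - |u| - |v|
    return 0.5 * u**2 + 0.5 * v**2 + 0.5 * (u - v) ** 2 - abs(u) - abs(v)


def dca_block_step(own, other):
    # Block-wise DC split with the other block held fixed:
    #   g(x) = 0.5*x^2 + 0.5*(x - other)^2   (convex)
    #   h(x) = |x|                           (convex)
    # DCA replaces h by its linearization at the current point, using a
    # subgradient s of |x| (we pick s = 1 at x = 0), and then minimizes
    # g(x) - s*x. Setting the derivative 2*x - other - s to zero gives:
    s = 1.0 if own >= 0 else -1.0
    return 0.5 * (other + s)


def bcd_dca(u, v, sweeps=60):
    # Block coordinate descent: update u with v fixed, then v with u fixed.
    # One DCA step per block per sweep is a simplification chosen here.
    history = [objective(u, v)]
    for _ in range(sweeps):
        u = dca_block_step(u, v)
        v = dca_block_step(v, u)
        history.append(objective(u, v))
    return u, v, history


if __name__ == "__main__":
    u, v, history = bcd_dca(0.3, 0.2)
    print(u, v, history[-1])
```

In this toy setting the objective value never increases from one sweep to the next, which is the descent property that DCA-type methods rely on. Note that nothing in the loop requires a learning rate; only the number of sweeps is fixed here for simplicity, whereas a practical implementation would more likely stop on a tolerance.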

A Globally Convergent Algorithm for Neural Network Parameter Optimization Based on Difference-of-Convex Functions

A Neural Network Transformation based Global Optimization Algorithm

A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method

A globally convergent difference-of-convex algorithmic framework and application to log-determinant optimization problems

Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks

A Proximal Block Coordinate Descent Algorithm for Deep Neural Network Training

A Convergent ADMM Framework for Efficient Neural Network Training

An effective algorithm for hyperparameter optimization of neural networks

Convergence Rates of Training Deep Neural Networks Via Alternating Minimization Methods

Theoretical properties of the global optimizer of two layer neural network

Accelerated Gradient-free Neural Network Training by Multi-convex Alternating Optimization

Block-diagonal Hessian-free Optimization for Training Neural Networks

Hybrid Coordinate Descent for Efficient Neural Network Learning Using Line Search and Gradient Descent

Predict globally, correct locally: Parallel-in-time optimization of neural networks

A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks

Improved convergence rates for the Difference-of-Convex algorithm

Efficient and provably convergent randomized greedy algorithms for neural network optimization

Training Artificial Neural Networks Using a Global Optimization Method That Utilizes Neural Networks

Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks

Faster Convergence of Local SGD for Over-Parameterized Models

A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network