A Globally Convergent Algorithm for Neural Network Parameter Optimization Based on Difference-of-Convex Functions

Daniel Tschernutter, Mathias Kraus, Stefan Feuerriegel
2024-01-16
Abstract: We propose an algorithm for optimizing the parameters of single hidden layer neural networks. Specifically, we derive a blockwise difference-of-convex (DC) functions representation of the objective function. Based on the latter, we propose a block coordinate descent (BCD) approach that we combine with a tailored difference-of-convex functions algorithm (DCA). We prove global convergence of the proposed algorithm. Furthermore, we mathematically analyze the convergence rate of parameters and the convergence rate in value (i.e., the training loss). We give conditions under which our algorithm converges linearly or even faster depending on the local shape of the loss function. We confirm our theoretical derivations numerically and compare our algorithm against state-of-the-art gradient-based solvers in terms of both training loss and test loss.
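The abstract's central mechanism is to write an objective as a difference g - h of convex functions and then repeatedly minimize the convex surrogate obtained by linearizing h at the current iterate. It can be sketched on a toy scalar problem. This is a generic DCA illustration, not the paper's blockwise neural network representation: the function f(x) = x^4 - 2x^2, the split g(x) = x^4, h(x) = 2x^2, and all function names are assumptions, chosen because each convex subproblem then has a closed-form minimizer. For this toy problem only, the sketch also shows what linear convergence in parameters and in value looks like.

```python
import math


def f(x):
    """Toy DC objective f = g - h with g(x) = x**4 and h(x) = 2*x**2 (both convex)."""
    return x**4 - 2.0 * x**2


def dca_step(x_k):
    """One DCA iteration: minimize the convex surrogate g(x) - h'(x_k) * x.

    With h'(x_k) = 4*x_k the surrogate is x**4 - 4*x_k*x, whose unique
    minimizer solves 4*x**3 = 4*x_k, i.e. x = cbrt(x_k).
    """
    return math.copysign(abs(x_k) ** (1.0 / 3.0), x_k)


def run_dca(x0, n_iter=30):
    """Return the DCA iterates x_0, ..., x_{n_iter}."""
    xs = [x0]
    for _ in range(n_iter):
        xs.append(dca_step(xs[-1]))
    return xs


xs = run_dca(3.0)
values = [f(x) for x in xs]

# Empirical contraction factors near the critical point x* = 1, where f(1) = -1.
param_ratio = (xs[9] - 1.0) / (xs[8] - 1.0)
value_ratio = (values[9] + 1.0) / (values[8] + 1.0)
print(f"x_30 = {xs[-1]:.10f}")
print(f"parameter error ratio = {param_ratio:.4f}, value error ratio = {value_ratio:.4f}")
```

Because h is convex, no DCA step can increase f. Near x* = 1 the parameter error shrinks by roughly a factor of 1/3 per step and the value error f(x) + 1 by roughly 1/9, i.e. both converge linearly. These constants belong to this toy function only; in the paper, the rates depend on the local shape of the neural network loss.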
Machine Learning, Neural and Evolutionary Computing, Optimization and Control
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses the parameter optimization problem of single hidden layer feedforward neural networks (SLFNs). Specifically, it proposes DCON, a globally convergent algorithm for optimizing SLFN parameters that is based on a difference-of-convex (DC) representation of the objective function.

#### Main Contributions:

1. **Global Convergence**: The proposed algorithm is proven to converge globally: the objective values converge, and the limit points of the iterates are critical points.
2. **Convergence Speed**: The paper gives conditions under which the training loss converges linearly, or even faster, depending on the local shape of the loss function.
3. **Performance Comparison**: DCON is compared with state-of-the-art gradient-based optimizers such as Adam and achieves better predictive performance on multiple datasets.
4. **No Hyperparameters**: DCON requires no optimizer hyperparameters during training, removing the need to tune settings common to gradient descent methods, such as the learning rate and the number of training epochs.

#### Method Overview:

- **Formulation of the Optimization Problem**: The SLFN parameter optimization problem is stated as the minimization of a regularized loss function.
- **Necessary Optimality Conditions**: The paper analyzes the necessary optimality conditions of this problem, using the concept of limiting subdifferentials.
- **DCON Algorithm**: A block coordinate descent (BCD) method is combined with an algorithm for optimizing DC functions (DCA). The subproblem for each block is reduced to a sequence of convex problems, which can be solved efficiently.

### Conclusion

Through theoretical analysis and experimental validation, the paper demonstrates the effectiveness of DCON for optimizing the parameters of single hidden layer neural networks. Compared with existing gradient-based methods, the algorithm not only comes with a global convergence guarantee but also achieves better performance in the reported experiments.
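The combination of block coordinate descent with an inner DCA, as described in the method overview, can be sketched on a hypothetical two-block toy objective F(x, y) = x^4 - 2x^2 + y^4 - 2y^2 + C*x*y. When one block is held fixed, F is a DC function of the other block. The objective, the coupling constant `C`, the helper names, and the fixed number of inner DCA steps are all illustrative assumptions; this sketch does not reproduce the paper's SLFN decomposition or its tailored DCA.

```python
import math

C = 0.5  # hypothetical coupling strength between the two blocks


def objective(x, y):
    """Toy objective F(x, y); for a fixed y it is DC in x, and vice versa."""
    return x**4 - 2 * x**2 + y**4 - 2 * y**2 + C * x * y


def cbrt(t):
    """Real cube root that also handles negative arguments."""
    return math.copysign(abs(t) ** (1.0 / 3.0), t)


def dca_block_update(z, other, n_inner=5):
    """Run a few DCA steps on one block while the other block is held fixed.

    Block objective: z**4 - 2*z**2 + C*other*z = g(z) - h(z), with
    g(z) = z**4 and h(z) = 2*z**2 - C*other*z, both convex in z.
    Linearizing h at the current z gives the convex surrogate
    g(z') - h'(z)*z', which is minimized where 4*z'**3 = 4*z - C*other.
    """
    for _ in range(n_inner):
        z = cbrt(z - C * other / 4.0)
    return z


def bcd_dca(x0, y0, n_outer=50):
    """Alternate DCA updates over the blocks x and y, recording F after each sweep."""
    x, y = x0, y0
    history = [objective(x, y)]
    for _ in range(n_outer):
        x = dca_block_update(x, y)  # block 1: update x with y fixed
        y = dca_block_update(y, x)  # block 2: update y with the new x fixed
        history.append(objective(x, y))
    return x, y, history


x, y, history = bcd_dca(2.0, 0.5)
grad = (4 * x**3 - 4 * x + C * y, 4 * y**3 - 4 * y + C * x)
print(f"x = {x:.6f}, y = {y:.6f}, F = {history[-1]:.6f}, grad = {grad}")
```

No DCA step can increase F, so the recorded values are nonincreasing. From (2.0, 0.5) the iterates settle at the critical point x = y = sqrt(0.875) ≈ 0.935, which is a local but not a global minimizer of this toy F (points with x = -y attain lower values). Convergence to a critical point therefore does not by itself imply reaching the global minimum.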