A Unified Framework for Training Neural Networks

Hadi Ghauch, Hossein Shokri-Ghadikolaei, Carlo Fischione, Mikael Skoglund
DOI: https://doi.org/10.48550/arXiv.1805.09214
2018-05-23
Abstract: The lack of mathematical tractability of Deep Neural Networks (DNNs) has hindered progress towards a unified convergence analysis of training algorithms in the general setting. We propose a unified optimization framework for training different types of DNNs, and establish its convergence for arbitrary loss, activation, and regularization functions, assumed to be smooth. We show that the framework generalizes well-known first- and second-order training methods, and thus allows us to show the convergence of these methods for various DNN architectures and learning tasks, as a special case of our approach. We discuss some of its applications in training various DNN architectures (e.g., feed-forward, convolutional, linear networks) for regression and classification tasks.
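The abstract's claim that the framework generalizes first- and second-order methods can be illustrated with the standard block successive upper-bound minimization (BSUM) argument. This is a sketch under assumptions, not the paper's own derivation: $f$ is the smooth training objective, $x_i$ is the block being updated, $x_{-i}^r$ denotes the other blocks held at their current values, and $\nabla_i f$ is assumed to be $L_i$-Lipschitz in $x_i$. By the descent lemma, the quadratic surrogate $u_i$ upper-bounds $f$ along the block and is tight at $x_i^r$, so minimizing it yields a block gradient step. Replacing $L_i I$ by a positive definite matrix $H_i$ that keeps the bound valid (for instance $H_i \succeq \nabla_{ii}^2 f$ everywhere along the block) yields a scaled, Newton-type step; with $H_i$ set to the local Hessian this is Newton's method, although the upper-bound property then need not hold globally.

```latex
\begin{aligned}
u_i(x_i; x^r) &= f(x^r) + \nabla_i f(x^r)^\top (x_i - x_i^r) + \tfrac{L_i}{2}\,\lVert x_i - x_i^r \rVert^2 \;\ge\; f(x_i, x_{-i}^r), \\
x_i^{r+1} &= \arg\min_{x_i} u_i(x_i; x^r) = x_i^r - \tfrac{1}{L_i}\,\nabla_i f(x^r) \quad \text{(block gradient descent)}, \\
x_i^{r+1} &= \arg\min_{x_i} \Big[ f(x^r) + \nabla_i f(x^r)^\top (x_i - x_i^r) + \tfrac{1}{2}(x_i - x_i^r)^\top H_i\,(x_i - x_i^r) \Big] = x_i^r - H_i^{-1} \nabla_i f(x^r) \quad \text{(Newton-type step)}.
\end{aligned}
```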
Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is the lack of a unified framework for analyzing the convergence of algorithms that train deep neural networks (DNNs). Most existing DNN training methods lack theoretical convergence guarantees, especially in the general setting of arbitrary (smooth) loss, activation, and regularization functions. Although local search methods such as stochastic gradient descent with back-propagation perform well in practice, the reasons for their success are not fully understood.

### Main problems of the paper

1. **Lack of mathematical tractability**: Training a DNN is a non-convex optimization problem that is NP-hard in general, which makes theoretical analysis very difficult.
2. **Limitations of existing methods**: Most existing convergence analyses cover specific settings, such as networks without hidden layers or two-layer networks under specific assumptions, and do not directly generalize to more complex DNN architectures and learning tasks.
3. **Need for a general framework**: To address these challenges, the paper proposes a unified optimization framework that applies to different types of DNNs and comes with a convergence proof.

### Solutions

The framework is based on block coordinate descent (BCD), which splits the non-convex training problem into a sequence of smaller sub-problems, one per block of variables:

- **Block coordinate descent (BCD)**: Each iteration optimizes only one block of variables while keeping the other blocks fixed.
- **Block successive upper-bound minimization (BSUM)**: When a sub-problem is non-convex, it is replaced by a convex upper bound of the objective (a surrogate that is tight at the current iterate), and this surrogate is minimized instead, which is what allows the convergence of the algorithm to be proved.

### Main contributions

1. **General framework**: Applies to multiple DNN architectures (e.g., feed-forward, convolutional, and linear networks) and learning tasks (e.g., regression and classification).
2. **Convergence guarantee**: Several common first- and second-order training algorithms, such as gradient descent and Newton's method, are special cases of the framework, so their convergence follows from its convergence proof.
3. **Simplification in special cases**: When the loss and activation functions satisfy certain conditions, the sub-problems become convex, and standard BCD can be applied directly without upper bounds.

### Applications

The paper also discusses applying the framework to various DNN architectures and tasks, including:

- **Regression tasks**: e.g., with ridge (ℓ2) or LASSO (ℓ1) regularization.
- **Classification tasks**: e.g., with cross-entropy or squared hinge loss.
- **Convolutional neural networks**: Each convolution is written as multiplication by a Toeplitz matrix built from the filter weights, so a convolutional layer takes the same matrix form as a fully connected one.
- **Linear networks**: With the identity activation function, the model becomes a cascade of linear operators.

In summary, the paper aims to fill a gap in the theory of DNN training by providing a general optimization framework with convergence guarantees, as a basis for further progress in deep learning.
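The BCD/BSUM scheme described under Solutions can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: the toy model (a one-hidden-layer tanh network fitted to a hypothetical 1-D regression task with a ridge penalty), the helper names `loss`, `grads`, `bsum_block_step`, and `train_bsum`, and the choice of surrogate are all hypothetical. The two blocks (input weights `w`, output weights `v`) are updated in turn by minimizing the quadratic upper bound f(z0) + g·(z - z0) + (L/2)|z - z0|^2. Because the Lipschitz constant L is unknown, the sketch swaps in backtracking: L is doubled until the bound holds at the candidate point. The minimizer of this surrogate is a block gradient step of size 1/L, which is the sense in which BSUM covers first-order methods.

```python
import math
import random

# Hypothetical toy problem (not from the paper): fit y = sin(x) on [-2, 2] with a
# one-hidden-layer network f(x) = sum_j v[j] * tanh(w[j] * x), squared loss and
# an l2 (ridge) penalty. The weight vectors w and v are the two BCD blocks.


def loss(w, v, data, lam):
    """Mean squared error plus lam * (|w|^2 + |v|^2)."""
    err = 0.0
    for x, y in data:
        pred = sum(vj * math.tanh(wj * x) for wj, vj in zip(w, v))
        err += (pred - y) ** 2
    return err / len(data) + lam * (sum(a * a for a in w) + sum(b * b for b in v))


def grads(w, v, data, lam):
    """Gradients of `loss` with respect to block w and block v."""
    n = len(data)
    gw = [2 * lam * a for a in w]
    gv = [2 * lam * b for b in v]
    for x, y in data:
        h = [math.tanh(wj * x) for wj in w]
        r = sum(vj * hj for vj, hj in zip(v, h)) - y  # residual
        for j in range(len(w)):
            gv[j] += 2 * r * h[j] / n
            gw[j] += 2 * r * v[j] * (1 - h[j] ** 2) * x / n
    return gw, gv


def bsum_block_step(block, grad, f_block, L):
    """Minimise the surrogate u(z) = f(z0) + g.(z - z0) + (L/2)|z - z0|^2.

    The minimiser is z = z0 - g / L, where u(z) = f(z0) - |g|^2 / (2L).
    If f(z) exceeds u(z), the surrogate is not an upper bound there, so L is
    doubled and the step retried (backtracking, assumed here because the true
    Lipschitz constant is unknown).
    """
    f0 = f_block(block)
    g2 = sum(gi * gi for gi in grad)
    while True:
        cand = [zi - gi / L for zi, gi in zip(block, grad)]
        if f_block(cand) <= f0 - g2 / (2 * L):
            return cand, L
        L *= 2.0


def train_bsum(data, hidden=4, lam=1e-3, iters=200, seed=0):
    """Alternate BSUM steps on block w (v fixed) and block v (w fixed)."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(hidden)]
    v = [rng.uniform(-1, 1) for _ in range(hidden)]
    Lw = Lv = 1.0
    history = [loss(w, v, data, lam)]
    for _ in range(iters):
        gw, _ = grads(w, v, data, lam)
        w, Lw = bsum_block_step(w, gw, lambda z: loss(z, v, data, lam), Lw)
        _, gv = grads(w, v, data, lam)  # gradient at the updated w
        v, Lv = bsum_block_step(v, gv, lambda z: loss(w, z, data, lam), Lv)
        history.append(loss(w, v, data, lam))
    return w, v, history


data = [(x / 10, math.sin(x / 10)) for x in range(-20, 21)]
w, v, hist = train_bsum(data)
print(f"objective: {hist[0]:.4f} -> {hist[-1]:.4f}")
```

Because every accepted step passes the upper-bound check, the recorded objective is non-increasing across iterations. This monotone descent is the property BSUM-style convergence arguments build on, although on its own it does not prove convergence to a stationary point.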