Abstract: The lack of mathematical tractability of Deep Neural Networks (DNNs) has hindered progress towards a unified convergence analysis of training algorithms in the general setting. We propose a unified optimization framework for training different types of DNNs, and establish its convergence for arbitrary loss, activation, and regularization functions, assumed to be smooth. We show that the framework generalizes well-known first- and second-order training methods, and thus allows us to show the convergence of these methods for various DNN architectures and learning tasks, as a special case of our approach. We discuss some of its applications in training various DNN architectures (e.g., feed-forward, convolutional, linear networks) for regression and classification tasks.
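
The abstract claims that the framework generalizes first- and second-order methods. The following minimal Python sketch makes that idea concrete. It is our own illustration, not the paper's algorithm: each step minimizes a local quadratic surrogate of a smooth objective. If the surrogate uses the Lipschitz constant L of the gradient, it is an upper bound of f, and its minimizer is a gradient-descent step with step size 1/L. If it uses the Hessian instead, its minimizer is a Newton step. That second model is an upper bound only under extra conditions; it is exact here because the toy objective is quadratic. The toy objective, the function names, L = 4, and the diagonal Hessian are all hypothetical choices for this example.

```python
from typing import Callable, List

Vector = List[float]


def gradient_surrogate_step(x: Vector, grad: Callable[[Vector], Vector],
                            lipschitz: float) -> Vector:
    """Minimize u(y; x) = f(x) + g.(y - x) + (L/2)||y - x||^2 over y.

    If grad is L-Lipschitz, u is an upper bound of f that is tight at x.
    Setting the derivative of u to zero gives y = x - g / L, i.e. a
    gradient-descent step with step size 1/L.
    """
    g = grad(x)
    return [xi - gi / lipschitz for xi, gi in zip(x, g)]


def newton_surrogate_step(x: Vector, grad: Callable[[Vector], Vector],
                          hess_diag: Vector) -> Vector:
    """Minimize the second-order model f(x) + g.(y - x) + 0.5 (y - x)^T H (y - x).

    Assumes a diagonal, positive-definite Hessian H to stay dependency-free;
    the minimizer is the Newton step y = x - H^{-1} g.
    """
    g = grad(x)
    return [xi - gi / hi for xi, gi, hi in zip(x, g, hess_diag)]


# Toy objective (hypothetical): f(x) = 0.5*(x0^2 + 4*x1^2) - (2*x0 + 8*x1),
# whose minimizer is (2, 2) and whose minimum value is -10.
def f(x: Vector) -> float:
    return 0.5 * (x[0] ** 2 + 4 * x[1] ** 2) - (2 * x[0] + 8 * x[1])


def grad_f(x: Vector) -> Vector:
    return [x[0] - 2.0, 4.0 * x[1] - 8.0]


# First-order: L = 4 is the largest curvature of f, so the surrogate is an upper bound.
x_gd = [0.0, 0.0]
for _ in range(100):
    x_gd = gradient_surrogate_step(x_gd, grad_f, lipschitz=4.0)

# Second-order: f is quadratic, so its second-order model is exact and one step suffices.
x_newton = newton_surrogate_step([0.0, 0.0], grad_f, hess_diag=[1.0, 4.0])

print(x_gd)      # very close to [2.0, 2.0]
print(x_newton)  # [2.0, 2.0]
```

In this view, the two updates differ only in the choice of local model. That is one way a single framework can cover both kinds of method.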

What problem does this paper attempt to address?

The problem this paper attempts to solve is the lack of a unified framework for the convergence analysis of deep neural network (DNN) training. Specifically, most existing DNN training methods lack theoretical convergence guarantees, especially in the general setting (i.e., for arbitrary loss, activation, and regularization functions). Local search methods such as stochastic gradient descent with back-propagation perform well in practice, but the reasons for their success are not yet fully understood.

### Main problems addressed

1. **Lack of mathematical tractability**: Training a DNN is a non-convex optimization problem that is NP-hard in general, which makes theoretical analysis very difficult.
2. **Limitations of existing methods**: Most existing work analyzes convergence only in specific settings, such as networks without hidden layers or two-layer networks under particular assumptions. These results do not directly generalize to more complex DNN architectures and learning tasks.
3. **Need for a general framework**: To address these challenges, the paper proposes a unified optimization framework. It provides a general training method for different types of DNNs, together with a proof of its convergence.

### Solutions

The paper builds its framework on the block coordinate descent (BCD) method. BCD decomposes the non-convex training problem into a sequence of simpler sub-problems. Specifically:

- **Block coordinate descent (BCD)**: Each iteration optimizes only one block of variables while the other blocks are held fixed.
- **Block successive upper-bound minimization (BSUM)**: When a block sub-problem is non-convex, the method minimizes a convex upper bound of the objective, tight at the current iterate, instead. This lets the algorithm retain its convergence guarantee.

### Main contributions

1. **General framework**: Applicable to multiple DNN architectures (e.g., feed-forward, convolutional, and linear networks) and learning tasks (e.g., regression and classification).
2. **Convergence guarantee**: Provides convergence proofs for several common training algorithms, including first-order and second-order methods such as gradient descent and Newton's method.
3. **Simplification in special cases**: When the loss and activation functions satisfy certain conditions, the sub-problems become convex, so the standard BCD method can be applied directly.

### Applications

The paper also discusses applying the framework to various DNN architectures and tasks, including but not limited to:

- **Regression tasks**: e.g., with ridge and LASSO regularization.
- **Classification tasks**: e.g., with cross-entropy loss and squared hinge loss.
- **Convolutional neural networks**: Convolution is handled by representing each layer's weight matrix as a Toeplitz matrix.
- **Linear networks**: With the identity activation function, the model becomes a cascade of linear operators.

In summary, the paper aims to fill a gap in the theoretical analysis of DNN training methods. It does so by providing a general optimization framework with convergence guarantees, and thereby supports further progress in deep learning.
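
A deliberately tiny sketch can illustrate the BCD idea summarized above, together with the special case in which every block sub-problem is convex. For illustration only, we assume a two-layer scalar linear network `y_hat = v * (w * x)` (identity activation), trained with squared loss plus a ridge penalty `lam * (w^2 + v^2)`. With one layer's weight fixed, the problem in the other is a one-dimensional ridge regression with a closed-form minimizer. Each block update is therefore an exact convex minimization, and the objective can never increase. The model, `lam = 0.01`, the initialization, and the names `bcd` and `ridge_block_update` are hypothetical and do not reproduce the paper's formulation.

```python
from typing import List, Tuple


def objective(w: float, v: float, xs: List[float], ys: List[float],
              lam: float) -> float:
    """Squared loss of y_hat = v * (w * x) plus the ridge penalty lam * (w^2 + v^2)."""
    residual = sum((y - v * w * x) ** 2 for x, y in zip(xs, ys))
    return residual + lam * (w ** 2 + v ** 2)


def ridge_block_update(other: float, xs: List[float], ys: List[float],
                       lam: float) -> float:
    """Exactly minimize sum((y - other * t * x)^2) + lam * t^2 over the block t.

    With the other layer's weight fixed, this is a 1-D ridge regression:
    convex, with the closed-form solution below (denominator > 0 since lam > 0).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return other * sxy / (other ** 2 * sxx + lam)


def bcd(xs: List[float], ys: List[float], lam: float = 0.01, w: float = 1.0,
        v: float = 0.5, sweeps: int = 50) -> Tuple[float, float, List[float]]:
    """Alternate exact minimization over the two blocks (layers) w and v."""
    history = [objective(w, v, xs, ys, lam)]
    for _ in range(sweeps):
        w = ridge_block_update(v, xs, ys, lam)  # block 1: first layer, v fixed
        v = ridge_block_update(w, xs, ys, lam)  # block 2: second layer, w fixed
        history.append(objective(w, v, xs, ys, lam))
    return w, v, history


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]  # noiseless data generated by y = 2x
w_hat, v_hat, history = bcd(xs, ys)
print(w_hat * v_hat)  # close to 2, slightly shrunk by the ridge penalty
```

Because each block is updated by exact minimization, the recorded `history` is non-increasing. For non-convex blocks, BSUM would instead minimize an upper bound of each block's objective.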
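
A small 1-D sketch can check numerically the Toeplitz representation of convolution mentioned under Applications. It assumes, for illustration, full convolution with a single channel and no stride; the paper's exact construction may differ. The entry `T[i][j] = kernel[i - j]` depends only on `i - j`, so `T` is constant along its diagonals (Toeplitz), and multiplying it by the input reproduces direct convolution. Written this way, a convolutional layer has the same form as a fully connected layer whose weight matrix is constrained to Toeplitz structure. The helper names are hypothetical. Deep-learning libraries usually compute cross-correlation, which corresponds to the same construction with the kernel reversed.

```python
from typing import List

Matrix = List[List[float]]


def toeplitz_conv_matrix(kernel: List[float], n: int) -> Matrix:
    """Build the (n + k - 1) x n Toeplitz matrix T with T[i][j] = kernel[i - j].

    Entries depend only on i - j, so every diagonal is constant; T @ x equals
    the full 1-D convolution of a length-n input x with kernel.
    """
    k = len(kernel)
    return [[kernel[i - j] if 0 <= i - j < k else 0.0 for j in range(n)]
            for i in range(n + k - 1)]


def matvec(mat: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in mat]


def direct_convolution(kernel: List[float], x: List[float]) -> List[float]:
    """Reference full convolution: out[i] = sum_j kernel[j] * x[i - j]."""
    k, n = len(kernel), len(x)
    out = [0.0] * (n + k - 1)
    for i in range(n + k - 1):
        for j in range(k):
            if 0 <= i - j < n:
                out[i] += kernel[j] * x[i - j]
    return out


kernel = [1.0, 2.0, 3.0]
x = [1.0, 1.0, 1.0, 1.0]
T = toeplitz_conv_matrix(kernel, len(x))
print(matvec(T, x))                   # [1.0, 3.0, 6.0, 6.0, 5.0, 3.0]
print(direct_convolution(kernel, x))  # [1.0, 3.0, 6.0, 6.0, 5.0, 3.0]
```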

A Unified Framework for Training Neural Networks

A Unifying Framework for Convergence Analysis of Approximate Newton Methods

A Convergent ADMM Framework for Efficient Neural Network Training

Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization

A Unified Framework for U-Net Design and Analysis

A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks

A Unified Kernel for Neural Network Learning

Convergence Rates of Training Deep Neural Networks Via Alternating Minimization Methods

A Comprehensive and Modularized Statistical Framework for Gradient Norm Equality in Deep Neural Networks

Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks

Designing Universally-Approximating Deep Neural Networks: A First-Order Optimization Approach

Theoretical analysis of skip connections and batch normalization from generalization and optimization perspectives

Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint

Normalization Techniques in Training DNNs: Methodology, Analysis and Application

Full error analysis for the training of deep neural networks

A unified and constructive framework for the universality of neural networks

A Framework for Provably Stable and Consistent Training of Deep Feedforward Networks

A Unified Framework for Convolution-Based Graph Neural Networks

A Novel DNN Training Framework via Data Sampling and Multi-Task Optimization

On Convergence of Training Loss Without Reaching Stationary Points

A Unified Analysis of Stochastic Momentum Methods for Deep Learning