Successive Affine Learning for Deep Neural Networks

Yuesheng Xu
DOI: https://doi.org/10.48550/arXiv.2305.07996
2023-07-11
Abstract: This paper introduces a successive affine learning (SAL) model for constructing deep neural networks (DNNs). Traditionally, a DNN is built by solving a non-convex optimization problem. It is often challenging to solve such a problem numerically due to its non-convexity and having a large number of layers. To address this challenge, inspired by the human education system, the multi-grade deep learning (MGDL) model was recently initiated by the author of this paper. The MGDL model learns a DNN in several grades, in each of which one constructs a shallow DNN consisting of a relatively small number of layers. The MGDL model still requires solving several non-convex optimization problems. The proposed SAL model mutates from the MGDL model. Noting that each layer of a DNN consists of an affine map followed by an activation function, we propose to learn the affine map by solving a quadratic/convex optimization problem which involves the activation function only after the weight matrix and the bias vector for the current layer have been trained. In the context of function approximation, for a given function the SAL model generates an expansion of the function with adaptive basis functions in the form of DNNs. We establish the Pythagorean identity and the Parseval identity for the system generated by the SAL model. Moreover, we provide a convergence theorem of the SAL process in the sense that either it terminates after a finite number of grades or the norms of its optimal error functions strictly decrease to a limit as the grade number increases to infinity. Furthermore, we present numerical examples of proof of concept which demonstrate that the proposed SAL model significantly outperforms the traditional deep learning model.
Machine Learning, Numerical Analysis, Optimization and Control
What problem does this paper attempt to address?
This paper addresses the non-convex optimization challenge that arises when constructing deep neural networks (DNNs). Traditional methods determine all parameters of a DNN (its weight matrices and bias vectors) by solving a single, highly non-convex optimization problem, which is hard to solve numerically, especially when the network has many layers. To address this, the author proposes the successive affine learning (SAL) model. Its main innovations are:

1. **Avoiding non-convex optimization**: The learning of each layer is reduced to a quadratic or convex optimization problem, instead of solving one non-convex problem for the entire network at once.
2. **Layer-by-layer learning**: Each step learns only the affine map of the current layer (its weight matrix and bias vector), and the activation function is applied only after these parameters have been trained. The optimization problem at each step is therefore convex and can be solved efficiently with standard numerical methods (such as Nesterov's algorithm or the conjugate gradient method).
3. **Gradual approximation**: Many such simple steps are accumulated to construct a deep neural network with strong expressive power.

The SAL model is also reported to mitigate the vanishing-gradient problem common in traditional DNN training, and to be particularly suitable for adaptive approximation tasks. The paper supplies a rigorous theoretical foundation: it proves that the function system generated by the SAL model satisfies the Pythagorean identity and the Parseval identity, and it establishes a convergence theorem stating that the SAL process either terminates after a finite number of grades or produces optimal error functions whose norms strictly decrease to a limit.

In summary, the paper aims to overcome the computational obstacles of existing deep learning methods by proposing the SAL model as a more effective and stable way to construct deep neural networks.
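The layer-by-layer idea summarized above can be illustrated with a small NumPy sketch. This is a simplified reading for illustration only, not the author's implementation: the function names `fit_affine` and `successive_affine_fit`, the `tanh` activation, the toy target function, and the choice to give every layer the same width as the output are all assumptions made here. Each grade solves an ordinary least-squares problem (a convex, quadratic problem) for a weight matrix and bias that fit the current residual, and the activation is applied only afterwards to form the features for the next grade.

```python
import numpy as np


def fit_affine(features, target):
    """Least-squares fit of target ~ features @ W + b (a convex, quadratic problem)."""
    n_samples = features.shape[0]
    # Append a column of ones so the bias is learned together with the weights.
    design = np.hstack([features, np.ones((n_samples, 1))])
    theta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return theta[:-1], theta[-1]  # W has shape (d_in, d_out), b has shape (d_out,)


def successive_affine_fit(x, y, grades=4, activation=np.tanh):
    """Fit one affine map per grade to the residual left by the previous grades."""
    features, residual = x, y
    layers, history = [], []
    for _ in range(grades):
        W, b = fit_affine(features, residual)
        affine_out = features @ W + b
        new_residual = residual - affine_out
        # Record squared norms: residual before, fitted part, residual after.
        history.append((np.sum(residual ** 2),
                        np.sum(affine_out ** 2),
                        np.sum(new_residual ** 2)))
        layers.append((W, b))
        residual = new_residual
        # The activation enters only after W and b for this grade are fixed.
        features = activation(affine_out)
    return layers, history


# Toy data (hypothetical): approximate a smooth function of two variables.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(3.0 * x[:, :1]) + x[:, 1:] ** 2

layers, history = successive_affine_fit(x, y, grades=4)
for grade, (before, fitted, after) in enumerate(history, start=1):
    print(f"grade {grade}: |e_prev|^2={before:.6f}  |g|^2={fitted:.6f}  |e|^2={after:.6f}")
```

Because each grade's fit is a least-squares projection, the squared norm of the previous residual splits into the squared norm of the fitted part plus that of the new residual, and the residual norm cannot increase from one grade to the next. This is a discrete analogue of the Pythagorean identity and the monotone error behavior mentioned in the summary. The paper's strict-decrease result depends on conditions that this sketch does not attempt to reproduce.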