Successive Affine Learning for Deep Neural Networks

Yuesheng Xu

DOI: https://doi.org/10.48550/arXiv.2305.07996

2023-07-11

Abstract: This paper introduces a successive affine learning (SAL) model for constructing deep neural networks (DNNs). Traditionally, a DNN is built by solving a non-convex optimization problem. Such a problem is often challenging to solve numerically because it is non-convex and involves a large number of layers. To address this challenge, inspired by the human education system, the author of this paper recently introduced the multi-grade deep learning (MGDL) model. The MGDL model learns a DNN in several grades, in each of which one constructs a shallow DNN consisting of a relatively small number of layers. The MGDL model still requires solving several non-convex optimization problems. The proposed SAL model evolves from the MGDL model. Noting that each layer of a DNN consists of an affine map followed by an activation function, we propose to learn the affine map by solving a quadratic/convex optimization problem which involves the activation function only *after* the weight matrix and the bias vector for the current layer have been trained. In the context of function approximation, for a given function the SAL model generates an expansion of the function with adaptive basis functions in the form of DNNs. We establish the Pythagorean identity and the Parseval identity for the system generated by the SAL model. Moreover, we provide a convergence theorem for the SAL process, in the sense that either it terminates after a finite number of grades or the norms of its optimal error functions strictly decrease to a limit as the grade number increases to infinity. Furthermore, we present proof-of-concept numerical examples which demonstrate that the proposed SAL model significantly outperforms the traditional deep learning model.
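The construction described in the abstract can be illustrated with a toy sketch on sample points. This is our own simplification, not the paper's exact SAL formulation: we assume a vector-valued target, hidden width equal to the output dimension, ReLU as the activation, and plain least squares as the quadratic subproblem; the helper name `sal_toy` is hypothetical. At grade k the affine map (W_k, b_k) is fitted to the current residual e_{k-1}, which gives the term g_k, and only then is ReLU applied to produce the next layer's input. Because each least-squares fit leaves the new residual e_k orthogonal to g_k, we have ||e_{k-1}||^2 = ||g_k||^2 + ||e_k||^2; summing over the grades gives the discrete analogue ||f||^2 = Σ_k ||g_k||^2 + ||e_n||^2 of the Pythagorean identity, and the residual norms can never increase.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sal_toy(x, y, grades):
    """Toy successive affine fitting on sample points (hypothetical helper).

    Assumptions (not from the paper): hidden width equals the output
    dimension, ReLU activation, and a plain least-squares subproblem.
    x: (n, d_in) inputs; y: (n, d_out) target values f(x).
    Returns the grade terms g_k on the samples, the final residual,
    and the residual norm after each grade.
    """
    n = x.shape[0]
    feats = x          # input to the current layer, x_0 = x
    resid = y.copy()   # e_0 = f
    terms, norms = [], []
    for _ in range(grades):
        # Quadratic (convex) subproblem for the affine map:
        #   min_{W, b} || resid - (feats @ W + b) ||_F^2
        A = np.hstack([feats, np.ones((n, 1))])
        coef, *_ = np.linalg.lstsq(A, resid, rcond=None)
        g = A @ coef           # g_k = W_k x_{k-1} + b_k on the samples
        resid = resid - g      # e_k = e_{k-1} - g_k, orthogonal to g_k
        terms.append(g)
        norms.append(np.linalg.norm(resid))
        # The activation enters only after W_k, b_k are fixed.
        feats = relu(g)        # x_k = sigma(W_k x_{k-1} + b_k)
    return terms, resid, norms


x = np.linspace(-1.0, 1.0, 200)[:, None]
y = np.hstack([np.sin(np.pi * x), np.abs(x)])
terms, resid, norms = sal_toy(x, y, grades=5)

lhs = np.sum(y ** 2)
rhs = sum(np.sum(g ** 2) for g in terms) + np.sum(resid ** 2)
print(np.isclose(lhs, rhs))  # checks the discrete Pythagorean identity
print(norms)                 # residual norm after each grade
```

Widening the hidden layers or feeding earlier features forward would change the basis functions, but not the orthogonality argument, which relies only on each step being an exact least-squares projection of the current residual.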

Machine Learning, Numerical Analysis, Optimization and Control

What problem does this paper attempt to address?

This paper addresses the non-convex optimization challenges that arise when constructing deep neural networks (DNNs). Traditional methods determine all parameters of a DNN (its weight matrices and bias vectors) by solving a single, highly non-convex optimization problem, which is extremely difficult to solve numerically, especially when the network has many layers. To address this challenge, the author proposes a new model, the successive affine learning (SAL) model. Its main innovations are:

1. **Avoiding non-convex optimization**: instead of solving one non-convex problem for the entire network at once, learning is decomposed into a sequence of quadratic or convex optimization problems, one per layer.
2. **Layer-by-layer learning**: each step learns only the affine map (weight matrix and bias vector) of the current layer, and the activation function is applied only after these parameters have been trained. This makes the optimization problem at each step convex, so it can be solved efficiently with standard numerical methods such as Nesterov's algorithm or the conjugate gradient method.
3. **Gradual approximation**: by accumulating many simple steps, the model eventually builds a deep neural network with strong expressive power.

In addition, because each step optimizes only the parameters of the current layer, the SAL model sidesteps the vanishing-gradient problem common in traditional DNN training, and it is particularly suitable for adaptive approximation tasks. The paper also provides a rigorous mathematical foundation: it proves that the function system generated by the SAL model satisfies the Pythagorean identity and the Parseval identity, and it establishes a convergence theorem for the SAL process. In summary, the paper aims to overcome the computational obstacles of existing deep-learning methods by proposing the SAL model, providing a more effective and stable way to construct deep neural networks.
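The per-layer subproblem in point 2 is an ordinary linear least-squares problem, so its normal equations are symmetric positive (semi)definite and standard iterative solvers apply. As a minimal sketch, assuming a feature matrix with full column rank, the code below solves one such affine fit with the conjugate gradient method applied to the normal equations and compares the result with NumPy's direct `np.linalg.lstsq`. The helper names `conjugate_gradient` and `fit_affine_cg` are hypothetical, and the paper's own solver setup may differ.

```python
import numpy as np


def conjugate_gradient(M, r, tol=1e-12, max_iter=None):
    """Solve M c = r for a symmetric positive definite matrix M."""
    max_iter = 10 * len(r) if max_iter is None else max_iter
    stop = tol * np.linalg.norm(r)   # relative stopping threshold
    c = np.zeros_like(r, dtype=float)
    res = r - M @ c                  # initial residual of the linear system
    p = res.copy()                   # first search direction
    rs_old = res @ res
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= stop:
            break
        Mp = M @ p
        alpha = rs_old / (p @ Mp)    # exact line search along p
        c = c + alpha * p
        res = res - alpha * Mp
        rs_new = res @ res
        p = res + (rs_new / rs_old) * p  # next M-conjugate direction
        rs_old = rs_new
    return c


def fit_affine_cg(feats, target):
    """Fit target ~ feats @ w + b via CG on (A^T A) c = A^T target."""
    n = feats.shape[0]
    A = np.hstack([feats, np.ones((n, 1))])  # append a column for the bias
    coef = conjugate_gradient(A.T @ A, A.T @ target)
    return coef[:-1], coef[-1]


rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 5))
target = rng.standard_normal(100)

w, b = fit_affine_cg(feats, target)
ref, *_ = np.linalg.lstsq(np.hstack([feats, np.ones((100, 1))]), target, rcond=None)
print(np.allclose(np.append(w, b), ref, atol=1e-6))
```

Forming A^T A explicitly squares the condition number, so for ill-conditioned features a variant such as CGLS, which applies A and A^T without forming their product, or a direct SVD-based solver like `np.linalg.lstsq`, is usually the safer choice.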
