Abstract: While much progress has been achieved over the last decades in neuro-inspired machine learning, there are still fundamental theoretical problems in gradient-based learning using combinations of neurons. These problems, such as saddle points and suboptimal plateaus of the cost function, can cause learning to fail, both in theory and in practice. In addition, selecting the discrete step size of gradient descent is problematic: too large a step can lead to instability, and too small a step slows down learning. This paper describes an alternative discrete MinMax learning approach for continuous piecewise-linear functions. Global exponential convergence of the algorithm is established using Contraction Theory with Inequality Constraints, which this paper extends from the continuous to the discrete case. In contrast to deep learning, the parametrization of each linear function piece in the proposed MinMax network is linear. This allows a linear regression stability proof as long as measurements do not transit from one linear region to a neighbouring linear region. The step size of the discrete gradient descent is limited by a Lagrangian constraint orthogonal to the edge between two neighbouring linear functions. It will be shown that, in contrast to a step size limitation in the direction of the gradient, this Lagrangian step limitation does not reduce the convergence rate of the unconstrained system dynamics. We show that the convergence rate of constrained piecewise-linear function learning is equivalent to the exponential convergence rates of the individual local linear regions.
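The claim that each linear piece is parametrized linearly can be illustrated with a small max-of-min network. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the form $f(x) = \max_k \min_j (w_{kj}^\top x + b_{kj})$, a squared-error loss, and a plain per-sample gradient step. It omits the Lagrangian step limitation and the creation and pruning of pieces. The function names `minmax_eval` and `active_piece_step` are hypothetical.

```python
import numpy as np

# Hypothetical sketch: assumes a max-of-min network form; this is not the paper's code.


def minmax_eval(W, b, x):
    """Evaluate f(x) = max_k min_j (W[k, j] @ x + b[k, j]).

    W has shape (K, J, d) and b has shape (K, J). Each min over j is a
    concave piecewise-linear function, and the max over k combines them.
    Returns the output and the indices (k, j) of the active affine piece.
    """
    z = W @ x + b                                 # (K, J): value of every affine piece at x
    j_min = np.argmin(z, axis=1)                  # active piece inside each min-group
    group_vals = z[np.arange(z.shape[0]), j_min]  # value of each concave min-group
    k = int(np.argmax(group_vals))                # active min-group
    j = int(j_min[k])
    return float(z[k, j]), k, j


def active_piece_step(W, b, x, y, eta):
    """One discrete gradient step on 0.5 * (f(x) - y)**2, updating W and b in place.

    Away from ties, only the active piece has a nonzero gradient, and the
    output is linear in that piece's parameters, so within one linear
    region this is an ordinary least-mean-squares (LMS) update.
    Returns the error before the step.
    """
    f, k, j = minmax_eval(W, b, x)
    e = f - y
    W[k, j] -= eta * e * x
    b[k, j] -= eta * e
    return e


# Usage: f(x) = max(x, -x) = |x| with two single-piece groups (K=2, J=1, d=1).
W = np.array([[[1.0]], [[-1.0]]])
b = np.zeros((2, 1))
x = np.array([2.0])
e0 = active_piece_step(W, b, x, y=3.0, eta=0.1)
e1 = minmax_eval(W, b, x)[0] - 3.0
# x stays in the same region after the step, so e1 = (1 - eta * (x @ x + 1)) * e0.
print(e0, e1)
```

Only one affine piece is active at a given input, so the update touches only that piece's parameters. As long as the input stays in the same linear region, the error shrinks by the factor $1 - \eta(\|x\|^2 + 1)$. When a step makes a different piece active, this linear-regression argument no longer applies. According to the abstract, the Lagrangian step limitation is designed to handle exactly this transition between neighbouring regions.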

What problem does this paper attempt to address?

The problems this paper attempts to solve concern several theoretical and practical difficulties that gradient descent encounters in neural network learning. Specifically:

1. **Saddle points and suboptimal plateaus**: gradient descent may get trapped at saddle points or on suboptimal plateaus of the cost function, causing learning to fail.
2. **Discrete step-size selection**: a poorly chosen discrete step size in gradient descent leads either to instability (step too large) or to overly slow learning (step too small).

To solve these problems, the authors propose a new discrete MinMax learning method for continuous piecewise-linear functions. The main contributions of the paper are:

- **Global exponential convergence**: by extending Contraction Theory with inequality constraints to discrete systems, the paper proves global exponential convergence of the algorithm.
- **Linear parameterization**: unlike in deep learning, each linear function piece is parameterized linearly. This permits a linear regression stability proof as long as the measurements do not transition from one linear region to an adjacent one.
- **Lagrangian-constrained step size**: the step size of the discrete gradient descent is limited by a Lagrangian constraint orthogonal to the edge between two adjacent linear functions. Unlike a step size limitation along the gradient direction, this constraint does not reduce the convergence rate of the unconstrained system dynamics.

In addition, the paper shows how to approximate complex piecewise-linear functions by combining multiple local convex and concave functions, and it proposes creation and pruning principles for finding the correct network topology.

In summary, this paper aims to provide a new, more stable learning method by introducing the MinMax network, which overcomes the challenges that traditional gradient descent encounters when learning piecewise-linear functions.
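The step-size problem listed above can be made concrete for any model that is linear in its parameters, such as a single linear piece of a MinMax network. The following sketch uses the textbook least-mean-squares argument and is not the paper's contraction proof. For a regressor $\phi$ and error $e = \theta^\top\phi - y$, one gradient step on $\tfrac12 e^2$ multiplies the error by $\rho = 1 - \eta\|\phi\|^2$. The error therefore converges only if $0 < \eta < 2/\|\phi\|^2$. The function names `contraction_factor` and `error_history` are hypothetical.

```python
import numpy as np

# Hypothetical illustration of step-size sensitivity for a model linear in theta (plain LMS).


def contraction_factor(phi, eta):
    """Per-step error factor rho = 1 - eta * ||phi||^2 for one repeated sample."""
    phi = np.asarray(phi, dtype=float)
    return 1.0 - eta * float(phi @ phi)


def error_history(phi, theta0, y, eta, steps):
    """Apply theta <- theta - eta * e * phi repeatedly to one sample (phi, y).

    The error e = theta @ phi - y obeys e_next = contraction_factor(phi, eta) * e,
    so it converges only if 0 < eta < 2 / ||phi||^2.
    Returns the list of errors before each step.
    """
    phi = np.asarray(phi, dtype=float)
    theta = np.array(theta0, dtype=float)
    errors = []
    for _ in range(steps):
        e = float(theta @ phi - y)
        errors.append(e)
        theta -= eta * e * phi  # gradient of 0.5 * e**2 with respect to theta is e * phi
    return errors


phi = [1.0, 1.0]  # ||phi||^2 = 2, so the stable range is 0 < eta < 1
for eta in (0.05, 0.5, 1.2):  # too small, well chosen, too large
    print(eta, contraction_factor(phi, eta), error_history(phi, [0.0, 0.0], 1.0, eta, 5))
```

With $\|\phi\|^2 = 2$, the factor is $0.9$ for $\eta = 0.05$ (slow convergence), $0$ for $\eta = 0.5$ (convergence in one step), and $-1.4$ for $\eta = 1.2$ (divergence). This mirrors the trade-off between instability and slow learning described above.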

MinMax Networks

A Neural Network Model For General Minimax Problem

A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization

Limiting Behaviors of Nonconvex-Nonconcave Minimax Optimization via Continuous-Time Systems

Mini-max Initialization for Function Approximation.

Smooth Min-Max Monotonic Networks

Maximum Principle Based Algorithms for Deep Learning.

Stability and Generalization of Stochastic Gradient Methods for Minimax Problems

Gradient Descent Finds Global Minima of Deep Neural Networks.

Statistical Mechanics of Min-Max Problems

MinMaxMin $Q$-learning

Continuous Function Structured in Multilayer Perceptron for Global Optimization

Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions

Maximum-and-Concatenation Networks

On the Banach Spaces Associated with Multi-Layer ReLU Networks: Function Representation, Approximation Theory and Gradient Descent Dynamics

Train simultaneously, generalize better: Stability of gradient-based minimax learners

Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning

On the saddle point problem for non-convex optimization