MinMax Networks

Winfried Lohmiller, Philipp Gassert, Jean-Jacques Slotine
2023-06-16
Abstract: While much progress has been achieved over the last decades in neuro-inspired machine learning, there are still fundamental theoretical problems in gradient-based learning using combinations of neurons. These problems, such as saddle points and suboptimal plateaus of the cost function, can lead in theory and practice to failures of learning. In addition, the discrete step size selection of the gradient is problematic, since steps that are too large can lead to instability and steps that are too small slow down the learning. This paper describes an alternative discrete MinMax learning approach for continuous piece-wise linear functions. Global exponential convergence of the algorithm is established using Contraction Theory with Inequality Constraints, which is extended from the continuous to the discrete case in this paper. In contrast to deep learning, the parametrization of each linear function piece is linear in the proposed MinMax network. This allows a linear regression stability proof as long as measurements do not transit from one linear region to its neighbouring linear region. The step size of the discrete gradient descent is Lagrangian limited orthogonal to the edge of two neighbouring linear functions. It will be shown that this Lagrangian step limitation does not decrease the convergence of the unconstrained system dynamics, in contrast to a step size limitation in the direction of the gradient. We show that the convergence rate of a constrained piece-wise linear function learning is equivalent to the exponential convergence rates of the individual local linear regions.
Machine Learning, Dynamical Systems
What problem does this paper attempt to address?
This paper addresses several theoretical and practical difficulties that gradient descent encounters in neural network learning:

1. **Saddle points and suboptimal plateaus**: Gradient descent may get trapped at saddle points or on suboptimal plateaus of the cost function, causing learning to fail.
2. **Discrete step size selection**: A poorly chosen discrete step size leads either to instability (steps too large) or to overly slow learning (steps too small).

To solve these problems, the authors propose a discrete MinMax learning method for continuous piecewise linear functions. The main contributions of the paper are:

- **Global exponential convergence**: By extending Contraction Theory with Inequality Constraints from the continuous to the discrete case, the paper proves global exponential convergence of the algorithm.
- **Linear parameterization**: Unlike in deep learning, the parameterization of each linear function piece is linear. This allows a linear regression stability proof as long as measurements do not move from one linear region to a neighbouring linear region.
- **Lagrangian-limited step size**: The step size of the discrete gradient descent is Lagrangian-limited orthogonal to the edge between two neighbouring linear functions. Unlike a step size limitation in the direction of the gradient, this limitation does not decrease the convergence of the unconstrained system dynamics.

In addition, the paper shows how to approximate complex piecewise linear functions by combining multiple local convex and concave functions, and proposes creation and pruning principles to find the correct network topology.
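To make the combination of convex and concave pieces concrete, the following minimal sketch evaluates a scalar piecewise linear function as a minimum over groups of maxima of affine pieces, and applies one update to the single piece that is active at a sample. The names `minmax_eval`, `active_piece`, and `nlms_step` are hypothetical, and the update is a plain normalized least-mean-squares step chosen for illustration. It is not the paper's Lagrangian-limited update and does nothing special when a sample crosses into a neighbouring region.

```python
from typing import List, Tuple

# One affine piece for a scalar input x: value = slope * x + offset.
Piece = Tuple[float, float]


def eval_piece(piece: Piece, x: float) -> float:
    slope, offset = piece
    return slope * x + offset


def minmax_eval(groups: List[List[Piece]], x: float) -> float:
    """Minimum over groups of the maximum over the affine pieces in each group."""
    return min(max(eval_piece(p, x) for p in group) for group in groups)


def active_piece(groups: List[List[Piece]], x: float) -> Tuple[int, int]:
    """Indices (group, piece) of the affine piece that determines the output at x."""
    g = min(range(len(groups)),
            key=lambda i: max(eval_piece(p, x) for p in groups[i]))
    k = max(range(len(groups[g])),
            key=lambda j: eval_piece(groups[g][j], x))
    return g, k


def nlms_step(groups: List[List[Piece]], x: float, y: float, gain: float = 1.0) -> float:
    """Illustrative normalized LMS update of the active piece only (an assumption,
    not the paper's rule). Returns the prediction error before the update."""
    g, k = active_piece(groups, x)
    slope, offset = groups[g][k]
    err = y - (slope * x + offset)
    norm = x * x + 1.0  # squared norm of the regressor (x, 1)
    groups[g][k] = (slope + gain * err * x / norm, offset + gain * err / norm)
    return err


# Example: min(max(-x, x), 2), i.e. |x| clipped from above at 2.
groups = [[(-1.0, 0.0), (1.0, 0.0)], [(0.0, 2.0)]]
```

Because only the active piece is touched, each update is linear in that piece's two parameters, which is the property the linear regression argument above relies on.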
In summary, this paper aims to provide a more stable learning method, the MinMax network, that overcomes the challenges traditional gradient descent faces when learning piecewise linear functions.
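The step size difficulty that motivates the method can be reproduced on the simplest possible cost. The sketch below runs fixed-step gradient descent on the quadratic cost 0.5 * curvature * w^2, for which each iteration multiplies w by (1 - step * curvature). The iteration therefore converges only for 0 < step < 2 / curvature. The function name `gradient_descent` and the chosen constants are illustrative assumptions, not taken from the paper.

```python
def gradient_descent(curvature: float, step: float,
                     w0: float = 1.0, iterations: int = 50) -> float:
    """Fixed-step gradient descent on the cost 0.5 * curvature * w**2."""
    w = w0
    for _ in range(iterations):
        w -= step * curvature * w  # the gradient of the cost is curvature * w
    return w


too_large = gradient_descent(curvature=4.0, step=0.6)   # factor -1.4 per step: diverges
too_small = gradient_descent(curvature=4.0, step=0.01)  # factor 0.96 per step: slow
matched = gradient_descent(curvature=4.0, step=0.25)    # factor 0: exact after one step
```

With curvature 4 the stability bound is a step of 0.5, so the first call grows without bound while the second is still far from the minimum after 50 iterations.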