Consensus Function from an $L_p^q-$norm Regularization Term for its Use as Adaptive Activation Functions in Neural Networks

Juan Heredia-Juesas, José Á. Martínez-Lorenzo

DOI: https://doi.org/10.48550/arXiv.2206.15017

2022-06-30

Abstract: The design of a neural network is usually carried out by defining the number of layers, the number of neurons per layer, their connections or synapses, and the activation function that they will execute. The training process tries to optimize the weights assigned to those connections, together with the biases of the neurons, to better fit the training data. However, the definition of the activation functions is, in general, determined in the design process and not modified during the training, meaning that their behavior is unrelated to the training data set. In this paper we propose the definition and utilization of an implicit, parametric, non-linear activation function that adapts its shape during the training process. This fact increases the space of parameters to optimize within the network, but it allows a greater flexibility and generalizes the concept of neural networks. Furthermore, it simplifies the architectural design since the same activation function definition can be employed in each neuron, letting the training process optimize their parameters and, thus, their behavior. Our proposed activation function comes from the definition of the consensus variable from the optimization of a linear underdetermined problem with an $L_p^q$ regularization term, via the Alternating Direction Method of Multipliers (ADMM). We define the neural networks using this type of activation functions as $pq-$networks. Preliminary results show that the use of these neural networks with this type of adaptive activation functions reduces the error in regression and classification examples, compared to equivalent regular feedforward neural networks with fixed activation functions.

Neural and Evolutionary Computing

What problem does this paper attempt to address?

This paper addresses the fact that activation functions in neural networks are usually fixed. Traditionally, the activation function of each layer is chosen at design time and does not change during training, so its behavior is independent of the training data set, which limits the flexibility and adaptability of the network. To address this, the authors propose an implicit, parametric, non-linear activation function that adapts its shape during training. This enlarges the space of parameters to optimize, but it increases the network's flexibility and simplifies architectural design: every neuron can use the same activation function definition, and training optimizes each neuron's parameters and thus its behavior. Specifically, the activation function is derived from the consensus variable that arises when a linear underdetermined problem with an $L_p^q$-norm regularization term is optimized via the Alternating Direction Method of Multipliers (ADMM). Networks that use this type of activation function are called $pq$-networks, where $p$ and $q$ are parameters that control the shape of the activation function. Preliminary results show that these networks achieve lower error in regression and classification examples than equivalent feedforward neural networks with fixed activation functions.
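As a rough illustration of what an implicit, parametric activation can look like, the sketch below assumes the simplest scalar case: the consensus step is taken to be the proximal map of a penalty lam*|v|**q, so the output v is defined only implicitly by v + (lam*q/rho)*|v|**(q-1)*sign(v) = u. This scalar form, the function name `consensus_activation`, and the parameters `lam` and `rho` are assumptions made for illustration; the paper's actual $L_p^q$ consensus function may differ, and in a $pq$-network the shape parameters would be trained rather than fixed.

```python
import math

def consensus_activation(u, q=1.5, lam=0.5, rho=1.0, iters=60):
    """Hypothetical scalar activation: argmin_v lam*|v|**q + (rho/2)*(v - u)**2.

    Assumed form for illustration only, not the paper's definition.
    """
    if q < 1:
        raise ValueError("this sketch only handles q >= 1 (convex case)")
    c = lam * q / rho          # weight of the power term in the implicit equation
    target = abs(u)            # solve for |v|, restore the sign at the end
    lo, hi = 0.0, target       # the solution always lies in [0, |u|]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # g(v) = v + c * v**(q-1) is non-decreasing for v >= 0, so bisect on it
        if mid + c * mid ** (q - 1) > target:
            hi = mid
        else:
            lo = mid
    return math.copysign(0.5 * (lo + hi), u)

# Different q values give differently shaped activations for the same input.
for q in (1.0, 1.5, 2.0):
    print(q, [round(consensus_activation(u, q=q), 3) for u in (-2.0, -0.5, 0.5, 2.0)])
```

Under these assumptions, q = 1 reproduces soft thresholding (inputs with |u| <= lam/rho map to zero) and q = 2 reduces to the linear scaling u / (1 + 2*lam/rho), so a single definition spans sparse-like and linear behavior depending on its parameters.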

Normalized Activation Function: Toward Better Convergence

Adaptive quadratures for nonlinear approximation of low-dimensional PDEs using smooth neural networks

Regularized Flexible Activation Function Combinations for Deep Neural Networks

Simple yet effective adaptive activation functions for physics-informed neural networks

Your Network May Need to Be Rewritten: Network Adversarial Based on High-Dimensional Function Graph Decomposition

Adaptive Parametric Activation

Locally adaptive activation functions with slope recovery term for deep and physics-informed neural networks

A Convergent ADMM Framework for Efficient Neural Network Training

PDE-constrained Models with Neural Network Terms: Optimization and Global Convergence

Neural network with optimal neuron activation functions based on additive Gaussian process regression

Learning Specialized Activation Functions for Physics-informed Neural Networks

ANAct: Adaptive Normalization for Activation Functions

Adaptive Activation Functions for Predictive Modeling with Sparse Experimental Data

Bayesian optimization for sparse neural networks with trainable activation functions

The Implicit Regularization for Adaptive Optimization Algorithms on Homogeneous Neural Networks

Neural Networks with Activation Networks

Unified Control Liapunov Function Based Design of Neural Networks That Aim at Global Minimization of Nonconvex Functions

Activation functions enabling the addition of neurons and layers without altering outcomes

ENN: A Neural Network with DCT Adaptive Activation Functions