ErfReLU: Adaptive Activation Function for Deep Neural Network

Ashish Rajanand, Pradeep Singh
2023-06-02
Abstract: Recent research has found that the activation function (AF) chosen to introduce non-linearity into a network's output can have a large impact on how effectively deep learning networks perform. Activation functions that adapt as the network learns are therefore needed. Researchers have recently started developing activation functions that can be trained throughout the learning process, known as trainable or adaptive activation functions (AAFs). Research on AAFs that improve results is still in its early stages. In this paper, a novel activation function, 'ErfReLU', is developed based on the error function (erf) and ReLU, exploiting the advantages of both. State-of-the-art activation functions such as Sigmoid, ReLU, and Tanh, and their properties, are briefly explained. Adaptive activation functions such as Tanhsoft1, Tanhsoft2, Tanhsoft3, TanhLU, SAAF, ErfAct, Pserf, Smish, and Serf are also described. Lastly, a performance analysis of these 9 trainable activation functions (Tanhsoft1, Tanhsoft2, Tanhsoft3, TanhLU, SAAF, ErfAct, Pserf, Smish, and Serf), along with the proposed one, is presented by applying them in MobileNet, VGG16, and ResNet models on the CIFAR-10, MNIST, and FMNIST benchmark datasets.
Neural and Evolutionary Computing, Machine Learning
What problem does this paper attempt to address?
This paper addresses several limitations of existing activation functions in deep neural networks, in particular how to improve the performance of deep learning models and how to overcome the "dead ReLU" problem of traditional activation functions such as ReLU. Specifically:

1. **Introduction of Non-linearity**: Linear activation functions cannot capture non-linear relationships in data. Commonly used activation functions such as Sigmoid, Tanh, and ReLU do introduce non-linearity, but they perform poorly or cause problems in some cases.
2. **Vanishing Gradient Problem**: Many activation functions are prone to vanishing gradients during back-propagation, which degrades training.
3. **Handling of the Negative Value Region**: ReLU outputs zero for all negative inputs, which may cause some neurons to "die", that is, to stop responding to the input. This limits the learning ability of the model.
4. **Adaptive Ability**: Most existing activation functions are fixed and cannot adjust their shape and parameters to the data, which limits the generalization and expressive ability of the model.

To address these problems, the authors propose a new adaptive activation function, ErfReLU, which combines the advantages of the error function (erf) and ReLU. ErfReLU keeps the behavior of ReLU in the positive value region while introducing non-linearity in the negative value region through the error function. This avoids neuron "death" and at the same time reduces the vanishing gradient problem. In addition, ErfReLU has fewer parameters and can better meet the needs of different datasets and tasks.
In summary, the core objective of this paper is to develop a new activation function that can be adjusted adaptively and that performs well across a range of application scenarios, so as to improve the overall performance of deep neural networks.
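The behavior described above (identity for positive inputs, an erf-shaped response for negative inputs) can be illustrated with a minimal sketch. The text here does not state the exact formula, so the piecewise form below, f(x) = x for x ≥ 0 and f(x) = α·erf(x) for x < 0, is an assumption made for illustration, and the names `erf_relu`, `erf_relu_grad`, and `alpha` are hypothetical. In a real network `alpha` would be a trainable parameter; here it is a plain float.

```python
import math


def relu(x: float) -> float:
    """Standard ReLU: zero for every negative input."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """ReLU gradient: exactly zero for negative inputs (the 'dead ReLU' case)."""
    return 1.0 if x > 0.0 else 0.0


def erf_relu(x: float, alpha: float = 0.5) -> float:
    """ASSUMED form of ErfReLU: identity for x >= 0, alpha * erf(x) for x < 0.

    `alpha` stands in for the trainable parameter; its default is arbitrary.
    """
    return x if x >= 0.0 else alpha * math.erf(x)


def erf_relu_grad(x: float, alpha: float = 0.5) -> float:
    """Gradient of the assumed form with respect to x.

    d/dx erf(x) = 2 / sqrt(pi) * exp(-x^2), so the negative branch has a
    non-zero slope, unlike ReLU.
    """
    if x >= 0.0:
        return 1.0
    return alpha * 2.0 / math.sqrt(math.pi) * math.exp(-x * x)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 1.5):
        print(
            f"x={x:+.1f}  relu'={relu_grad(x):.3f}  "
            f"erf_relu={erf_relu(x):+.4f}  erf_relu'={erf_relu_grad(x):.4f}"
        )
```

Under this assumed form, the gradient for negative inputs is non-zero wherever ReLU's is exactly zero, which is the mechanism by which a neuron can keep learning. Note that the slope still decays as exp(-x²) for large negative inputs, so the sketch reduces rather than eliminates small gradients there.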