Adaptive Activation Functions for Predictive Modeling with Sparse Experimental Data

Farhad Pourkamali-Anaraki, Tahamina Nasrin, Robert E. Jensen, Amy M. Peterson, Christopher J. Hansen
2024-02-08
Abstract: A pivotal aspect in the design of neural networks lies in selecting activation functions, crucial for introducing nonlinear structures that capture intricate input-output patterns. While the effectiveness of adaptive or trainable activation functions has been studied in domains with ample data, like image classification problems, significant gaps persist in understanding their influence on classification accuracy and predictive uncertainty in settings characterized by limited data availability. This research aims to address these gaps by investigating the use of two types of adaptive activation functions. These functions incorporate shared and individual trainable parameters per hidden layer and are examined in three testbeds derived from additive manufacturing problems containing fewer than one hundred training instances. Our investigation reveals that adaptive activation functions, such as Exponential Linear Unit (ELU) and Softplus, with individual trainable parameters, result in accurate and confident prediction models that outperform fixed-shape activation functions and the less flexible method of using identical trainable activation functions in a hidden layer. Therefore, this work presents an elegant way of facilitating the design of adaptive neural networks in scientific and engineering problems.
Neural and Evolutionary Computing, Machine Learning
What problem does this paper attempt to address?
The paper examines how effective adaptive activation functions are in neural networks trained on sparse experimental data, that is, with small sample sizes. Specifically, the research focuses on the following aspects:

1. **Evaluating the effectiveness of adaptive activation functions**: Determining whether activation functions with trainable parameters improve prediction accuracy and model confidence compared with traditional fixed-shape activation functions such as ELU, Softplus, and Swish. The study compares two designs: a single activation function whose trainable parameters are shared by all units in a hidden layer, and individual trainable parameters assigned to each unit.
2. **Effectiveness on small-sample datasets**: Systematically examining, in what the authors present as the first such study, how adaptive activation functions perform when fewer than 100 training samples are available. Three additive manufacturing problems serve as case studies to verify their applicability and advantages in such data-scarce settings.
3. **Quantifying prediction uncertainty**: Beyond conventional classification accuracy, the study uses conformal prediction to construct prediction sets and evaluates how adaptive activation functions affect predictive uncertainty through two metrics: empirical coverage and average prediction-set size.
4. **Providing code implementation**: To help practitioners apply adaptive activation functions under limited data, the authors release source code for hidden-layer activation functions with shared and individual trainable parameters.
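The shared-versus-individual distinction in item 1 can be illustrated with a minimal NumPy sketch. It assumes an ELU parameterized as f(z) = z for z > 0 and α(e^z − 1) otherwise, with α trainable; the paper's exact parameterization may differ, and the class `AdaptiveELU` with its methods `forward` and `grad_alpha` is hypothetical, not the authors' released code.

```python
import numpy as np


class AdaptiveELU:
    """Hypothetical ELU with a trainable negative-side scale alpha.

    Assumed form: f(z) = z for z > 0, alpha * (exp(z) - 1) otherwise.
    shared=True keeps one alpha for the whole hidden layer;
    shared=False keeps one alpha per hidden unit.
    """

    def __init__(self, num_units, shared=True, init_alpha=1.0):
        self.shared = shared
        shape = (1,) if shared else (num_units,)
        self.alpha = np.full(shape, init_alpha, dtype=float)

    def forward(self, z):
        # z: (batch, num_units); alpha broadcasts across the batch axis.
        # np.minimum keeps exp() from overflowing on large positive inputs.
        neg = self.alpha * (np.exp(np.minimum(z, 0.0)) - 1.0)
        return np.where(z > 0, z, neg)

    def grad_alpha(self, z, upstream):
        # df/dalpha = exp(z) - 1 where z <= 0 and 0 where z > 0,
        # chained with the upstream gradient dL/df and summed over the batch.
        local = np.where(z > 0, 0.0, np.exp(np.minimum(z, 0.0)) - 1.0)
        per_unit = (upstream * local).sum(axis=0)
        # A shared alpha receives the sum of all per-unit gradients.
        return per_unit.sum(keepdims=True) if self.shared else per_unit


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))         # pre-activations: 8 samples, 4 hidden units
upstream = rng.normal(size=(8, 4))  # stand-in for dL/df from backpropagation
shared = AdaptiveELU(4, shared=True)
individual = AdaptiveELU(4, shared=False)
print(shared.alpha.shape, individual.alpha.shape)  # (1,) (4,)

# One plain gradient-descent step on the activation parameters.
lr = 0.1
individual.alpha -= lr * individual.grad_alpha(z, upstream)
```

With `shared=False`, a hidden layer of width H adds H activation parameters instead of one, which is the extra flexibility the abstract associates with the better-performing models. In a framework with automatic differentiation, the same idea would be a trainable weight of shape (1,) or (H,), and `grad_alpha` would not need to be written by hand.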
Through this research, the paper aims to fill the current gap in understanding how adaptive activation functions behave in small-sample settings and to provide new insights and tools for researchers in scientific and engineering fields constrained by limited labeled data.
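The two uncertainty metrics named in item 3, empirical coverage and average prediction-set size, can be sketched with split conformal prediction for classification. This sketch assumes the common nonconformity score 1 − (softmax probability of the true class); the paper may use a different score, and the functions `conformal_threshold`, `prediction_sets`, and `coverage_and_size` are hypothetical. Under exchangeability, this construction guarantees marginal coverage of at least 1 − α.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold from calibration softmax outputs."""
    n = len(cal_labels)
    # Nonconformity score: 1 - predicted probability of the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: the valid threshold is
        # infinite, so every class enters every prediction set.
        return np.inf
    return np.sort(scores)[k - 1]  # k-th smallest calibration score


def prediction_sets(test_probs, qhat):
    # Boolean mask (num_test, num_classes): class j is in the set if 1 - p_j <= qhat.
    return (1.0 - test_probs) <= qhat


def coverage_and_size(sets, test_labels):
    # Empirical coverage: fraction of test points whose true label is in the set.
    covered = sets[np.arange(len(test_labels)), test_labels]
    # Average set size: mean number of classes per prediction set.
    return covered.mean(), sets.sum(axis=1).mean()


rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=200)             # stand-in softmax outputs
labels = np.array([rng.choice(3, p=p) for p in probs])  # labels drawn from those probs
qhat = conformal_threshold(probs[:100], labels[:100], alpha=0.1)
sets = prediction_sets(probs[100:], qhat)
cov, size = coverage_and_size(sets, labels[100:])
print(f"coverage={cov:.2f}, average set size={size:.2f}")
```

With fewer than one hundred labeled instances, the calibration split is small, so the corrected rank matters: for n = 4 and α = 0.1 it equals 5, which exceeds n, and the only valid sets contain every class. A more confident model yields smaller sets at the same coverage, which is how this metric distinguishes the activation-function designs.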