What problem does this paper attempt to address?

The paper addresses the limitations of fixed nonlinear activation functions in traditional neural networks. When a deep neural network (DNN) is trained, usually only the weights and biases of the linear transformations are optimized, while the nonlinear activation functions are pre-specified and kept fixed. Fixing the activation function causes two problems:

1. **Difficulty in choosing an appropriate activation function**: For a specific application, it is very difficult to determine the optimal activation function in advance.
2. **Performance bottlenecks**: Although existing activation functions (such as ReLU and its variants) are effective, they can still limit performance in some cases, for example through the "dying ReLU" problem and the vanishing gradient problem.

To address these problems, the paper proposes a systematic method for constructing matrix-valued activation functions whose entries are generalized from ReLU and depend on trainable parameters. The activation functions can therefore adapt to the data and the task during training.

### Main contributions

1. **Introduction of trainable matrix-valued activation functions (TMAF)**:
   - The activation functions are based on matrix-vector multiplication and use only scalar multiplication and comparison operations.
   - The activation functions depend on trainable parameters, which are trained together with the weights and bias vectors.
   - The resulting neural networks are simple and efficient, and show stronger robustness in numerical experiments.
2. **Extension of the form of activation functions**:
   - The construction extends from diagonal matrix activation functions to more general tridiagonal matrix activation functions and, in theory, to full-matrix activation functions.
   - Adjusting the diagonal and off-diagonal entries allows nonlinear mixing across the channel dimension, which improves the expressive power of the model.
3. **Verification of the effectiveness of the method**:
   - Experiments cover function approximation and image classification, including the MNIST and CIFAR-10 datasets.
   - The results show that TMAF outperforms the traditional ReLU activation function on multiple tasks, especially when approximating high-frequency oscillatory functions.

### Mathematical formulas

The paper defines the specific form of TMAF with the following formulas:

- The diagonal matrix activation function \( D_\ell \) is
  \[
  D_\ell(y)=\text{diag}(\alpha_{\ell,1}(y_1),\alpha_{\ell,2}(y_2),\ldots,\alpha_{\ell,n_\ell}(y_{n_\ell})),\quad y\in\mathbb{R}^{n_\ell}
  \]
  where each \( \alpha_{\ell,i}(s) \) is a piecewise constant function of the form
  \[
  \alpha_{\ell,i}(s)=
  \begin{cases}
  t_{\ell,i,0},&s\in(-\infty,s_{\ell,i,1}]\\
  t_{\ell,i,1},&s\in(s_{\ell,i,1},s_{\ell,i,2}]\\
  \vdots\\
  t_{\ell,i,m_{\ell,i}-1},&s\in(s_{\ell,i,m_{\ell,i}-1},s_{\ell,i,m_{\ell,i}}]\\
  t_{\ell,i,m_{\ell,i}},&s\in(s_{\ell,i,m_{\ell,i}},\infty)
  \end{cases}
  \]

With these constructions, the paper reports superior performance of TMAF across various tasks, especially on complex, high-frequency signals.
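The diagonal form above can be illustrated with a small pure-Python sketch. It is not the paper's implementation. It assumes the activation is applied to a pre-activation vector as \( D_\ell(y)\,y \), so that component \( i \) becomes \( \alpha_{\ell,i}(y_i)\,y_i \). The names `alpha`, `diagonal_tmaf`, `thresholds`, and `values` are hypothetical and stand for \( \alpha_{\ell,i} \), the activation of one layer, the breakpoints \( s_{\ell,i,j} \), and the constants \( t_{\ell,i,j} \). Which of these parameters are trained is not specified here, so the sketch treats them as plain inputs.

```python
from bisect import bisect_left


def alpha(s, thresholds, values):
    """Piecewise constant alpha(s) (hypothetical helper).

    thresholds: sorted breakpoints [s_1, ..., s_m]
    values:     constants [t_0, ..., t_m], one per interval
    Intervals are right-closed: (-inf, s_1], (s_1, s_2], ..., (s_m, inf).
    """
    if len(values) != len(thresholds) + 1:
        raise ValueError("need exactly one more value than thresholds")
    # bisect_left counts the breakpoints strictly below s, which is the
    # index j of the interval (s_j, s_{j+1}] that contains s.
    return values[bisect_left(thresholds, s)]


def diagonal_tmaf(y, thresholds, values):
    """Apply D(y) y component-wise (assumed usage): y_i -> alpha_i(y_i) * y_i.

    thresholds[i] and values[i] parameterize alpha_i for component i.
    Only comparisons and scalar multiplications are used.
    """
    return [
        alpha(y_i, th_i, t_i) * y_i
        for y_i, th_i, t_i in zip(y, thresholds, values)
    ]


if __name__ == "__main__":
    y = [-2.0, 0.5, 3.0]

    # One breakpoint at 0 with constants (0, 1) recovers ReLU.
    relu_like = diagonal_tmaf(y, [[0.0]] * 3, [[0.0, 1.0]] * 3)
    print(relu_like)

    # Two breakpoints give three slopes per component.
    three_piece = diagonal_tmaf(y, [[-1.0, 1.0]] * 3, [[0.1, 1.0, 0.5]] * 3)
    print(three_piece)
```

With a single breakpoint at 0 and constants 0 and 1, each component reduces to ReLU, which is the sense in which the entries generalize ReLU. Adding breakpoints gives each component more linear pieces while keeping the per-component cost at one interval lookup and one multiplication.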

Neural networks with trainable matrix activation functions

A novel type of activation function in artificial neural networks: Trained activation function

Normalized Activation Function: Toward Better Convergence

Linearization of ReLU Activation Function for Neural Network-Embedded Optimization: Optimal Day-Ahead Energy Scheduling

A novel activation function for multilayer feed-forward neural networks

Activation Functions: Dive into an optimal activation function

An overview of the activation functions used in deep learning algorithms

Effect of Activation Functions on the Training of Overparametrized Neural Nets

Trainable Activation Function in Image Classification

Activation Functions in Artificial Neural Networks: A Systematic Overview

Adaptive Blending Units: Trainable Activation Functions for Deep Neural Networks

A survey on modern trainable activation functions

Deep Neural Networks with Trainable Activations and Controlled Lipschitz Constant

A simple and efficient architecture for trainable activation functions

Learning Combinations of Activation Functions

Bayesian optimization for sparse neural networks with trainable activation functions

Activation function optimization method: Learnable series linear units (LSLUs)

Activation Adaptation in Neural Networks

On the Universally Optimal Activation Function for a Class of Residual Neural Networks

Importance of Optimizing Neural Activation Function Types

Trainable back-propagated functional transfer matrices