A Generalized Attention Mechanism to Enhance the Accuracy Performance of Neural Networks

Pengcheng Jiang, Ferrante Neri, Yu Xue, Ujjwal Maulik
DOI: https://doi.org/10.1142/s0129065724500631
IF: 6.325
2024-09-03
International Journal of Neural Systems
Abstract: In many modern machine learning (ML) models, attention mechanisms (AMs) play a crucial role in processing data and identifying significant parts of the inputs, whether these are text or images. This selective focus enables subsequent stages of the model to achieve improved classification performance. Traditionally, AMs are applied as a preprocessing substructure before a neural network, such as in encoder/decoder architectures. In this paper, we extend the application of AMs to intermediate stages of data propagation within ML models. Specifically, we propose a generalized attention mechanism (GAM), which can be integrated before each layer of a neural network for classification tasks. At each layer/step of the ML architecture, the proposed GAM identifies the most relevant sections of the intermediate results. Our experimental results demonstrate that incorporating the proposed GAM into various ML models consistently enhances the accuracy of these models. This improvement is achieved with only a marginal increase in the number of parameters, which does not significantly affect the training time.
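The abstract does not state the GAM's formulation, so the following is only a minimal sketch of the general idea of placing an attention step before each layer, not the authors' method. It assumes a per-feature softmax gate with one learnable weight per input feature; the names attention_gate, gate_weights, dense, make_layer and forward are hypothetical and chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_gate(x, gate_weights):
    """Hypothetical gate (an assumption, not the paper's GAM).

    Scores each feature, turns the scores into softmax weights, and
    rescales the features. Multiplying by len(x) means that uniform
    attention leaves the input unchanged.
    """
    scores = [w * v for w, v in zip(gate_weights, x)]
    weights = softmax(scores)
    return [len(x) * a * v for a, v in zip(weights, x)]


def dense(x, W, b):
    """Fully connected layer followed by ReLU."""
    return [
        max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)
        for row, bi in zip(W, b)
    ]


def make_layer(n_in, n_out, rng):
    """One gate weight per input feature, plus the usual W and b."""
    gate = [0.0] * n_in  # zero init: the gate starts as the identity
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return gate, W, b


def forward(x, layers):
    """Apply the gate before every layer, then the layer itself."""
    for gate_weights, W, b in layers:
        x = attention_gate(x, gate_weights)
        x = dense(x, W, b)
    return x


rng = random.Random(0)
layers = [make_layer(4, 8, rng), make_layer(8, 3, rng)]
output = forward([0.5, -1.0, 2.0, 0.1], layers)

# Parameter overhead of the assumed gate versus the base network.
extra_params = sum(len(g) for g, _, _ in layers)
base_params = sum(len(W) * len(W[0]) + len(b) for _, W, b in layers)
```

In this sketch the gate adds one parameter per input feature of each layer, while a dense layer has roughly (inputs × outputs) parameters, so the relative overhead shrinks as layers get wider. Whether the paper's GAM has the same cost profile is not stated in the abstract.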
computer science, artificial intelligence
What problem does this paper attempt to address?