HAT: A Visual Transformer Model for Image Recognition Based on Hierarchical Attention Transformation

Xuanyu Zhao, Tao Hu, Chunxia Mao, Ye Yuan, Jun Li
DOI: https://doi.org/10.1109/access.2023.3314573
IF: 3.9
2023-09-22
IEEE Access
Abstract: In image recognition, the Visual Transformer (ViT) performs very well. However, ViT relies on a fixed self-attention layer, which tends to cause computational redundancy and makes it difficult to preserve the integrity of the image's convolutional feature sequence during training. We therefore propose a non-normalization hierarchical attention transfer network (HAT), which introduces a threshold attention mechanism and a multi-head attention mechanism after pooling in each layer. HAT shifts its focus between local and global regions, flexibly controlling the attention range for image classification. Its lower computational complexity improves its scalability, enabling it to handle longer feature sequences and to balance efficiency and accuracy. HAT removes layer normalization to increase the likelihood of converging to an optimum during training. To verify the effectiveness of the proposed model, we conducted experiments on image classification and segmentation tasks. The results show that, compared with classical pyramid-structured networks and various attention networks, HAT outperforms the benchmark networks on both the ImageNet and CIFAR100 datasets.
computer science, information systems; telecommunications; engineering, electrical & electronic
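
The layer described in the abstract (pooling, then thresholded multi-head attention, with no layer normalization) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not define the threshold attention mechanism, so the rule used here (zero out softmax weights below a cutoff `tau`, always keep each row's largest weight, then renormalize) is an assumption. The average pooling, the residual connection, and every name (`hat_like_block`, `avg_pool_tokens`, `tau`, `stride`) are likewise hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool_tokens(x, stride=2):
    """Average-pool a (tokens, dim) sequence along the token axis (assumed pooling)."""
    n, d = x.shape
    n_out = n // stride
    return x[: n_out * stride].reshape(n_out, stride, d).mean(axis=1)

def threshold_multihead_attention(x, wq, wk, wv, wo, num_heads, tau):
    """Multi-head self-attention with an ASSUMED thresholding rule on the weights."""
    n, d = x.shape
    hd = d // num_heads
    # Project and split into heads: (heads, tokens, head_dim).
    q = (x @ wq).reshape(n, num_heads, hd).transpose(1, 0, 2)
    k = (x @ wk).reshape(n, num_heads, hd).transpose(1, 0, 2)
    v = (x @ wv).reshape(n, num_heads, hd).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(hd), axis=-1)
    # Assumption: drop weights below tau, but keep each row's maximum
    # so that no query ends up with an empty attention row.
    keep = (attn >= tau) | (attn == attn.max(axis=-1, keepdims=True))
    attn = np.where(keep, attn, 0.0)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)
    return out @ wo, attn

def hat_like_block(x, params, num_heads=4, tau=0.05, stride=2):
    """Pool, attend, add a residual; note that no layer normalization is applied."""
    pooled = avg_pool_tokens(x, stride)
    attended, attn = threshold_multihead_attention(
        pooled, *params, num_heads=num_heads, tau=tau
    )
    return pooled + attended, attn

rng = np.random.default_rng(0)
dim = 32
tokens = rng.standard_normal((16, dim))  # 16 tokens of width 32
params = tuple(rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(4))
out, attn = hat_like_block(tokens, params)
print(out.shape, attn.shape)  # (8, 32) (4, 8, 8)
```

With `stride=2`, pooling halves the 16 input tokens to 8, so each head's attention matrix has 64 entries rather than 256. A larger `tau` leaves fewer non-zero weights per query, narrowing its attention range, while a `tau` of 0 recovers ordinary dense attention.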