Strengthening Layer Interaction via Dynamic Layer Attention

Kaishen Wang, Xun Xia, Jian Liu, Zhang Yi, Tao He
2024-06-19
Abstract: In recent years, employing layer attention to enhance interaction among hierarchical layers has proven to be a significant advancement in building network structures. In this paper, we delve into the distinction between layer attention and the general attention mechanism, noting that existing layer attention methods achieve layer interaction on fixed feature maps in a static manner. These static layer attention methods limit the ability for context feature extraction among layers. To restore the dynamic context representation capability of the attention mechanism, we propose a Dynamic Layer Attention (DLA) architecture. The DLA comprises dual paths, where the forward path utilizes an improved recurrent neural network block, named Dynamic Sharing Unit (DSU), for context feature extraction. The backward path updates features using these shared context representations. Finally, the attention mechanism is applied to these dynamically refreshed feature maps among layers. Experimental results demonstrate the effectiveness of the proposed DLA architecture, outperforming other state-of-the-art methods in image recognition and object detection tasks. Additionally, the DSU block has been evaluated as an efficient plugin in the proposed DLA architecture. The code is available at https://github.com/tunantu/Dynamic-Layer-Attention.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper targets the static nature of existing layer attention mechanisms in deep convolutional neural networks (DCNNs). When modeling feature interactions across layers, these mechanisms apply attention to fixed feature maps, which limits context feature extraction and makes information exchange between layers inefficient. To address this, the paper proposes the Dynamic Layer Attention (DLA) architecture, which aims to restore the dynamic context representation ability of the attention mechanism, thereby strengthening feature interaction between layers and improving model performance.

### Main Contributions

1. **Proposing the DLA Architecture**: The DLA architecture consists of a forward path and a backward path. The forward path uses an improved recurrent neural network block, the Dynamic Sharing Unit (DSU), for context feature extraction. The backward path uses the resulting shared context representations to update the features of each layer.
2. **Designing the DSU Block**: The DSU is a new RNN block used within the DLA architecture. It supports dynamic modification of information across layers and performs well at integrating information between layers.
3. **Experimental Verification**: The experiments show that the DLA architecture outperforms other state-of-the-art methods in image recognition and object detection, covering image classification on CIFAR-10, CIFAR-100, and ImageNet-1K and object detection on COCO2017.

### Dynamic Layer Attention Architecture (DLA)

- **Forward Path**: DSU blocks extract context features from each layer.
- **Backward Path**: The extracted context features are used to dynamically update the feature maps of each layer.
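The two paths can be sketched roughly as follows. The summary does not state how each layer is summarized, how the backward path applies the context, or what shapes are involved, so the global average pooling, the sigmoid channel gating, and the `placeholder_update` function (a hypothetical stand-in for the DSU block) are all assumptions made purely for illustration. This is not the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_means(fmap):
    """Assumed layer summary: global average pooling over fmap[c][h][w]."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def forward_path(features, update_context):
    """Walk the layers in order and accumulate one shared context vector."""
    context = [0.0] * len(features[0])
    for fmap in features:
        context = update_context(context, channel_means(fmap))
    return context


def backward_path(features, context):
    """Refresh every layer's feature map with the shared context.

    Assumed update rule: per-channel sigmoid gating.
    """
    gates = [sigmoid(c) for c in context]
    return [
        [[[g * v for v in row] for row in ch] for g, ch in zip(gates, fmap)]
        for fmap in features
    ]


def placeholder_update(context, summary):
    # Hypothetical stand-in for the DSU block; not the paper's update rule.
    return [0.5 * c + 0.5 * math.tanh(s) for c, s in zip(context, summary)]


# Two toy layers, each with 3 channels of 2x2 constant values.
features = [
    [[[float(layer + ch)] * 2 for _ in range(2)] for ch in range(3)]
    for layer in range(2)
]
context = forward_path(features, placeholder_update)
refreshed = backward_path(features, context)
```

Layer attention would then operate on `refreshed` rather than on the original, fixed feature maps, which is the distinction the paper draws between dynamic and static layer attention.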
### Dynamic Sharing Unit (DSU)

- **Workflow** (here \(\sigma\) denotes the sigmoid function and \(\odot\) element-wise multiplication):
  - Input Compression:
    \[ s_m = \text{ReLU}(W_1[\sigma(c_{m-1}), y_m]) \]
  - Hidden Transformation, Input Gate, and Forget Gate:
    \[ \begin{cases} \tilde{c}_m = \text{Tanh}(W_c^2 \cdot s_m + b_c) \\ i_m = \sigma(W_i^2 \cdot s_m + b_i) \\ f_m = \sigma(W_f^2 \cdot s_m + b_f) \end{cases} \]
  - Update Context Representation:
    \[ c_m = f_m \odot c_{m-1} + i_m \odot \tilde{c}_m \]

### Experimental Results

- **Image Classification**:
  - On the CIFAR-10 and CIFAR-100 datasets, the Top-1 accuracy of the DLA-L model exceeds that of the ResNet baselines by 1.32%, 1.60%, and 1.62% (CIFAR-10) and by 4.96%, 2.94%, and 3.41% (CIFAR-100), respectively.
  - On the ImageNet-1K dataset, the Top-1 accuracy of the DLA-L model is 1.9% and 1.5% higher than that of ResNet-50 and ResNet-101, respectively.
- **Object Detection**:
  - On the COCO2017 dataset, with Faster R-CNN and Mask R-CNN as detectors, the DLA-L model improves average precision (AP) by 4.2% and 3.6%, respectively.

These results support the effectiveness of the DLA architecture in enhancing information interaction between layers and improving model performance.
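Returning to the DSU workflow above, a single update step can be sketched in plain Python. The three stages follow the listed equations directly. Everything else is an assumption for illustration: `y_m` is treated as a per-layer channel summary vector, the weight shapes, the hidden width, and the random initialization are chosen arbitrarily, and the class name `DSUCell` is hypothetical. It is a minimal sketch, not the released code.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v, b=None):
    """Return W @ v (+ b) for a matrix stored as a list of rows."""
    out = [sum(w * x for w, x in zip(row, v)) for row in W]
    return out if b is None else [o + bias for o, bias in zip(out, b)]


def random_matrix(rows, cols, rng):
    # Assumed initialization: small Gaussian weights.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class DSUCell:
    """One DSU step; shapes and hidden width are illustrative assumptions."""

    def __init__(self, channels, hidden, seed=0):
        rng = random.Random(seed)
        # W_1 compresses the concatenation [sigma(c_{m-1}), y_m].
        self.W1 = random_matrix(hidden, 2 * channels, rng)
        # Second-stage weights W_c^2, W_i^2, W_f^2 with their biases.
        self.Wc = random_matrix(channels, hidden, rng)
        self.Wi = random_matrix(channels, hidden, rng)
        self.Wf = random_matrix(channels, hidden, rng)
        self.bc = [0.0] * channels
        self.bi = [0.0] * channels
        self.bf = [0.0] * channels

    def step(self, c_prev, y):
        # Input compression: s_m = ReLU(W_1 [sigma(c_{m-1}), y_m])
        concat = [sigmoid(c) for c in c_prev] + list(y)
        s = [max(v, 0.0) for v in matvec(self.W1, concat)]
        # Hidden transformation, input gate, forget gate
        c_tilde = [math.tanh(v) for v in matvec(self.Wc, s, self.bc)]
        i_gate = [sigmoid(v) for v in matvec(self.Wi, s, self.bi)]
        f_gate = [sigmoid(v) for v in matvec(self.Wf, s, self.bf)]
        # Context update: c_m = f_m * c_{m-1} + i_m * c_tilde_m (element-wise)
        return [
            f * c + i * ct
            for f, c, i, ct in zip(f_gate, c_prev, i_gate, c_tilde)
        ]


# Usage: thread one shared context through three layers' summaries.
cell = DSUCell(channels=4, hidden=2)
context = [0.0] * 4
first_context = None
layer_summaries = [
    [0.5, -1.0, 0.25, 2.0],
    [1.0, 0.0, -0.5, 0.3],
    [0.1, 0.2, 0.3, 0.4],
]
for y in layer_summaries:
    context = cell.step(context, y)
    if first_context is None:
        first_context = context
```

Because the same cell, and therefore the same weights, is reused at every layer, the context vector carries information forward across layers while the forget gate controls how much of the earlier context survives each step.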