Abstract: In recent years, employing layer attention to enhance interaction among hierarchical layers has proven to be a significant advancement in building network structures. In this paper, we delve into the distinction between layer attention and the general attention mechanism, noting that existing layer attention methods achieve layer interaction on fixed feature maps in a static manner. These static layer attention methods limit the ability for context feature extraction among layers. To restore the dynamic context representation capability of the attention mechanism, we propose a Dynamic Layer Attention (DLA) architecture. The DLA comprises dual paths, where the forward path utilizes an improved recurrent neural network block, named Dynamic Sharing Unit (DSU), for context feature extraction. The backward path updates features using these shared context representations. Finally, the attention mechanism is applied to these dynamically refreshed feature maps among layers. Experimental results demonstrate the effectiveness of the proposed DLA architecture, outperforming other state-of-the-art methods in image recognition and object detection tasks. Additionally, the DSU block has been evaluated as an efficient plugin in the proposed DLA architecture. The code is available at https://github.com/tunantu/Dynamic-Layer-Attention.

What problem does this paper attempt to address?

The paper addresses the static nature of existing layer attention mechanisms in deep convolutional neural networks (DCNNs). When modeling feature interactions between layers, existing layer attention methods operate on fixed feature maps. This limits context feature extraction and makes information exchange between layers inefficient. To solve this problem, the paper proposes the Dynamic Layer Attention (DLA) architecture, which aims to restore the dynamic context representation ability of the attention mechanism, thereby strengthening feature interaction between layers and improving model performance.

### Main Contributions

1. **Proposing the DLA Architecture**: The DLA architecture consists of a forward path and a backward path. The forward path uses an improved recurrent neural network block, the Dynamic Sharing Unit (DSU), for context feature extraction. The backward path uses these shared context representations to update the features of each layer.
2. **Designing the DSU Block**: The DSU is a new RNN block used within the DLA architecture. It supports the dynamic modification of information between layers and performs well at integrating information across layers.
3. **Experimental Verification**: The experiments show that the DLA architecture outperforms other state-of-the-art methods on image classification (CIFAR-10, CIFAR-100, and ImageNet-1K) and on object detection (COCO2017).

### Dynamic Layer Attention Architecture (DLA)

- **Forward Path**: Uses DSU blocks to extract context features from each layer.
- **Backward Path**: Uses the extracted context features to dynamically update the feature maps of each layer.
### Dynamic Sharing Unit (DSU)

- **Workflow**:
  - Input compression:
    \[ s_m = \text{ReLU}(W_1[\sigma(c_{m-1}), y_m]) \]
  - Hidden transformation, input gate, and forget gate:
    \[ \begin{cases} \tilde{c}_m = \text{Tanh}(W_c^2 \cdot s_m + b_c) \\ i_m = \sigma(W_i^2 \cdot s_m + b_i) \\ f_m = \sigma(W_f^2 \cdot s_m + b_f) \end{cases} \]
  - Context representation update:
    \[ c_m = f_m \odot c_{m-1} + i_m \odot \tilde{c}_m \]

### Experimental Results

- **Image Classification**:
  - On CIFAR-10, the Top-1 accuracy of the DLA-L model is 1.32%, 1.60%, and 1.62% higher than that of the corresponding ResNets. On CIFAR-100, it is 4.96%, 2.94%, and 3.41% higher.
  - On ImageNet-1K, the Top-1 accuracy of the DLA-L model is 1.9% higher than ResNet-50 and 1.5% higher than ResNet-101.
- **Object Detection**:
  - On COCO2017, the DLA-L model increases average precision (AP) by 4.2% with Faster R-CNN and by 3.6% with Mask R-CNN as the detector.

These results support the effectiveness of the DLA architecture in enhancing information interaction between layers and improving model performance.
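To make the DSU workflow summarized earlier more concrete, here is a minimal pure-Python sketch of one context update, following the three equations term by term. It is an illustration under stated assumptions, not the authors' implementation. The summary does not specify shapes or initialization, so the sketch assumes that `y_m` is a per-layer feature vector with `channels` entries, that `W_1` compresses the concatenation to a smaller `reduced` dimension, and that weights are drawn from a small Gaussian. The names `dsu_step`, `init_params`, and `matvec` are hypothetical helpers.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def dsu_step(c_prev, y, p):
    """One DSU update: maps (c_{m-1}, y_m) to c_m."""
    # Input compression: s_m = ReLU(W_1 [sigma(c_{m-1}), y_m])
    z = [sigmoid(c) for c in c_prev] + list(y)
    s = [max(0.0, v) for v in matvec(p["W1"], z)]
    # Hidden transformation, input gate, and forget gate
    c_tilde = [math.tanh(a + b) for a, b in zip(matvec(p["Wc"], s), p["bc"])]
    i_gate = [sigmoid(a + b) for a, b in zip(matvec(p["Wi"], s), p["bi"])]
    f_gate = [sigmoid(a + b) for a, b in zip(matvec(p["Wf"], s), p["bf"])]
    # Context update: c_m = f_m * c_{m-1} + i_m * c_tilde_m (element-wise)
    return [f * c + i * ct
            for f, c, i, ct in zip(f_gate, c_prev, i_gate, c_tilde)]


def init_params(channels, reduced, seed=0):
    """Assumed initialization: small Gaussian weights, zero biases."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": mat(reduced, 2 * channels),
        "Wc": mat(channels, reduced), "bc": [0.0] * channels,
        "Wi": mat(channels, reduced), "bi": [0.0] * channels,
        "Wf": mat(channels, reduced), "bf": [0.0] * channels,
    }


if __name__ == "__main__":
    channels, reduced, num_layers = 8, 2, 4  # illustrative sizes only
    params = init_params(channels, reduced)
    rng = random.Random(1)
    c = [0.0] * channels  # assumed initial context
    for m in range(num_layers):
        y = [rng.gauss(0.0, 1.0) for _ in range(channels)]  # stand-in for y_m
        c = dsu_step(c, y, params)
    print(len(c))
```

The same parameter set is reused at every layer in the loop, which reflects the "shared" context representation described in the summary. How the backward path then uses `c` to refresh each layer's feature maps is not detailed here, so the sketch leaves it out.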

Strengthening Layer Interaction via Dynamic Layer Attention
