RFAConv: Innovating Spatial Attention and Standard Convolutional Operation

Xin Zhang, Chen Liu, Degang Yang, Tingting Song, Yichen Ye, Ke Li, Yingze Song
2024-03-28
Abstract: Spatial attention has been widely used to improve the performance of convolutional neural networks. However, it has certain limitations. In this paper, we propose a new perspective on the effectiveness of spatial attention, which is that the spatial attention mechanism essentially solves the problem of convolutional kernel parameter sharing. However, the information contained in the attention map generated by spatial attention is not sufficient for large-size convolutional kernels. Therefore, we propose a novel attention mechanism called Receptive-Field Attention (RFA). Existing spatial attention, such as Convolutional Block Attention Module (CBAM) and Coordinated Attention (CA) focus only on spatial features, which does not fully address the problem of convolutional kernel parameter sharing. In contrast, RFA not only focuses on the receptive-field spatial feature but also provides effective attention weights for large-size convolutional kernels. The Receptive-Field Attention convolutional operation (RFAConv), developed by RFA, represents a new approach to replace the standard convolution operation. It offers nearly negligible increment of computational cost and parameters, while significantly improving network performance. We conducted a series of experiments on ImageNet-1k, COCO, and VOC datasets to demonstrate the superiority of our approach. Of particular importance, we believe that it is time to shift focus from spatial features to receptive-field spatial features for current spatial attention mechanisms. In this way, we can further improve network performance and achieve even better results. The code and pre-trained models for the relevant tasks can be found at https://github.com/Liuchen1997/RFAConv.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the parameter-sharing problem in the convolution operation of convolutional neural networks (CNNs), together with the limitations of existing spatial attention mechanisms when they are applied to large convolution kernels. Specifically:

1. **The parameter-sharing problem in convolution operations**: A standard convolution extracts features with a kernel whose parameters are shared across all positions. Every receptive field is therefore processed with the same parameters, which ignores position-specific differences and limits network performance.
2. **Limitations of existing spatial attention mechanisms**: Existing spatial attention mechanisms (such as CBAM and CA) focus mainly on spatial features. They do not fully solve the parameter-sharing problem for large convolution kernels, and they cannot effectively emphasize the importance of each feature within a receptive field.

To address these problems, the authors propose a new attention mechanism, **Receptive-Field Attention (RFA)**, and build a new convolution operation on it, **RFAConv**. RFA not only focuses on the spatial features of the receptive field but also provides effective attention weights for large convolution kernels, which addresses the parameter-sharing problem and significantly improves network performance.

### Main contributions

1. **Proposing Receptive-Field Attention (RFA)**: RFA not only focuses on spatial features but also considers the importance of each feature within the receptive field, especially for large convolution kernels.
2. **Designing RFAConv**: RFAConv is a convolution operation that can significantly improve network performance with almost no increase in computational cost or parameters.
3. **Experimental verification**: Experiments on the ImageNet-1k, COCO, and VOC datasets demonstrate the effectiveness of RFAConv.

### Formula representation

The following formulas summarize how RFAConv works:

- **Standard convolution operation**:
  \[ F_i = X_{i1}\times K_1+X_{i2}\times K_2+\cdots+X_{iS}\times K_S, \quad i = 1, \dots, N \]
  where \(F_i\) is the value computed for the \(i\)-th receptive-field slider, \(X_{ij}\) is the \(j\)-th pixel value in the \(i\)-th slider, \(K_j\) is the \(j\)-th convolution kernel parameter, \(S\) is the number of convolution kernel parameters, and \(N\) is the total number of receptive-field sliders. The same parameters \(K_1, \dots, K_S\) are used for every slider \(i\).
- **Spatial attention mechanism**:
  \[ F_i = X_i\times A_i \]
  where \(F_i\) is the weighted value, and \(X_i\) and \(A_i\) are the values of the input feature map and of the learned attention map at position \(i\), respectively.
- **Standard convolution operation combined with spatial attention**:
  \[ F_i=(X_i\times A_i)\times K \]
  where \(K\) is the shared convolution kernel.
- **Calculation formula of RFAConv**:
  \[ F = \text{Softmax}(g_{1\times1}(\text{AvgPool}(X)))\times\text{ReLU}(\text{Norm}(g_{k\times k}(X))) = A_{rf}\times F_{rf} \]
  where \(g_{i\times i}\) is a grouped convolution of size \(i\times i\), \(k\) is the convolution kernel size, \(\text{Norm}\) is normalization, \(X\) is the input feature map, and \(F\) is the result of multiplying the attention map \(A_{rf}\) by the transformed receptive-field spatial feature \(F_{rf}\).

Through these improvements, RFAConv can significantly improve the performance of convolutional neural networks while maintaining a low computational cost.
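
As a rough illustration of the formulas above, the following pure-Python sketch compares a standard convolution with a heavily simplified receptive-field attention step on a single-channel toy input. It is not the authors' implementation. The helper names (`unfold`, `rfa_features`, `rfa_conv`) and the weight vector `att_w` are hypothetical. Plain unfolding stands in for the learned grouped convolution \(g_{k\times k}\), and \(\text{Norm}\) is omitted. Applying the shared kernel to \(A_{rf}\times F_{rf}\) as a final step is an assumption made by analogy with the spatial-attention formula.

```python
import math

def unfold(x, k):
    """Collect the k*k receptive-field slider around every pixel (zero padding, stride 1)."""
    h, w, p = len(x), len(x[0]), k // 2
    def at(r, c):
        return x[r][c] if 0 <= r < h and 0 <= c < w else 0.0
    return [[[at(i + di - p, j + dj - p) for di in range(k) for dj in range(k)]
             for j in range(w)] for i in range(h)]

def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]

def standard_conv(x, kernel, k):
    """F_i = sum_j X_ij * K_j, with the same K_j reused for every slider i."""
    return [[sum(a * b for a, b in zip(s, kernel)) for s in row]
            for row in unfold(x, k)]

def rfa_features(x, att_w, k):
    """Per-slider A_rf * F_rf (simplified: no Norm, unfolding replaces g_{k x k})."""
    out = []
    for row in unfold(x, k):
        out_row = []
        for s in row:
            pooled = sum(s) / len(s)                     # AvgPool over the slider
            a_rf = softmax([w * pooled for w in att_w])  # 1x1 expansion to k*k logits, then Softmax
            f_rf = [max(v, 0.0) for v in s]              # ReLU
            out_row.append([a * f for a, f in zip(a_rf, f_rf)])
        out.append(out_row)
    return out

def rfa_conv(x, kernel, att_w, k):
    """Assumed final step: apply the shared kernel to the attention-weighted slider."""
    return [[sum(f * kk for f, kk in zip(s, kernel)) for s in row]
            for row in rfa_features(x, att_w, k)]

# Toy data: a 3x3 input, a 3x3 kernel, and hypothetical attention weights.
x = [[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [2.0, 1.0, 4.0]]
kernel = [0.1 * (j + 1) for j in range(9)]
att_w = [0.2 * j for j in range(9)]

plain = standard_conv(x, kernel, 3)
attended = rfa_conv(x, kernel, att_w, 3)
print(plain[1][1], attended[1][1])
```

In this sketch the attention weights depend on the content of each slider, so the effective kernel (`a_rf` times `kernel`) differs from one receptive field to the next, which is the sense in which receptive-field attention relaxes parameter sharing. With all-zero `att_w` the attention becomes uniform, and for a non-negative input the result reduces to the standard convolution scaled by \(1/k^2\).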