Abstract: In the past few years, channel-wise and spatial-wise attention blocks have been widely adopted as supplementary modules in deep neural networks, enhancing their representational ability while adding little complexity. Most attention modules follow a squeeze-and-excitation paradigm. However, designing such attention modules requires a substantial number of experiments and considerable computational resources. Neural Architecture Search (NAS), meanwhile, can automate the design of neural networks and spare the numerous experiments needed to find an optimal architecture. This motivates us to design a search architecture that can automatically find near-optimal attention modules through NAS. We propose SASE, a Searching Architecture for Squeeze and Excitation operations, which forms a plug-and-play attention block by searching within a defined search space. The search space is divided into 4 sets, each corresponding to the squeeze or excitation operation along the channel or spatial dimension. Additionally, the search sets include not only existing attention blocks but also operations that have not previously been used in attention mechanisms. To the best of our knowledge, SASE is the first attempt to subdivide the attention search space and search for architectures beyond currently known attention modules. The searched attention module is evaluated with extensive experiments across a range of visual tasks. Experimental results indicate that visual backbone networks (ResNet-50/101) using the SASE attention module achieve the best performance compared with those using current state-of-the-art attention modules. Code is included in the supplementary material and will be made public later.
What problem does this paper attempt to address?
This paper addresses how to design an efficient attention module automatically through Neural Architecture Search (NAS), reducing the time and computational resources required for manual design and experimentation. Specifically, the authors propose SASE (a Searching Architecture for Squeeze and Excitation operations) to automatically find near-optimal attention modules within a dedicated search space.
### Background and Problem Description of the Paper
In recent years, channel and spatial attention modules have been widely used in deep neural networks. They enhance the network's representational ability while adding relatively little complexity. Most attention modules follow the squeeze-and-excitation (SE) paradigm: a squeeze step aggregates the feature map into a compact descriptor, and an excitation step turns that descriptor into weights that rescale the features. Designing these modules, however, usually takes extensive experimentation and computational resources. NAS, by contrast, can automate neural network design and save much of this human and computational effort. The authors therefore use NAS to automate the design of attention modules and avoid tedious manual tuning.
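To make the squeeze-and-excitation paradigm concrete, here is a minimal NumPy sketch of the classic SE block: GAP as the squeeze, two fully connected layers with reduction ratio `r` as the excitation, and a sigmoid gate that rescales each channel. The weights are random placeholders, and the single `(C, H, W)` feature map without a batch axis is an assumption for brevity; this is the hand-designed baseline, not SASE's searched module.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation on one feature map x of shape (C, H, W).

    w1: (C // r, C) and w2: (C, C // r) are the two FC layers.
    Here they are random placeholders; in a real network they are learned.
    """
    # Squeeze: global average pooling -> one descriptor per channel, shape (C,)
    s = x.mean(axis=(1, 2))
    # Excitation: FC (reduce by r) -> ReLU -> FC (restore C) -> sigmoid
    z = np.maximum(0.0, w1 @ s + b1)
    a = sigmoid(w2 @ z + b2)  # channel weights in (0, 1)
    # Rescale: broadcast each channel weight over its H x W map
    return x * a[:, None, None], a


rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 5, 4
x = rng.standard_normal((C, H, W))
w1, b1 = rng.standard_normal((C // r, C)) * 0.1, np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)) * 0.1, np.zeros(C)
y, a = se_block(x, w1, b1, w2, b2)
print(y.shape, a.round(3))
```

Every attention variant discussed below keeps this two-step shape; SASE varies which operation fills the squeeze slot and which fills the excitation slot.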
### Main Contributions of SASE
1. **Fine-grained Search Space**: The search space is divided into four operation sets: channel squeeze, channel excitation, spatial squeeze, and spatial excitation. Each set includes not only operations from existing attention mechanisms but also new operations that have not previously been used in attention mechanisms.
2. **Customized DAG Structure**: A directed acyclic graph (DAG) is constructed in which each edge corresponds to one operation set, and second-order DARTS is applied for efficient search.
3. **Extensive Experimental Verification**: The searched attention modules are integrated into ResNet-50 and ResNet-101 backbones and evaluated on multiple visual benchmarks. The results show that they outperform current state-of-the-art attention modules.
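The idea of a DAG whose edges are the four operation sets can be sketched as a small chain: channel squeeze, then channel excitation, then spatial squeeze, then spatial excitation. This is a hypothetical illustration: the operation placed on each edge is a placeholder rather than what SASE actually selects, the kernel weights are fixed constants instead of learned parameters, and the channel-then-spatial ordering is an assumption borrowed from CBAM.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(maps, kernel):
    """Naive zero-padded 'same' 2D cross-correlation.

    maps: (M, H, W) input maps, kernel: (M, k, k) with odd k -> output (H, W).
    """
    _, h, w = maps.shape
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(maps, ((0, 0), (p, p), (p, p)))
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return out


# Hypothetical choice for each of the four edges (one op per operation set).
# These are placeholders, NOT the operations SASE actually selects.
EDGES = {
    # channel squeeze: GAP over H and W -> (C,)
    "channel_squeeze": lambda x: x.mean(axis=(1, 2)),
    # channel excitation: 1D conv across channels, k=3, uniform placeholder weights
    "channel_excite": lambda s: sigmoid(np.convolve(s, np.ones(3) / 3, mode="same")),
    # spatial squeeze: mean and max across channels -> (2, H, W)
    "spatial_squeeze": lambda x: np.stack([x.mean(axis=0), x.max(axis=0)]),
    # spatial excitation: 3x3 conv fusing the two maps, placeholder weights
    "spatial_excite": lambda m: sigmoid(conv2d_same(m, np.full((2, 3, 3), 1 / 18))),
}


def attention_block(x, edges=EDGES):
    """Channel attention followed by spatial attention on x of shape (C, H, W).

    The channel-then-spatial ordering is an assumption borrowed from CBAM.
    """
    a_c = edges["channel_excite"](edges["channel_squeeze"](x))  # (C,) in (0, 1)
    x = x * a_c[:, None, None]
    a_s = edges["spatial_excite"](edges["spatial_squeeze"](x))  # (H, W) in (0, 1)
    return x * a_s[None, :, :]


x = np.random.default_rng(0).standard_normal((8, 6, 6))
out = attention_block(x)
print(out.shape)  # (8, 6, 6)
```

Because every edge is just a function in a dictionary, swapping in a different candidate from the corresponding operation set changes the module without touching `attention_block`, which is what makes the block plug-and-play.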
### Specific Contents of the Search Space
The search space of SASE includes four operation sets:
- **Squeeze Operation Set in the Channel Dimension**: global average pooling (GAP), global second-order pooling (GSoP), normalization and L2 normalization, L4 pooling, and combinations of GAP with global max pooling (GMP), with standard deviation pooling, and with skewness pooling.
- **Excitation Operation Set in the Channel Dimension**: fully connected layers with dimensionality reduction, one-dimensional convolutions with different kernel sizes, stacked one-dimensional convolutions, and affine transformations.
- **Squeeze Operation Set in the Spatial Dimension**: analogous to the channel squeeze set, but the statistics are computed across the channel axis, producing a spatial map instead of a per-channel descriptor.
- **Excitation Operation Set in the Spatial Dimension**: mainly two-dimensional convolutions with different kernel sizes, stacked two-dimensional convolutions, spatially separable convolutions, and affine transformations.
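Several of the squeeze candidates listed above can be written as one statistic applied along different axes, which also shows how the channel and spatial squeeze sets differ only in which dimensions are pooled. The sketch below is an assumption-laden illustration: GSoP and the normalization-based operations are omitted, excitation operations are not shown, and combined operations are merged by stacking on a new leading axis, a hypothetical merge rule that the source does not specify.

```python
import numpy as np

EPS = 1e-5  # guards against division by zero in skewness pooling


def gap(x, axes):
    return x.mean(axis=axes)


def gmp(x, axes):
    return x.max(axis=axes)


def std_pool(x, axes):
    return x.std(axis=axes)


def l4_pool(x, axes):
    # Lp pooling with p = 4: (mean |x|^4)^(1/4)
    return np.mean(np.abs(x) ** 4, axis=axes) ** 0.25


def skew_pool(x, axes):
    mu = x.mean(axis=axes, keepdims=True)
    sd = x.std(axis=axes, keepdims=True)
    return np.mean(((x - mu) / (sd + EPS)) ** 3, axis=axes)


def combine(*fns):
    # Hypothetical merge rule: stack the statistics on a new leading axis.
    return lambda x, axes: np.stack([f(x, axes) for f in fns])


SQUEEZE_OPS = {
    "gap": gap,
    "l4": l4_pool,
    "gap+gmp": combine(gap, gmp),
    "gap+std": combine(gap, std_pool),
    "gap+skew": combine(gap, skew_pool),
}

CHANNEL_AXES = (1, 2)  # pool over H and W -> one value per channel
SPATIAL_AXES = 0       # pool over C -> one value per spatial position

x = np.random.default_rng(0).standard_normal((8, 5, 5))  # (C, H, W)
for name, op in SQUEEZE_OPS.items():
    print(name, op(x, CHANNEL_AXES).shape, op(x, SPATIAL_AXES).shape)
```

For an `(8, 5, 5)` input, single statistics yield `(8,)` along the channel axes and `(5, 5)` along the spatial axis, while the combined ones yield `(2, 8)` and `(2, 5, 5)`; the excitation operation on the next edge must accept whichever shape the chosen squeeze produces.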
### Search Process
SASE uses the DARTS framework for search. Each candidate operation is assigned a learnable architecture weight; these weights are optimized during training, and the operation with the largest weight on each edge is finally selected. The optimization is formulated as a bi-level problem and solved efficiently by alternately updating the network weights (on training data) and the architecture parameters (on validation data).
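The mechanics of this search can be illustrated on a toy edge. In DARTS, an edge outputs the softmax-weighted sum of its candidate operations, the network weight is updated on the training split, the architecture parameters are updated on the validation split, and the final operation is the argmax. Everything concrete here is hypothetical: three scalar candidate ops, a single scalar weight `w`, a synthetic target `3 * x**2`, and hand-picked learning rates. The sketch also uses the first-order approximation (the architecture step treats `w` as fixed), whereas the source states that SASE uses second-order DARTS.

```python
import numpy as np

# Hypothetical candidate operations on one edge (toy scalar versions).
OPS = [
    ("identity", lambda x: x),
    ("square", lambda x: x ** 2),
    ("zero", lambda x: np.zeros_like(x)),
]


def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()


def loss_and_grads(w, alpha, x, y):
    """MSE of y_hat = w * sum_k softmax(alpha)_k * op_k(x), with gradients."""
    p = softmax(alpha)
    outs = np.stack([op(x) for _, op in OPS])  # shape (K, N)
    f = p @ outs                               # mixed-op output, shape (N,)
    r = w * f - y                              # residual
    loss = np.mean(r ** 2)
    dw = np.mean(2 * r * f)
    g = np.mean(2 * r * w * outs, axis=1)      # dL/dp_k, shape (K,)
    dalpha = p * (g - p @ g)                   # chain rule through softmax
    return loss, dw, dalpha


def search(steps=500, lr_w=0.05, lr_alpha=0.05):
    x_train = np.linspace(-2.0, 2.0, 41)
    x_val = np.linspace(-1.9, 1.9, 20)
    y_train, y_val = 3.0 * x_train ** 2, 3.0 * x_val ** 2  # toy target

    w, alpha = 1.0, np.zeros(len(OPS))
    for _ in range(steps):
        # Lower level: update the network weight w on the training split.
        _, dw, _ = loss_and_grads(w, alpha, x_train, y_train)
        w -= lr_w * dw
        # Upper level: update architecture params on the validation split,
        # treating w as fixed (first-order approximation).
        _, _, dalpha = loss_and_grads(w, alpha, x_val, y_val)
        alpha -= lr_alpha * dalpha
    return w, alpha


w, alpha = search()
chosen = OPS[int(np.argmax(alpha))][0]  # discretize: keep the strongest op
print(chosen, softmax(alpha).round(3), round(w, 2))
```

On this toy problem the architecture weights shift toward the `square` operation, which matches the target; in SASE the same relaxation is applied on every edge of the DAG, with each edge choosing among one of the four operation sets.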
### Experimental Results
The experimental results show that SASE performs strongly on COCO object detection and instance segmentation as well as on ImageNet classification. In particular, with ResNet-50 and ResNet-101 backbones, SASE significantly improves model performance, surpassing other existing attention modules.
In conclusion, SASE reduces the manual effort of designing attention modules while also improving model performance, demonstrating its potential for computer vision tasks.