Self-attention Neural Architecture Search for Semantic Image Segmentation
Zhenkun Fan, Guosheng Hu, Xin Sun, Gaige Wang, Junyu Dong, Chi Su
DOI: https://doi.org/10.1016/j.knosys.2021.107968
IF: 8.139
2021-01-01
Knowledge-Based Systems
Abstract: Self-attention can capture long-distance dependencies and is widely used in semantic segmentation. Existing methods mainly use two kinds of self-attention, i.e., spatial attention and channel attention, which capture relations along the HW dimension (the image plane, height and width) and the C dimension (channels), respectively. Very little research investigates self-attention along other dimensions, which could potentially improve segmentation performance. In this work, we investigate self-attention along all possible dimensions {H, W, C, HW, HC, CW, HWC}. We then explore the aggregation of all these self-attentions, applying neural architecture search (NAS) to find the optimal aggregation. Specifically, we carefully design (1) the search space and (2) the optimization method. For (1), we introduce a building block, the basic self-attention search unit (BSU), which can model self-attention along all the dimensions; the search space contains within-BSU and cross-BSU operations. In addition, we propose an attention-map splitting method, which can reduce the computation by 1/3. For (2), we apply an efficient differentiable optimization method to search for the optimal aggregation. We conduct extensive experiments on the Cityscapes and ADE20K datasets. The results show the effectiveness of the proposed method, which achieves very competitive performance against state-of-the-art methods.
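The seven candidate dimensions listed in the abstract are the non-empty subsets of the axes H, W and C. The following is a minimal sketch, not the authors' implementation, that enumerates them and reports the size of the attention map each would produce. It assumes that attention "along" a subset flattens the chosen axes into N tokens and treats the remaining axes as per-token features, giving an N x N map; the function names and the example feature shape are hypothetical.

```python
from itertools import combinations
from math import prod

AXES = ("H", "W", "C")


def attention_dims(axes=AXES):
    """Return every non-empty subset of the axes, smallest subsets first."""
    return [
        combo
        for r in range(1, len(axes) + 1)
        for combo in combinations(axes, r)
    ]


def attention_map_shape(shape, along):
    """Shape of the token-by-token attention map for one axis subset.

    Assumption: the chosen axes are flattened into N tokens and the
    remaining axes form each token's feature vector, so the map is N x N.
    """
    n_tokens = prod(shape[a] for a in along)
    return (n_tokens, n_tokens)


if __name__ == "__main__":
    # Hypothetical feature-map size, chosen only for illustration.
    feature_shape = {"H": 4, "W": 6, "C": 8}
    for along in attention_dims():
        print("".join(along), attention_map_shape(feature_shape, along))
```

Under this assumption the map grows quadratically with the number of flattened tokens, so the HWC case is by far the largest. The sketch labels one subset WC where the abstract writes CW; both denote the same pair of axes.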