An Empirical Study of Attention Networks for Semantic Segmentation

Hao Guo, Hongbiao Si, Guilin Jiang, Wei Zhang, Zhiyan Liu, Xuanyi Zhu, Xulong Zhang, Yang Liu
2023-09-19
Abstract: Semantic segmentation is a vital problem in computer vision. End-to-end convolutional neural networks have become the common solution and are much more accurate than traditional methods. More recently, attention-based decoders have achieved state-of-the-art (SOTA) performance on various datasets. However, these networks are usually justified only by comparing their mIoU with that of previous SOTA networks, without considering computational complexity or per-category precision, both of which are essential for engineering applications. In addition, the methods used to analyze FLOPs and memory are inconsistent across networks, which makes the reported comparisons hard to use. Moreover, although many methods apply attention to semantic segmentation, a summary of what these methods have established is lacking. This paper first conducts experiments to analyze the computational complexity of these networks and compare their performance. It then summarizes the scenarios each network suits and identifies the key points to consider when constructing an attention network. Finally, it points out some future directions for attention networks.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper attempts to solve the following problems:

1. **Inconsistent comparison criteria for existing attention networks**: Attention networks for semantic segmentation usually demonstrate their superiority only by comparing their mIoU with that of the previous state-of-the-art (SOTA) method. These comparisons ignore that different attention networks suit different scenarios, especially with respect to computational complexity and per-category precision, which are crucial for engineering applications.
2. **Lack of a unified analysis method for computational complexity and memory usage**: Different attention networks do not follow a common standard when analyzing FLOPs (floating-point operations) and memory usage, which makes a fair comparison difficult. For example, some studies provide no FLOPs analysis, and others calculate FLOPs at different input sizes, so the results cannot be compared directly.
3. **Insufficient conclusions**: Although multiple methods use the attention mechanism in semantic segmentation, summaries and conclusions about these methods remain insufficient. In particular, guidance on how to construct an efficient attention network is lacking.

To solve these problems, this paper conducts the following work:

- **Experimental analysis**: Analyze the computational complexity of different attention networks through experiments and compare their performance.
- **Summarize applicable scenarios**: Summarize the application scenarios suitable for different types of attention networks.
- **Summarize key points**: Summarize the key points to note when constructing attention networks.
- **Future directions**: Point out future research directions for attention networks.

### Specific problems and solutions

1. **Unified comparison criteria**: The paper proposes a unified method to calculate and compare the FLOPs of different attention networks, so that the comparison is fair.
2. **Detailed analysis of computational complexity and memory usage**: A detailed analysis of the FLOPs and memory usage of different networks provides a more comprehensive performance evaluation.
3. **Summary and induction**: By summarizing the experimental results of multiple typical attention networks, the paper identifies the key points for constructing an efficient attention network and proposes future research directions.

### Conclusions

The main conclusions drawn by the paper are:

- **FLANet** is the best choice for most segmentation tasks, achieving the highest accuracy while remaining computationally efficient.
- **Denoised NL** is suitable for tasks requiring high computational efficiency because it has the fastest processing speed.
- The paper also suggests enriching the context information in the attention mechanism from four aspects: decoupling globally shared and non-shared attention, the importance of channel and spatial attention, avoiding the attention-missing problem, and eliminating attention noise.

These conclusions provide valuable references for future research and point out directions for improvement.
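
The two measurements this summary stresses, per-category accuracy rather than mIoU alone and FLOPs computed at one fixed input size, can be illustrated with a minimal Python sketch. It is not the paper's evaluation code. The function names are hypothetical, and the cost model is an assumption: it counts multiply-accumulates for a plain non-local (self-attention) block with 1x1 projections and ignores softmax and normalization.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes):
    """Count pixels for every (true class, predicted class) pair."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    index = target * num_classes + pred
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def per_class_iou(conf):
    """IoU per class: TP / (TP + FP + FN). NaN if the class never appears."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = conf.sum(axis=1) - tp  # belongs to the class, but missed
    union = tp + fp + fn
    return np.where(union > 0, tp / np.maximum(union, 1), np.nan)


def nonlocal_attention_macs(height, width, channels, key_channels):
    """Rough multiply-accumulate count for one non-local attention block.

    Assumed layout (not taken from the paper): 1x1 projections for query
    and key (C -> Ck), value and output (C -> C), then an N x N affinity
    matrix with N = height * width.
    """
    n = height * width
    projections = n * channels * (2 * key_channels + 2 * channels)
    affinity = n * n * key_channels  # Q @ K^T
    aggregation = n * n * channels   # A @ V
    return projections + affinity + aggregation


# Toy example: 4 pixels, 3 classes.
conf = confusion_matrix(pred=[0, 1, 1, 2], target=[0, 1, 2, 2], num_classes=3)
ious = per_class_iou(conf)
print("per-class IoU:", ious)
print("mIoU:", np.nanmean(ious))

# Costs are only comparable when every network is measured at the same input size.
print("MACs at 2x2:", nonlocal_attention_macs(2, 2, channels=4, key_channels=2))
print("MACs at 4x4:", nonlocal_attention_macs(4, 4, channels=4, key_channels=2))
```

Reporting the per-class vector alongside its mean shows which categories drive a network's mIoU. Because the affinity and aggregation terms grow with the square of the pixel count, cost figures quoted at different input sizes cannot be compared directly.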