Abstract: Semantic segmentation is a vital problem in computer vision. Recently, a common solution to semantic segmentation has been the end-to-end convolutional neural network, which is much more accurate than traditional methods. More recently, attention-based decoders have achieved state-of-the-art (SOTA) performance on various datasets. However, these networks are typically compared only against the mIoU of previous SOTA networks to prove their superiority, ignoring their individual characteristics, their computational complexity, and their precision in individual categories, all of which are essential for engineering applications. In addition, the methods used to analyze FLOPs and memory are not consistent across networks, which makes the comparisons hard to use. Moreover, although various methods apply attention to semantic segmentation, a summary of these methods is lacking. This paper first conducts experiments to analyze their computational complexity and compare their performance. It then summarizes the scenarios each network suits and identifies the key points to consider when constructing an attention network. Finally, it points out some future directions for attention networks.

What problem does this paper attempt to address?

This paper attempts to solve the following problems:

1. **Inconsistent comparison criteria for existing attention networks**: Existing attention networks in semantic segmentation tasks usually prove their superiority only by comparing with the mIoU of the previous state-of-the-art method (SOTA). However, these comparisons ignore different scenarios suitable for different attention networks, especially in terms of computational complexity and precision for each category, which are crucial for engineering applications.
2. **Lack of a unified analysis method for computational complexity and memory usage**: Different attention networks do not follow a unified standard when analyzing FLOPs (floating-point operations) and memory usage, making it difficult to make a fair comparison. For example, some studies do not provide FLOPs analysis or calculate FLOPs under different input sizes, making the results difficult to compare directly.
3. **Insufficient conclusions**: Although there are multiple methods using the attention mechanism in semantic segmentation, the summaries and conclusions of these methods are still not sufficient, especially lacking guidance on how to construct an efficient attention network.

To solve these problems, this paper conducts the following work:

- **Experimental analysis**: Analyze the computational complexity of different attention networks through experiments and compare their performance.
- **Summarize applicable scenarios**: Summarize the application scenarios suitable for different types of attention networks.
- **Summarize key points**: Summarize the key points that should be noted when constructing attention networks.
- **Future directions**: Point out the future research directions of attention networks.

### Specific problems and solutions

1. **Unified comparison criteria**: The paper proposes a unified method to calculate and compare the FLOPs of different attention networks to ensure fairness and comparability of the comparison.
2. **Detailed analysis of computational complexity and memory usage**: By conducting a detailed analysis of the FLOPs and memory usage of different networks, a more comprehensive performance evaluation is provided.
3. **Summary and induction**: By summarizing the experimental results of multiple typical attention networks, the key points for constructing an efficient attention network are pointed out, and future research directions are proposed.

### Conclusions

The main conclusions drawn by the paper are:

- **FLANet** is the best choice for most segmentation tasks, achieving the highest accuracy under the premise of high computational efficiency.
- **Denoised NL** is suitable for tasks requiring high computational efficiency due to its fastest processing speed.
- The paper also suggests enriching the context information in the attention mechanism from four aspects: decoupling of global shared and non-shared attention, the importance of channel and spatial attention, avoiding the attention-missing problem, and eliminating attention noise.

These conclusions provide valuable references for future research and point out the direction for improvement.
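The point that FLOPs computed at different input sizes cannot be compared directly can be illustrated with a rough count. The sketch below is not the paper's method: it assumes a simplified non-local (self-attention) block with 1x1 query, key, value, and output projections, counts multiply-accumulate operations (MACs) only, ignores the softmax, and assumes a backbone output stride of 8. The function name `nonlocal_block_macs` and the default channel widths are hypothetical choices made for illustration.

```python
def nonlocal_block_macs(height, width, channels=512, reduced=256, output_stride=8):
    """Rough MAC count for a simplified non-local attention block.

    Assumptions (illustrative, not taken from the paper):
    - the block sees a feature map downsampled by `output_stride`
    - query, key, and value are 1x1 convs from `channels` to `reduced`
    - a final 1x1 conv maps `reduced` back to `channels`
    - softmax and normalization costs are ignored
    """
    h, w = height // output_stride, width // output_stride
    n = h * w  # number of spatial positions attended over

    projections = 3 * n * channels * reduced  # query, key, value 1x1 convs
    affinity = n * n * reduced                # query-key similarity (N x N map)
    aggregation = n * n * reduced             # attention-weighted sum of values
    output = n * reduced * channels           # output 1x1 conv
    return projections + affinity + aggregation + output


if __name__ == "__main__":
    # The same block evaluated at three commonly used crop sizes.
    for size in [(512, 512), (769, 769), (1024, 2048)]:
        gmacs = nonlocal_block_macs(*size) / 1e9
        print(f"input {size[0]}x{size[1]}: {gmacs:.2f} GMACs")
```

Because the affinity and aggregation terms grow with the square of the number of spatial positions, the same block reports very different costs at different crop sizes, which is why a comparison is only meaningful when every network is measured at one fixed input size.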

An Empirical Study of Attention Networks for Semantic Segmentation

EHANet: Efficient Hybrid Attention Network Towards Real-time Semantic Segmentation

SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation

Semantic Segmentation With Attention Mechanism for Remote Sensing Images

Embedded Attention Network for Semantic Segmentation

A Synergistical Attention Model for Semantic Segmentation of Remote Sensing Images

An Attention-Fused Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

Expectation-Maximization Attention Networks for Semantic Segmentation

Threshold Attention Network for Semantic Segmentation of Remote Sensing Images

Multi-Attention-Based Semantic Segmentation Network for Land Cover Remote Sensing Images

Point Attention Network for Semantic Segmentation of 3D Point Clouds

AANet: Adaptive Attention Networks for Semantic Segmentation of High-Resolution Remote Sensing Imagery

Point Attention Network for Point Cloud Semantic Segmentation.

Lightweight Attention Network for Very High-Resolution Image Semantic Segmentation

Research on Image Semantic Segmentation Based on Hybrid Cascade Feature Fusion and Detailed Attention Mechanism

SACANet: scene-aware class attention network for semantic segmentation of remote sensing images

Semantic boundary enhancement and position attention network with long-range dependency for semantic segmentation

Chemical signalling in the nervous system.

Adaptive multi-scale dual attention network for semantic segmentation

Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images

Attention Guided Global Enhancement and Local Refinement Network for Semantic Segmentation