Cross-and-Diagonal Networks: An Indirect Self-Attention Mechanism for Image Classification

Jiahang Lyu, Rongxin Zou, Qin Wan, Wang Xi, Qinglin Yang, Sarath Kodagoda, Shifeng Wang
DOI: https://doi.org/10.3390/s24072055
IF: 3.9
2024-03-24
Sensors
Abstract: In recent years, computer vision has witnessed remarkable advancements in image classification, specifically in the domains of fully convolutional neural networks (FCNs) and self-attention mechanisms. Nevertheless, both approaches exhibit certain limitations. FCNs tend to prioritize local information, potentially overlooking crucial global contexts, whereas self-attention mechanisms are computationally intensive despite their adaptability. To surmount these challenges, this paper proposes cross-and-diagonal networks (CDNet), an innovative network architecture that captures global information in images while preserving local details in a more computationally efficient manner. CDNet achieves this by establishing long-range relationships between pixels within an image, enabling the indirect acquisition of contextual information. This indirect self-attention mechanism significantly enhances the network's capacity. In CDNet, a new attention mechanism named "cross and diagonal attention" is proposed. This mechanism adopts an indirect approach by integrating two distinct components, cross attention and diagonal attention. By computing attention in different directions, specifically vertical and diagonal, CDNet effectively establishes remote dependencies among pixels, resulting in improved performance in image classification tasks. Experimental results highlight several advantages of CDNet. Firstly, it introduces an indirect self-attention mechanism that can be integrated as a module into any convolutional neural network (CNN). Additionally, the computational cost of the self-attention mechanism is reduced, resulting in improved overall computational efficiency. Lastly, CDNet attains state-of-the-art performance on three benchmark datasets for similar types of image classification networks.
In essence, CDNet addresses the constraints of conventional approaches and provides an efficient and effective solution for capturing global context in image classification tasks.
engineering, electrical & electronic; chemistry, analytical; instruments & instrumentation
What problem does this paper attempt to address?
### Main Problems Addressed by the Paper

This paper primarily addresses two key issues in image classification tasks:

1. **Limitations of Fully Convolutional Networks (FCNs)**: FCNs tend to prioritize local information, potentially overlooking important global contextual information.
2. **Computational Complexity of Self-Attention Mechanisms**: Although self-attention mechanisms are highly adaptable, their computational cost is high.

To overcome these challenges, the paper proposes a new architecture called "Cross-and-Diagonal Networks" (CDNet). CDNet efficiently captures global information in images through an indirect self-attention mechanism while preserving local details. This mechanism can establish long-distance relationships between pixels in an image, thereby indirectly acquiring contextual information and significantly enhancing the network's capabilities.

### Main Contributions

1. **Innovative Attention Mechanism**: CDNet introduces a new "cross-and-diagonal attention" mechanism. It calculates attention by integrating two different components in the vertical and diagonal directions, effectively establishing long-range dependencies between pixels and improving performance in image classification tasks.
2. **Improved Computational Efficiency**: Compared to traditional non-local blocks, CDNet significantly simplifies the computational complexity of self-attention operations, reducing it from \(O((H \times W)^2)\) to \(O((H + W - 1)^2)\), where \(H\) and \(W\) are the height and width of the image, respectively. This makes CDNet more GPU-friendly and enhances overall computational efficiency.
3. **Modular Design**: CDNet can be easily integrated as a plug-and-play module into any convolutional neural network without requiring major modifications to the existing network structure.
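To make the efficiency argument concrete, the following is a minimal sketch of which positions a single pixel would attend to under a cross pattern and a diagonal pattern, compared with the full grid used by a non-local block. It is an illustration under stated assumptions, not the authors' implementation: the function names `cross_positions` and `diagonal_positions`, the 32 × 32 grid, and the choice to include the pixel itself in each set are all hypothetical. The last two lines simply evaluate the two complexity expressions as quoted in the summary above.

```python
def cross_positions(i, j, height, width):
    """Positions sharing a row or a column with (i, j), including (i, j).

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    row = {(i, c) for c in range(width)}
    col = {(r, j) for r in range(height)}
    return row | col


def diagonal_positions(i, j, height, width):
    """Positions on the two diagonals through (i, j), including (i, j).

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    return {
        (r, c)
        for r in range(height)
        for c in range(width)
        if abs(r - i) == abs(c - j)
    }


# Assumed example: a 32 x 32 feature map and an arbitrary query pixel.
height, width = 32, 32
i, j = 10, 20

cross = cross_positions(i, j, height, width)
diag = diagonal_positions(i, j, height, width)
full_count = height * width  # a non-local block attends to every position

print(len(cross))   # 63, i.e. H + W - 1 (row and column overlap at the pixel)
print(len(diag))    # at most 2 * min(H, W) - 1, depending on the pixel
print(full_count)   # 1024

# The two expressions quoted in the summary, evaluated for this grid size.
print((height * width) ** 2)      # 1048576
print((height + width - 1) ** 2)  # 3969
```

The row-and-column set always contains exactly \(H + W - 1\) positions, which is where that term in the quoted complexity comes from. The size of the diagonal set varies with the pixel's location, since diagonals are clipped at the image border.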
### Experimental Results

The experimental section demonstrates the performance of CDNet compared to other advanced methods on three benchmark datasets (CIFAR-10, CIFAR-100, and ImageNet). The results show that CDNet not only achieves state-of-the-art levels in classification accuracy but also reduces computational complexity. Specifically, on the CIFAR-100 dataset, CDNet improves by 1.67 percentage points compared to the previous best method.

### Conclusion

In summary, CDNet addresses the limitations of traditional FCNs and self-attention mechanisms by introducing an indirect self-attention mechanism while maintaining efficient computational performance, showcasing superior performance in image classification tasks.