Abstract: In recent years, computer vision has witnessed remarkable advances in image classification, particularly with fully convolutional neural networks (FCNs) and self-attention mechanisms. Nevertheless, both approaches have limitations. FCNs tend to prioritize local information and can overlook crucial global context, whereas self-attention mechanisms, despite their adaptability, are computationally intensive. To overcome these challenges, this paper proposes cross-and-diagonal networks (CDNet), an innovative network architecture that captures global information in images while preserving local details in a more computationally efficient manner. CDNet establishes long-range relationships between pixels within an image, which allows contextual information to be acquired indirectly. This indirect self-attention mechanism significantly enhances the network's capacity. At the core of CDNet is a new attention mechanism named "cross and diagonal attention", which takes an indirect approach by integrating two distinct components: cross attention and diagonal attention. By computing attention in different directions, specifically vertical and diagonal, CDNet effectively establishes long-range dependencies among pixels, improving performance in image classification tasks. Experimental results highlight several advantages of CDNet. First, its indirect self-attention mechanism can be easily integrated as a module into any convolutional neural network (CNN). Second, the computational cost of the self-attention mechanism is effectively reduced, improving overall computational efficiency. Finally, CDNet attains state-of-the-art performance among comparable image classification networks on three benchmark datasets.
In essence, CDNet addresses the constraints of conventional approaches and provides an efficient and effective solution for capturing global context in image classification tasks.
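The abstract describes cross and diagonal attention only at a high level, so the following is a hypothetical, plain-Python sketch rather than the authors' implementation. It assumes that the cross component attends over a pixel's row and column and that the diagonal component attends over its main diagonal and anti-diagonal. It also uses the raw feature value as query, key and value; a practical module would presumably use learned projections and many channels. The result is added back through a residual weight `gamma`. The names `cross_diagonal_positions`, `cross_diagonal_attention`, `reachable` and `gamma` are illustrative only. The `reachable` helper shows why the mechanism is called indirect: one pass connects a pixel only to its own cross and diagonals, but after two stacked passes information from every pixel can reach it.

```python
import math
from typing import List, Set, Tuple

Grid = List[List[float]]
Pos = Tuple[int, int]


def cross_diagonal_positions(i: int, j: int, h: int, w: int) -> List[Pos]:
    """Assumed attention window of pixel (i, j): its row, column and both diagonals."""
    positions: Set[Pos] = {(i, c) for c in range(w)}  # horizontal arm of the cross
    for r in range(h):
        positions.add((r, j))  # vertical arm of the cross
        for c in (r - i + j, i + j - r):  # main diagonal, then anti-diagonal
            if 0 <= c < w:
                positions.add((r, c))
    return sorted(positions)


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_diagonal_attention(x: Grid, gamma: float = 0.1) -> Grid:
    """One attention pass over a single-channel map (hypothetical sketch).

    Simplification: the raw feature value serves as query, key and value
    (no learned projections). The attended context is added back through a
    residual weight `gamma`, so the output keeps the input's shape.
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            pos = cross_diagonal_positions(i, j, h, w)
            weights = softmax([x[i][j] * x[r][c] for r, c in pos])  # dot-product scores
            context = sum(a * x[r][c] for a, (r, c) in zip(weights, pos))
            out[i][j] = x[i][j] + gamma * context
    return out


def reachable(i: int, j: int, h: int, w: int, passes: int) -> Set[Pos]:
    """Pixels whose information can reach (i, j) after `passes` stacked passes."""
    seen: Set[Pos] = {(i, j)}
    for _ in range(passes):
        seen = {p for r, c in seen for p in cross_diagonal_positions(r, c, h, w)}
    return seen


if __name__ == "__main__":
    h = w = 5
    print(len(reachable(0, 0, h, w, 1)), "of", h * w)  # 13 of 25: direct window only
    print(len(reachable(0, 0, h, w, 2)), "of", h * w)  # 25 of 25: indirect, global
    feature_map = [[(r * w + c) / (h * w) for c in range(w)] for r in range(h)]
    refined = cross_diagonal_attention(feature_map)
    print(len(refined), len(refined[0]))  # 5 5: shape preserved
```

Because the output has the same shape as the input, a block like this can in principle be inserted between existing convolutional layers. This is the property the abstract refers to when it calls the mechanism a plug-in module.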

What problem does this paper attempt to address?

### Main Problems Addressed by the Paper

This paper primarily addresses two key issues in image classification tasks:

1. **Limitations of Fully Convolutional Networks (FCNs)**: FCNs tend to prioritize local information, potentially overlooking important global context.
2. **Computational Complexity of Self-Attention Mechanisms**: Although self-attention mechanisms are highly adaptable, their computational cost is high.

To overcome these challenges, the paper proposes a new architecture called "Cross-and-Diagonal Networks" (CDNet). CDNet efficiently captures global information in images through an indirect self-attention mechanism while preserving local details. This mechanism establishes long-distance relationships between pixels in an image, thereby indirectly acquiring contextual information and significantly enhancing the network's capabilities.

### Main Contributions

1. **Innovative Attention Mechanism**: CDNet introduces a new "cross-and-diagonal attention" mechanism. It computes attention by integrating two components that operate in the vertical and diagonal directions, effectively establishing long-range dependencies between pixels and improving performance in image classification tasks.
2. **Improved Computational Efficiency**: Compared with traditional non-local blocks, CDNet significantly reduces the computational complexity of self-attention, from \(O((H \times W)^2)\) to \(O((H + W - 1)^2)\), where \(H\) and \(W\) are the height and width of the image, respectively. This makes CDNet more GPU-friendly and improves overall computational efficiency.
3. **Modular Design**: CDNet can be integrated as a plug-and-play module into any convolutional neural network without major modifications to the existing network structure.
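The following plain-Python sketch gives a sense of scale for the efficiency claim. It counts how many key positions a single query pixel scores under a non-local block and under a cross-shaped pattern. A non-local block scores all \(H \times W\) positions. A cross-shaped pattern scores the pixel's row and column, which is \(H + W - 1\) positions because the pixel itself is shared. The sketch assumes that the cross component follows this row-plus-column pattern. It ignores the diagonal component, whose length depends on where the pixel sits. The function names and feature-map sizes are hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple


def cross_positions(i: int, j: int, h: int, w: int) -> Set[Tuple[int, int]]:
    """Row and column of pixel (i, j): the cross-shaped set one query scores."""
    return {(i, c) for c in range(w)} | {(r, j) for r in range(h)}


def per_query_cost(h: int, w: int) -> Tuple[int, int]:
    """Key positions scored per query pixel: (non-local block, cross pattern)."""
    non_local = h * w  # a non-local block compares each query with every position
    sizes = {len(cross_positions(i, j, h, w)) for i in range(h) for j in range(w)}
    # Sanity check: every pixel's cross has the same size (pixel counted once).
    assert sizes == {h + w - 1}
    return non_local, h + w - 1


if __name__ == "__main__":
    for h, w in [(7, 7), (14, 14), (28, 28), (56, 56)]:
        dense, cross = per_query_cost(h, w)
        print(f"{h}x{w}: non-local {dense}, cross {cross}, {dense / cross:.1f}x fewer")
```

On a 56 × 56 feature map, for example, each query scores 111 positions instead of 3,136, roughly 28 times fewer. The gap widens as the feature map grows.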
### Experimental Results

The experimental section compares CDNet with other advanced methods on three benchmark datasets (CIFAR-10, CIFAR-100, and ImageNet). The results show that CDNet achieves state-of-the-art classification accuracy while also reducing computational complexity. Specifically, on CIFAR-100, CDNet improves on the previous best method by 1.67 percentage points.

### Conclusion

In summary, CDNet addresses the limitations of traditional FCNs and self-attention mechanisms by introducing an indirect self-attention mechanism that keeps computation efficient, and it delivers superior performance in image classification tasks.

Cross-and-Diagonal Networks: An Indirect Self-Attention Mechanism for Image Classification
