Abstract: Spatial attention mechanisms have been widely incorporated into deep neural networks (DNNs), significantly improving performance on computer vision tasks through long-range dependency modeling. However, they may perform poorly in medical image analysis, and existing efforts are often unaware that long-range dependency modeling has limitations in highlighting subtle lesion regions. To overcome this limitation, we propose a practical yet lightweight architectural unit, the Pyramid Pixel Context Adaption (PPCA) module, which exploits multi-scale pixel context information to dynamically recalibrate pixel positions in a pixel-independent manner. PPCA first applies cross-channel pyramid pooling to aggregate multi-scale pixel context information, then eliminates the inconsistency among the scales through pixel normalization, and finally estimates a per-pixel attention weight via pixel context integration. By embedding PPCA into a DNN with negligible overhead, we develop PPCANet for medical image classification. In addition, we introduce supervised contrastive learning to enhance feature representation by exploiting label information through a supervised contrastive loss. Extensive experiments on six medical image datasets show that PPCANet outperforms state-of-the-art attention-based networks and recent deep neural networks. We also provide visual analysis and an ablation study to explain the behavior of PPCANet in the decision-making process.

What problem does this paper attempt to address?

The paper addresses the poor performance of existing spatial attention mechanisms at emphasizing subtle lesion areas in medical image analysis. Specifically, it identifies two limitations:

1. **Long-range dependency modeling**: Self-attention usually learns the correlations between pixel positions by capturing long-range dependencies among all of them, which inevitably introduces redundant information from other pixel positions. This redundancy has relatively little impact on natural-image tasks, because the target areas in natural images are prominent and are easily captured through long-range dependencies. In medical images, however, lesion areas are subtle and the differences in pixel context information are not obvious, so modeling long-range dependencies struggles to emphasize them.
2. **Pixel context aggregation**: Existing spatial attention methods mainly use pointwise convolution (Conv 1×1) or a single cross-channel pooling (CP) operation to aggregate single-scale pixel context information along the channel axis, ignoring multi-scale pixel context information. According to the paper's literature survey, no spatial attention method uses multi-scale pixel context information to improve the representation ability of deep neural networks (DNNs).

Based on this analysis, the paper poses two core questions:

1. Can a method highlight important pixel positions and suppress unimportant ones without capturing long-range dependencies between pixel positions?
2. Can multi-scale pixel context information be incorporated into spatial attention design to improve performance and interpretability?

To answer these questions, the paper proposes a novel and lightweight architectural unit, the Pyramid Pixel Context Adaption (PPCA) module. The PPCA module explicitly incorporates multi-scale pixel context information into the CNN representation through pixel-independent recalibration. It consists of three components: Cross-Channel Pyramid Pooling, Pixel Normalization, and Pixel Context Integration. In addition, to further improve the performance of PPCANet on medical image classification, the paper introduces Supervised Contrastive Learning, which uses label information to form contrastive pairs and thereby strengthen the feature representation.

Overall, the paper aims to make spatial attention effective at emphasizing subtle lesion areas in medical image analysis by designing the PPCA module and introducing Supervised Contrastive Learning.
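The three PPCA stages named above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pyramid levels `(1, 2, 4)`, the use of group-wise channel averaging as the pooling operator, the zero-mean/unit-variance form of the normalization, and the linear-plus-sigmoid integration with parameters `weight` and `bias` are all hypothetical choices made here for illustration. What the sketch does preserve is the pixel-independent property: each pixel's attention weight is computed only from that pixel's own channel values.

```python
import numpy as np


def cross_channel_pyramid_pooling(x, levels=(1, 2, 4)):
    """Aggregate multi-scale pixel context along the channel axis.

    x has shape (C, H, W). For each level g (assumed values), the C channels
    are split into g contiguous groups and each group is averaged, giving
    g context maps. Returns an array of shape (sum(levels), H, W).
    """
    c, h, w = x.shape
    pooled = []
    for g in levels:
        if c % g != 0:
            raise ValueError(f"channel count {c} is not divisible by level {g}")
        pooled.append(x.reshape(g, c // g, h, w).mean(axis=1))
    return np.concatenate(pooled, axis=0)


def pixel_normalization(ctx, eps=1e-5):
    """Normalize each pixel's context vector to zero mean and unit variance.

    Statistics are taken over the context axis only, so no information is
    shared between pixel positions (an assumed form of the normalization).
    """
    mean = ctx.mean(axis=0, keepdims=True)
    var = ctx.var(axis=0, keepdims=True)
    return (ctx - mean) / np.sqrt(var + eps)


def pixel_context_integration(ctx, weight, bias=0.0):
    """Combine the K context values of each pixel into one attention weight.

    weight has shape (K,); a weighted sum followed by a sigmoid is an
    assumed, hypothetical parameterization. Returns shape (H, W).
    """
    logits = np.tensordot(weight, ctx, axes=([0], [0])) + bias
    return 1.0 / (1.0 + np.exp(-logits))


def ppca(x, weight, bias=0.0, levels=(1, 2, 4)):
    """Recalibrate a (C, H, W) feature map with per-pixel attention."""
    ctx = cross_channel_pyramid_pooling(x, levels)
    ctx = pixel_normalization(ctx)
    attn = pixel_context_integration(ctx, weight, bias)
    return x * attn[None, :, :], attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(8, 5, 5))   # toy feature map: C=8, H=W=5
    w = rng.normal(size=7)                  # K = 1 + 2 + 4 context values
    out, attn = ppca(features, w)
    print(out.shape, attn.shape)
```

Because no step mixes values across spatial positions, changing the features at one pixel leaves the attention weights of all other pixels untouched, which is the contrast with self-attention that the summary draws.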

Pyramid Pixel Context Adaption Network for Medical Image Classification with Supervised Contrastive Learning

Channel prior convolutional attention for medical image segmentation

PCCA-Model: an attention module for medical image segmentation

Multiscale Feature Attention Module Based Pyramid Network for Medical Digital Radiography Image Enhancement

Fuzzy-based Cross-Image Pixel Contrastive Learning for Compact Medical Image Segmentation

PCANet: Pyramid Context-aware Network for Retinal Vessel Segmentation.

DCACNet: Dual context aggregation and attention-guided cross deconvolution network for medical image segmentation

PAMSNet: A medical image segmentation network based on spatial pyramid and attention mechanism

PiCANet: Learning Pixel-wise Contextual Attention in ConvNets and Its Application in Saliency Detection.

Medical Image Classification Using Light-weight CNN with Spiking Cortical Model Based Attention Module

PiCANet: Pixel-wise Contextual Attention Learning for Accurate Saliency Detection

Channel-Specific and Spatial Residual Attention Network for Medical Image Denoising

MCANet: Medical Image Segmentation with Multi-Scale Cross-Axis Attention

PCSA: Enhancing CNN Performance With Pyramid Channel and Spatial Attention

MCPA: Multi-scale Cross Perceptron Attention Network for 2D Medical Image Segmentation

Boundary-aware context neural network for medical image segmentation

PAC-Net: Multi-pathway FPN with position attention guided connections and vertex distance IoU for 3D medical image detection

Pyramid-attentive GAN for Multimodal Brain Image Complementation in Alzheimer’s Disease Classification

J-CaPA : Joint Channel and Pyramid Attention Improves Medical Image Segmentation

CAENet: Contrast adaptively enhanced network for medical image segmentation based on a differentiable pooling function

Deeply Supervised Layer Selective Attention Network: Towards Label-Efficient Learning for Medical Image Classification