Pyramid Pixel Context Adaption Network for Medical Image Classification with Supervised Contrastive Learning

Xiaoqing Zhang, Zunjie Xiao, Xiao Wu, Yanlin Chen, Jilu Zhao, Yan Hu, Jiang Liu
2024-05-02
Abstract: The spatial attention mechanism has been widely incorporated into deep neural networks (DNNs), significantly lifting performance in computer vision tasks via long-range dependency modeling. However, it may perform poorly in medical image analysis. Unfortunately, existing efforts are often unaware that long-range dependency modeling has limitations in highlighting subtle lesion regions. To overcome this limitation, we propose a practical yet lightweight architectural unit, the Pyramid Pixel Context Adaption (PPCA) module, which exploits multi-scale pixel context information to dynamically recalibrate pixel positions in a pixel-independent manner. PPCA first applies a well-designed cross-channel pyramid pooling to aggregate multi-scale pixel context information, then eliminates the inconsistency among them by a well-designed pixel normalization, and finally estimates per-pixel attention weights via a pixel context integration. By embedding PPCA into a DNN with negligible overhead, PPCANet is developed for medical image classification. In addition, we introduce supervised contrastive learning to enhance feature representation by exploiting the potential of label information via a supervised contrastive loss. Extensive experiments on six medical image datasets show that PPCANet outperforms state-of-the-art attention-based networks and recent deep neural networks. We also provide a visual analysis and an ablation study to explain the behavior of PPCANet in the decision-making process.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the poor performance of existing spatial attention mechanisms at emphasizing subtle lesion areas in medical image analysis. Specifically, it identifies two limitations:
1. **Long-range Dependency Modeling**: The self-attention mechanism usually learns the correlations between pixel positions by capturing long-range dependencies among all of them, which inevitably introduces redundant information from other pixel positions. This redundancy has relatively little impact on natural image tasks, because the target areas in natural images are prominent and easily captured through long-range dependencies. In medical images, however, lesion areas are relatively subtle and the differences in pixel context information are not obvious, so modeling long-range dependencies struggles to emphasize lesion areas.
2. **Pixel Context Aggregation**: Existing spatial attention methods mainly use pointwise convolution (Conv 1×1) or a single cross-channel pooling (CP) operation to aggregate single-scale pixel context information along the channel axis, ignoring the importance of multi-scale pixel context information. According to the paper's literature survey, no existing spatial attention method uses multi-scale pixel context information to improve the representation ability of deep neural networks (DNNs).
Based on this analysis, the paper poses two core questions:
1. Can a method be designed to highlight important pixel positions and suppress unimportant ones without capturing long-range dependencies between pixel positions?
2. Can multi-scale pixel context information be incorporated into the spatial attention design to improve performance and interpretability?
To answer these questions, the paper proposes a novel and lightweight architectural unit, the Pyramid Pixel Context Adaption (PPCA) module. The PPCA module explicitly incorporates multi-scale pixel context information into the CNN representation by recalibrating it in a pixel-independent manner. The module consists of three components: Cross-Channel Pyramid Pooling, Pixel Normalization, and Pixel Context Integration. In addition, to further improve the performance of PPCANet on medical image classification, the paper introduces Supervised Contrastive Learning, which uses label information to enhance feature representation by contrasting pairs of samples from the same class against pairs from different classes. Overall, the paper aims to solve the difficulty existing spatial attention mechanisms have in emphasizing subtle lesion areas in medical images, by designing the PPCA module and introducing Supervised Contrastive Learning.
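The two ideas in the paragraph above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions that the summary does not specify: the pyramid levels `(1, 2, 4)`, average pooling over channel groups, per-pixel standardization as the normalization, a sigmoid gate, and the parameter names `weights`, `bias`, and `temperature`. The function `supcon_loss` follows the commonly used formulation of the supervised contrastive loss; the exact variant used in the paper is not stated here.

```python
import numpy as np


def ppca(x, weights, bias=0.0, levels=(1, 2, 4), eps=1e-5):
    """PPCA-style spatial attention for one feature map x of shape (C, H, W).

    Assumed details (not from the source): average pooling, the pyramid
    levels, the standardization, and the sigmoid gate. C must be divisible
    by every level, and weights must have shape (sum(levels),).
    """
    c, h, w = x.shape
    # 1) Cross-channel pyramid pooling: at each level, split the channels
    #    into g groups and average within each group, pixel by pixel.
    context = np.concatenate(
        [x.reshape(g, c // g, h, w).mean(axis=1) for g in levels], axis=0
    )  # shape: (sum(levels), H, W)
    # 2) Pixel normalization: standardize the context values of each pixel
    #    so that the different scales become comparable.
    mean = context.mean(axis=0, keepdims=True)
    var = context.var(axis=0, keepdims=True)
    context = (context - mean) / np.sqrt(var + eps)
    # 3) Integration: a weighted sum of each pixel's own context values,
    #    squashed to (0, 1). No other pixel position is consulted.
    logits = np.tensordot(weights, context, axes=1) + bias  # shape: (H, W)
    attention = 1.0 / (1.0 + np.exp(-logits))
    return x * attention[None, :, :], attention


def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive loss for embeddings z (N, D), integer labels (N,)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    # Pairwise similarities; an anchor is never contrasted with itself.
    logits = np.where(self_mask, -np.inf, z @ z.T / temperature)
    # Numerically stable log-softmax over all other samples.
    row_max = logits.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom
    # Positives: other samples that share the anchor's label.
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0  # anchors without any positive are skipped
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float((-pos_log_prob[has_pos] / n_pos[has_pos]).mean())
```

In this sketch the attention weight at each pixel depends only on that pixel's own channel vector, which is one way to read "pixel-independent" recalibration: no long-range dependencies between positions are computed. The loss is small when same-label embeddings lie close together and different-label embeddings lie far apart.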