Abstract: In recent years, knowledge distillation methods based on contrastive learning have achieved promising results on image classification and object detection tasks. However, this line of research has paid less attention to semantic segmentation. Existing methods rely heavily on data augmentation and memory buffers, which demand substantial computational resources when applied to semantic segmentation, a task that requires preserving high-resolution feature maps to make dense pixel-wise predictions. To address this problem, we present Augmentation-free Dense Contrastive Knowledge Distillation (Af-DCD), a new contrastive distillation learning paradigm for training compact and accurate deep neural networks for semantic segmentation applications. Af-DCD leverages a masked feature mimicking strategy and formulates a novel contrastive learning loss that takes advantage of tactful feature partitions across both channel and spatial dimensions, allowing it to effectively transfer the dense and structured local knowledge learnt by the teacher model to a target student model while maintaining training efficiency. Extensive experiments on five mainstream benchmarks with various teacher-student network pairs demonstrate the effectiveness of our approach. For instance, the DeepLabV3-Res18|DeepLabV3-MBV2 model trained by Af-DCD reaches 77.03%|76.38% mIOU on the Cityscapes dataset when choosing DeepLabV3-Res101 as the teacher, setting new performance records. In addition, Af-DCD achieves an absolute mIOU improvement of 3.26%|3.04%|2.75%|2.30%|1.42% over the individually trained counterpart on Cityscapes|Pascal VOC|CamVid|ADE20K|COCO-Stuff-164K. Code is available at https://github.com/OSVAI/Af-DCD
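
The abstract does not spell out the exact form of the loss, so the following is only a rough illustration of the general idea of a dense contrastive loss built from channel and spatial partitions, without augmentation or a memory buffer. It assumes an InfoNCE-style objective in which each student feature vector is pulled toward the teacher vector at the same position and channel group, and pushed away from the other teacher vectors in the same local patch. The function name `dense_contrastive_loss` and the parameters `num_groups`, `patch`, and `tau` are hypothetical and are not taken from the Af-DCD code. The masked feature mimicking step is not modeled here.

```python
import numpy as np

def dense_contrastive_loss(student, teacher, num_groups=4, patch=2, tau=0.1):
    """Illustrative InfoNCE loss over local feature partitions (assumed form).

    student, teacher: feature maps of identical shape (C, H, W).
    Channels are split into `num_groups` groups and the spatial grid into
    non-overlapping `patch` x `patch` windows. Inside each window, every
    (pixel, channel-group) vector of the student is contrasted against all
    teacher vectors of that window.
    """
    C, H, W = student.shape
    assert teacher.shape == student.shape
    assert C % num_groups == 0 and H % patch == 0 and W % patch == 0
    d = C // num_groups  # length of one channel-group vector

    def partition(x):
        # (C, H, W) -> (H/p, W/p, p*p*G, d), then L2-normalise each vector
        x = x.reshape(num_groups, d, H // patch, patch, W // patch, patch)
        x = x.transpose(2, 4, 3, 5, 0, 1)
        x = x.reshape(H // patch, W // patch, patch * patch * num_groups, d)
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

    s, t = partition(student), partition(teacher)

    # Cosine similarity between every student and teacher vector in a window
    logits = np.einsum("hwid,hwjd->hwij", s, t) / tau
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

    # Positives sit on the diagonal: same pixel and same channel group
    positives = np.diagonal(log_prob, axis1=-2, axis2=-1)
    return float(-positives.mean())

rng = np.random.default_rng(0)
teacher_feat = rng.standard_normal((16, 4, 4))
student_feat = rng.standard_normal((16, 4, 4))
loss_random = dense_contrastive_loss(student_feat, teacher_feat)
loss_matched = dense_contrastive_loss(teacher_feat, teacher_feat)
```

In this sketch, all negatives come from the same image and the same forward pass, which is why neither augmented views nor a buffer of past features is needed. A student that matches the teacher exactly should yield a lower loss than an unrelated one.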

Disassembling Convolutional Segmentation Network

CNN LEGO: Disassembling and Assembling Convolutional Neural Network

Hybrid Dilated Convolution Network Using Attentive Kernels for Real-Time Semantic Segmentation

High-Resolution Remote Sensing Image Semantic Segmentation Method Based on Improved Encoder-Decoder Convolutional Neural Network

Attention Guided Global Enhancement and Local Refinement Network for Semantic Segmentation

Discriminative Features Reconstruction Network For Semantic Segmentation

Image Segmentation Using Encoder-Decoder with Deformable Convolutions

SCASeg: Strip Cross-Attention for Efficient Semantic Segmentation

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Adaptive deformable convolutional network

Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search

A Deep Semantic Segmentation Network with Semantic and Contextual Refinements

LACTNet: A Lightweight Real-Time Semantic Segmentation Network Based on an Aggregated Convolutional Neural Network and Transformer

FCKDNet: A Feature Condensation Knowledge Distillation Network for Semantic Segmentation

HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation

OverSegNet: A convolutional encoder–decoder network for image over-segmentation

Augmentation-Free Dense Contrastive Knowledge Distillation for Efficient Semantic Segmentation

MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient Semantic Segmentation

Knowledge Adaptation for Efficient Semantic Segmentation

Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers