DECNet: Dense embedding contrast for unsupervised semantic segmentation

Xiaoqin Zhang, Baiyu Chen, Xiaolong Zhou, Sixian Chan
DOI: https://doi.org/10.1016/j.neunet.2024.106557
Abstract: Unsupervised semantic segmentation assigns each pixel of an image to one of a set of known categories without using annotations. Recent studies have demonstrated promising results by employing a vision transformer backbone pre-trained on an image-level dataset in a self-supervised manner. However, these methods depend on complex architectures or meticulously designed inputs. This motivates us to explore whether the task can be tackled with a more straightforward approach. To avoid over-complication, we introduce a simple Dense Embedding Contrast network (DECNet) for unsupervised semantic segmentation. Specifically, we propose a Nearest Neighbor Similarity (NNS) strategy to establish well-defined positive and negative pairs for dense contrastive learning. In addition, we optimize a contrastive objective named Ortho-InfoNCE, which alleviates the false negative problem inherent in contrastive learning and thereby further enhances the dense representations. Finally, extensive experiments on the COCO-Stuff and Cityscapes datasets demonstrate that our approach outperforms state-of-the-art methods.
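To make the idea of nearest-neighbor pairing for dense contrastive learning concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify how NNS selects pairs or how Ortho-InfoNCE is defined, so the code assumes a generic setup: each anchor embedding takes its most similar candidate (by cosine similarity) as the positive, treats all other candidates as negatives, and is scored with the standard InfoNCE loss rather than Ortho-InfoNCE. The function names and the temperature value are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine_matrix(a, b):
    """Pairwise cosine similarity between two lists of embeddings."""
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b] for u in a]

def nearest_neighbor_info_nce(anchors, candidates, temperature=0.1):
    """Mean standard InfoNCE loss with nearest-neighbor positives.

    Assumption for illustration: the positive of each anchor is its most
    similar candidate; every other candidate is used as a negative.
    Returns the mean loss and the index of the chosen positive per anchor.
    """
    sims = cosine_matrix(anchors, candidates)
    losses, positives = [], []
    for row in sims:
        pos = max(range(len(row)), key=row.__getitem__)
        logits = [s / temperature for s in row]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[pos])
        positives.append(pos)
    return sum(losses) / len(losses), positives

# Toy dense embeddings: two anchor pixels, three candidate pixels.
anchors = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]]
loss, positives = nearest_neighbor_info_nce(anchors, candidates)
```

In this toy case each anchor is paired with the candidate pointing in nearly the same direction. A negative that actually shares the anchor's category would still be pushed away by this plain loss, which is the false negative problem the abstract says Ortho-InfoNCE is meant to alleviate.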