DropQueries: A Simple Way to Discover Comprehensive Segment Representations

Haojie Ding, Bin Wang, Guoliang Kang, Weijia Li, Conghui He, Yao Zhao, Yunchao Wei
DOI: https://doi.org/10.1109/tmm.2023.3311909
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Inspired by recent progress in object detection (i.e., DETR), the set prediction mechanism has significantly advanced research on semantic segmentation and achieves state-of-the-art performance on popular segmentation benchmarks. The generic pipeline of such a mechanism typically first uses learnable query features to predict classes and segment masks separately, and then blends these class-aware segment masks into the final segmentation mask. One key factor behind the successful training of this pipeline is the bipartite matching strategy applied between the set of predictions and the ground-truth segments. However, we find that bipartite-matching-based assignment often tends to segment one target class with only a few learnable queries, rendering many other pre-defined queries useless. In this paper, we propose a simple method, named DropQueries (DQ), to facilitate set-prediction-based segmentation architectures. At each training iteration, DQ randomly and independently drops each learnable query with a certain probability before bipartite matching. In this way, more queries are encouraged to participate in the segmentation process and to discover comprehensive segment representations. We conduct extensive experiments using MaskFormer and Mask2Former as two basic yet powerful segmentation architectures. Without bells and whistles, our DQ strategy brings consistent improvements over strong baselines on popular semantic segmentation benchmarks, including ADE20K, Cityscapes, COCO-Stuff-10K, and VSPW.
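The drop-then-match step described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes a precomputed matching cost matrix `cost[q][g]` between queries and ground-truth segments, and it treats dropped queries as simply excluded from the assignment (the abstract does not say how they are handled elsewhere in training). The function names `drop_queries`, `min_cost_matching`, and `dq_assign` are hypothetical. Exhaustive search stands in for the Hungarian algorithm that DETR-style pipelines typically use; it is only practical for tiny inputs.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def drop_queries(num_queries: int, drop_prob: float, rng: random.Random) -> List[int]:
    """Return the indices of queries kept this iteration.

    Each query is dropped independently with probability `drop_prob`
    (one Bernoulli draw per query).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    return [q for q in range(num_queries) if rng.random() >= drop_prob]


def min_cost_matching(
    cost: Sequence[Sequence[float]], rows: Sequence[int]
) -> List[Tuple[int, int]]:
    """Minimum-cost one-to-one matching between the kept query `rows` and the
    ground-truth columns, found by exhaustive search (exponential time, toy
    sizes only). Returns (query, gt) pairs sorted by query index."""
    num_gt = len(cost[0]) if cost else 0
    k = min(len(rows), num_gt)  # number of pairs that can be formed
    best, best_pairs = float("inf"), []
    # Choose which k ground-truth segments get matched (all of them if k == num_gt).
    for gts in itertools.combinations(range(num_gt), k):
        # Try every ordered choice of k kept queries for those segments.
        for qs in itertools.permutations(rows, k):
            total = sum(cost[q][g] for q, g in zip(qs, gts))
            if total < best:
                best, best_pairs = total, sorted(zip(qs, gts))
    return best_pairs


def dq_assign(
    cost: Sequence[Sequence[float]], drop_prob: float, rng: random.Random
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """One hypothetical DQ training step: drop queries, then match the survivors."""
    kept = drop_queries(len(cost), drop_prob, rng)
    return kept, min_cost_matching(cost, kept)


if __name__ == "__main__":
    # Toy costs: 4 queries x 2 ground-truth segments (lower = better match).
    cost = [[0.1, 0.9], [0.3, 0.2], [0.4, 0.8], [0.5, 0.7]]
    for p in (0.0, 0.5):
        rng = random.Random(0)
        used = set()
        for _ in range(100):
            _, pairs = dq_assign(cost, p, rng)
            used.update(q for q, _ in pairs)
        print(f"drop_prob={p}: queries ever matched = {sorted(used)}")
```

With `drop_prob=0.0` the sketch reduces to plain bipartite matching, and the same two low-cost queries (0 and 1) win every iteration. With `drop_prob=0.5`, queries 2 and 3 also get matched in iterations where the lower-cost queries are dropped, which is the effect DQ relies on to involve more queries in training.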
What problem does this paper attempt to address?