Multi-query and Multi-Level Enhanced Network for Semantic Segmentation

Bin Xie, Jiale Cao, Rao Muhammad Anwer, Jin Xie, Jing Nie, Aiping Yang, Yanwei Pang
DOI: https://doi.org/10.1016/j.patcog.2024.110777
IF: 8
2024-01-01
Pattern Recognition
Abstract: Plain transformer-based methods have recently achieved promising performance on semantic segmentation. These methods adopt a single set of class queries to predict masks of different semantic categories based on multi-level feature maps. We argue that this single-query design cannot fully exploit the diverse information at different levels. To address this issue, we propose a multi-query and multi-level enhanced network for semantic segmentation, named QLSeg. QLSeg first performs multi-level feature enhancement on a plain transformer to improve feature discriminability. We then introduce a multi-query decoder that extracts feature embeddings and predicts mask logits separately at each level; the feature embeddings are adaptively merged for classification, and the mask logits are summed to produce the output masks. In addition, we introduce masked attention-to-mask to focus on local regions of the same class. We perform experiments on three widely used semantic segmentation datasets: ADE20K, COCO-Stuff-10K, and PASCAL-Context. QLSeg achieves competitive results on all three datasets.
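The fusion step described in the abstract (per-level mask logits summed, per-level embeddings adaptively merged for classification) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how the adaptive merge is computed, so the softmax gate over levels, the dot-product mask prediction, and the names `fuse_levels`, `gate_w`, and `cls_w` are all assumptions made for illustration, as is the choice to give every level the same spatial resolution.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_levels(embeddings, features, gate_w, cls_w):
    """Fuse per-level query outputs (illustrative only).

    embeddings: (L, K, D) class-query embeddings, one set per level
    features:   (L, N, D) pixel features per level, N = H * W
    gate_w:     (D,)      hypothetical gating vector scoring each level
    cls_w:      (D, C)    hypothetical linear classifier weights
    """
    # Per-level mask logits as query/pixel dot products: (L, K, N).
    mask_logits = np.einsum("lkd,lnd->lkn", embeddings, features)
    # Mask logits are summed over levels, as the abstract states.
    mask_sum = mask_logits.sum(axis=0)  # (K, N)

    # Assumed form of "adaptive merging": softmax weights over levels.
    weights = softmax(embeddings @ gate_w, axis=0)  # (L, K)
    merged = (weights[..., None] * embeddings).sum(axis=0)  # (K, D)

    class_logits = merged @ cls_w  # (K, C)
    return class_logits, mask_sum, weights


# Toy sizes: 3 levels, 4 queries, 8 channels, 16 pixels, 5 classes.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 4, 8))
features = rng.normal(size=(3, 16, 8))
gate_w = rng.normal(size=8)
cls_w = rng.normal(size=(8, 5))

class_logits, mask_sum, weights = fuse_levels(embeddings, features, gate_w, cls_w)
print(class_logits.shape, mask_sum.shape, weights.shape)
```

In this sketch each query's merge weights sum to one across levels, so the merged embedding is a convex combination of the per-level embeddings, while the mask logits are combined by plain summation.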