A lightweight siamese transformer for few-shot semantic segmentation

Hegui Zhu, Yange Zhou, Cong Jiang, Lianping Yang, Wuming Jiang, Zhimu Wang
DOI: https://doi.org/10.1007/s00521-024-09471-x
2024-02-29
Neural Computing and Applications
Abstract: Few-shot semantic segmentation (FSS) is a challenging task that aims to segment new classes in query images given only a few annotated support samples. One inherent challenge in FSS is intra-class variation, which results from the limited availability of support samples and the diversity of query data. Current methods frequently employ prototype-support techniques to tackle this issue. However, a single support prototype shares limited commonalities with query features, which increases the difficulty of accurate segmentation. In this paper, we propose a lightweight and effective framework named Siamese Transformer (SiaT), with only 0.68M learnable parameters, to enhance the commonalities between prototypes and query features. The SiaT framework consists of two key modules: the Siamese Transformer Module (STM) and the Query Activation Module (QAM). The STM integrates two shared Transformer decoders for information propagation to generate two enhanced prototypes. One Transformer decoder propagates target-related information from the query to the original support prototype, while the other propagates foreground information from the support to the initial query prototype. The QAM uses the two enhanced prototypes from the STM as support information and combines them with query features through channel-wise allocation and concatenation, termed query activation. SiaT achieves competitive performance on the widely used benchmarks PASCAL-$5^i$ and COCO-$20^i$, which demonstrates its effectiveness in addressing the intra-class variation challenge in FSS tasks.
computer science, artificial intelligence
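
The data flow described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes that prototypes come from masked average pooling, that each "Transformer decoder" is reduced to a single parameter-free cross-attention step with a residual connection, that the initial query prototype is pooled with a coarse query mask, and that query activation means broadcasting both enhanced prototypes to every query position and concatenating along the channel axis. All function names and toy values are hypothetical.

```python
import math


def masked_average_pool(features, mask):
    """Average the feature vectors at positions where mask == 1 (assumed prototype step)."""
    total = sum(mask)
    if total == 0:
        raise ValueError("mask selects no foreground positions")
    channels = len(features[0])
    return [sum(f[c] * m for f, m in zip(features, mask)) / total
            for c in range(channels)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def cross_attend(prototype, features):
    """Stand-in for one decoder: the prototype attends to features, plus a residual."""
    scale = math.sqrt(len(prototype))
    scores = [sum(p * x for p, x in zip(prototype, f)) / scale for f in features]
    weights = softmax(scores)
    attended = [sum(w * f[c] for w, f in zip(weights, features))
                for c in range(len(prototype))]
    return [p + a for p, a in zip(prototype, attended)]


def activate_query(query_features, prototypes):
    """Assumed query activation: append every prototype to each query vector."""
    flat = [value for proto in prototypes for value in proto]
    return [list(f) + flat for f in query_features]


# Toy inputs: three support positions and two query positions, two channels each.
support_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
support_mask = [1, 0, 1]
query_features = [[0.5, 0.5], [1.0, 0.0]]
query_mask = [1, 1]  # hypothetical coarse mask for the initial query prototype

support_proto = masked_average_pool(support_features, support_mask)
query_proto = masked_average_pool(query_features, query_mask)

# The same function serves both directions, mimicking shared decoder weights.
support_foreground = [f for f, m in zip(support_features, support_mask) if m]
enhanced_support = cross_attend(support_proto, query_features)    # query -> support prototype
enhanced_query = cross_attend(query_proto, support_foreground)    # support -> query prototype

activated = activate_query(query_features, [enhanced_support, enhanced_query])
```

In this sketch each activated query vector has its own two channels followed by the four channels of the two enhanced prototypes. A real model would use learned projections in the attention step and pass the activated features to a segmentation head.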