DSPformer: Discovering Semantic Parts with Token Growth and Clustering for Zero-Shot Learning

Peng Zhao, Qiangchang Wang, Yilong Yin
DOI: https://doi.org/10.1007/s13735-024-00336-6
2024-01-01
International Journal of Multimedia Information Retrieval
Abstract: Transformers have achieved success in many computer vision tasks, but their potential in Zero-Shot Learning (ZSL) has yet to be fully explored. In this paper, we develop a Transformer architecture, termed DSPformer, that discovers semantic parts through token growth and clustering. This is achieved with two proposed methods: Adaptive Token Growth (ATG) and Semantic Part Clustering (SPC). First, we observe that the background can distract the model, causing it to rely on irrelevant regions when making decisions. To alleviate this issue, ATG locates discriminative foreground regions and removes meaningless or even noisy background. Second, semantically similar parts may be distributed across different tokens. To address this problem, SPC groups semantically consistent parts by clustering tokens. Extensive experiments on several challenging datasets demonstrate the effectiveness of the proposed DSPformer.
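The abstract does not specify how ATG and SPC are computed, so the following is only a generic sketch under stated assumptions, not the authors' implementation. It assumes that foreground tokens can be approximated by ranking patch tokens by the attention they receive from a class token and keeping the top fraction (a static top-k selection that does not model the "growth" schedule implied by ATG), and it substitutes plain k-means for the paper's token clustering, averaging each cluster into one part token. The function names `select_foreground_tokens` and `cluster_tokens` and the parameters `keep_ratio` and `num_parts` are hypothetical.

```python
import numpy as np


def select_foreground_tokens(tokens, cls_attention, keep_ratio=0.5):
    """Hypothetical stand-in for ATG: keep the patch tokens that the class
    token attends to most and drop the rest as likely background."""
    n = tokens.shape[0]
    k = max(1, int(round(n * keep_ratio)))
    # Indices of the k highest attention scores, re-sorted to keep spatial order.
    idx = np.sort(np.argsort(cls_attention)[::-1][:k])
    return tokens[idx], idx


def cluster_tokens(tokens, num_parts, iters=10, seed=0):
    """Hypothetical stand-in for SPC: group tokens with plain k-means and
    return one averaged 'part' token per cluster plus the assignments."""
    rng = np.random.default_rng(seed)
    # Initialise centres from distinct tokens (fancy indexing returns a copy).
    centers = tokens[rng.choice(len(tokens), num_parts, replace=False)].astype(float)
    assign = np.zeros(len(tokens), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance between every token and every centre.
        dists = ((tokens[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for c in range(num_parts):
            members = tokens[assign == c]
            if len(members) > 0:  # leave an empty cluster's centre unchanged
                centers[c] = members.mean(axis=0)
    return centers, assign


# Toy usage: 196 patch tokens (a 14x14 grid) with 64-dim features.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 64))
attn = rng.random(196)
fg, idx = select_foreground_tokens(tokens, attn, keep_ratio=0.5)
parts, assign = cluster_tokens(fg, num_parts=8)
print(fg.shape, parts.shape)  # (98, 64) (8, 64)
```

Separating selection from clustering mirrors the two-step story in the abstract: background tokens are discarded first, so the clustering step only groups the remaining foreground tokens into a small set of part-level representations.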
What problem does this paper attempt to address?