Adaptive multi-scale semantic fusion network for zero-shot learning

Jing Song, Peixi Peng, Yunpeng Zhai, Chong Zhang, Yonghong Tian
DOI: https://doi.org/10.1109/ICMEW53276.2021.9455945
2021-01-01
Abstract: Zero-shot learning aims at accurately recognizing unseen objects by learning matrices that bridge the gap between visual information and semantic attributes. Existing approaches predominantly focus on learning a proper mapping function for visual-semantic embedding while neglecting the effect of learning discriminative semantic features, which leads to severe semantic ambiguity. We propose a practical Adaptive Multi-scale Semantic Fusion (AMSF) framework that performs object-based multi-scale attribute attention for semantic disambiguation. Considering both the low-level visual information and the global class-level features that relate to this ambiguity, the proposed method jointly learns cooperative global and local semantic attributes at different scales. Moreover, under the joint supervision of an embedding softmax loss and a class-center triplet loss, the model is encouraged to learn highly discriminative semantic and visual features with high inter-class dispersion and intra-class compactness. The method is evaluated on the CUB, AwA2, and SUN datasets, and the experimental results indicate that it achieves state-of-the-art performance.
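The joint supervision mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so everything below is an assumption: the softmax term is taken to be cross-entropy over dot-product compatibility scores between a projected visual feature and per-class attribute vectors, the triplet term is taken to be a margin hinge on distances to class centers, and the names `softmax_embedding_loss`, `class_center_triplet_loss`, `joint_loss`, `weight`, and `margin` are hypothetical.

```python
import math

def softmax_embedding_loss(projected, class_attributes, label):
    """Cross-entropy over dot-product compatibility scores (assumed form)."""
    scores = [sum(p * a for p, a in zip(projected, attrs))
              for attrs in class_attributes]
    peak = max(scores)  # subtract the max score for numerical stability
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_norm - scores[label]

def class_center_triplet_loss(feature, centers, label, margin=1.0):
    """Hinge on own-center distance minus nearest other-center distance (assumed form)."""
    def dist(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
    positive = dist(feature, centers[label])
    negative = min(dist(feature, c)
                   for i, c in enumerate(centers) if i != label)
    return max(0.0, positive - negative + margin)

def joint_loss(projected, class_attributes, centers, label,
               weight=0.5, margin=1.0):
    """Weighted sum of the two terms; the weighting scheme is an assumption."""
    return (softmax_embedding_loss(projected, class_attributes, label)
            + weight * class_center_triplet_loss(projected, centers, label, margin))
```

In this sketch the softmax term pushes a feature toward the attribute vector of its own class (inter-class dispersion), while the triplet term penalizes a feature that lies closer to another class center than to its own by less than `margin` (intra-class compactness).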