Multi-scale Visual Attention for Attribute Disambiguation in Zero-Shot Learning

Long Tian, Bo Chen, Jie Ren, Hao Zhang, Zhenhua Wu, Ning Han, Yuanwei Chen, Hongwei Liu
DOI: https://doi.org/10.1016/j.image.2021.116614
IF: 3.453
2022-01-01
Signal Processing: Image Communication
Abstract: Observing that discriminative visual features and unambiguous attribute descriptions are both important in zero-shot learning (ZSL), we propose Multi-scale Visual Attention for Attribute Disambiguation (MVAAD). MVAAD contains a Multi-Scale Visual Attention Network (MSVAN) that attends to image regions, which helps MVAAD learn more discriminative visual features. Building on the multi-scale visual features in MSVAN, we also develop a Coarse-to-fine Visual-guided Attribute Selection Module (CVASM) that uses the multi-scale attentive visual features for attribute disambiguation. MSVAN and CVASM can be trained jointly, end to end, by minimizing the visual-semantic classification loss and the latent visual contrastive triplet loss. Experimental results on four popular ZSL benchmarks (AwA2, CUB, SUN and FLO) show that MVAAD not only achieves state-of-the-art performance but also yields meaningful and explainable visualizations of the visual attention and the attribute selection.
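The abstract names a joint objective (a visual-semantic classification loss plus a triplet loss) without giving its form. The following is only a minimal sketch of how such a combined objective is commonly written, not the authors' formulation. It assumes the visual feature has already been projected into the attribute space, that compatibility is a plain dot product, and that the triplet term uses squared Euclidean distance; the function names, the `weight` and `margin` parameters, and their defaults are all hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def classification_loss(visual, class_attributes, label):
    """Cross-entropy over a softmax of visual-attribute compatibility scores.

    Assumption: compatibility is a dot product between the (already
    projected) visual feature and each class attribute vector.
    """
    scores = [dot(visual, a) for a in class_attributes]
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[label]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge triplet loss: pull the positive closer than the negative by a margin."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def joint_loss(visual, class_attributes, label, positive, negative,
               weight=1.0, margin=1.0):
    """Weighted sum of the two terms (the weighting scheme is an assumption)."""
    return (classification_loss(visual, class_attributes, label)
            + weight * triplet_loss(visual, positive, negative, margin))
```

In this sketch, `positive` and `negative` would be features of a same-class and a different-class image; how the paper actually mines triplets or balances the two terms is not stated in the abstract.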