Abstract: Zero-shot learning (ZSL) tackles the novel class recognition problem by transferring semantic knowledge from seen classes to unseen ones. Semantic knowledge is typically represented by attribute descriptions shared between different classes, which act as strong priors for localizing object attributes that represent discriminative region features, enabling significant and sufficient visual-semantic interaction for advancing ZSL. Existing attention-based models learn only inferior region features in a single image because they rely solely on unidirectional attention, which ignores the transferable and discriminative attribute localization of visual features needed to represent the key semantic knowledge for effective knowledge transfer in ZSL. In this paper, we propose a cross attribute-guided Transformer network, termed TransZero++, to refine visual features and learn accurate attribute localization for key semantic knowledge representations in ZSL. Specifically, TransZero++ employs an attribute → visual Transformer sub-net (AVT) and a visual → attribute Transformer sub-net (VAT) to learn attribute-based visual features and visual-based attribute features, respectively. By further introducing feature-level and prediction-level semantical collaborative losses, the two attribute-guided Transformers teach each other to learn semantic-augmented visual embeddings for key semantic knowledge representations via semantical collaborative learning. Finally, the semantic-augmented visual embeddings learned by AVT and VAT are fused and, in cooperation with class semantic vectors, used to perform the desired visual-semantic interaction for ZSL classification. Extensive experiments show that TransZero++ achieves new state-of-the-art results on three golden ZSL benchmarks and on the large-scale ImageNet dataset. The project website is available at: https://shiming-chen.github.io/TransZero-pp/TransZero-pp.html.

Feature Fine-Tuning and Attribute Representation Transformation for Zero-Shot Learning

Generating Manifold-Aligned Semantic Feature for Zero-Shot Learning

Boosting Zero-shot Learning via Contrastive Optimization of Attribute Representations

Discriminative and Robust Attribute Alignment for Zero-Shot Learning

Exploiting Semantic Attributes for Transductive Zero-Shot Learning

TransZero: Attribute-guided Transformer for Zero-Shot Learning

TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning

Learning complementary semantic information for zero-shot recognition

Attribute self-representation steered by exclusive lasso for zero-shot learning

Attentive Semantic Preservation Network for Zero-Shot Learning

Multi-modal Generative Adversarial Network for Zero-Shot Learning

ZS-VAT: Learning Unbiased Attribute Knowledge for Zero-Shot Recognition Through Visual Attribute Transformer

Visual-guided attentive attributes embedding for zero-shot learning

High-Discriminative Attribute Feature Learning for Generalized Zero-Shot Learning

Zero-Shot Learning via Structure-Aligned Generative Adversarial Network

Zero-shot Recognition with Latent Visual Attributes Learning

Estimation of Near-Instance-Level Attribute Bottleneck for Zero-Shot Learning

Attribute-Modulated Generative Meta Learning for Zero-shot Learning

Deep Representation of Hierarchical Semantic Attributes for Zero-shot Learning

Learning Modality-Consistent Latent Representations for Generalized Zero-Shot Learning