Hydroxyethyl starch: Statement by the Presidents of the DGAI and the BDA and the President of the DAAF

C. Werner,G. Geldner,T. Koch

DOI: https://doi.org/10.1055/s-0033-1352487

2013-07-01

Joint Learning of Attended Zero-Shot Features and Visual-Semantic Mapping.

Yanan Li,Donghui Wang

2019-01-01

Abstract:Zero-shot learning (ZSL) aims to recognize unseen categories by associating image features with semantic embeddings of class labels and its performance can be improved progressively through learning better features and more generalized visual-semantic mapping (V-S mapping) to unseen classes. Current methods typically learn feature extractors and V-S mapping independently. In this work, we propose a simple but effective joint learning framework with fused autoencoder (AE) paradigm, which can simultaneously learn features specific to ZSL task as well as V-S mapping inseparable to learning features. In particular, the encoder in AE can not only transfer semantic knowledge to the feature space, but also achieve semantics-guided attended feature learning. At the same time, the decoder in AE can be used as a V-S mapping, which further improves the generalization ability to unseen classes. Extensive experiments show that the proposed approach can achieve promising results.
Domain Adaptation Meets Zero-Shot Learning: an Annotation-Efficient Approach to Multi-Modality Medical Image Segmentation

Cheng Bian,Chenglang Yuan,Kai Ma,Shuang Yu,Dong Wei,Yefeng Zheng

DOI: https://doi.org/10.1109/tmi.2021.3131245

IF: 10.6

2022-01-01

IEEE Transactions on Medical Imaging

Abstract:Due to the lack of properly annotated medical data, exploring the generalization capability of the deep model is becoming a public concern. Zero-shot learning (ZSL) has emerged in recent years to equip the deep model with the ability to recognize unseen classes. However, existing studies mainly focus on natural images, which utilize linguistic models to extract auxiliary information for ZSL. It is impractical to apply the natural image ZSL solutions directly to medical images, since the medical terminology is very domain-specific, and it is not easy to acquire linguistic models for the medical terminology. In this work, we propose a new paradigm of ZSL specifically for medical images utilizing cross-modality information. We make three main contributions with the proposed paradigm. First, we extract the prior knowledge about the segmentation targets, called relation prototypes, from the prior model and then propose a cross-modality adaptation module to inherit the prototypes to the zero-shot model. Second, we propose a relation prototype awareness module to make the zero-shot model aware of information contained in the prototypes. Last but not least, we develop an inheritance attention module to recalibrate the relation prototypes to enhance the inheritance process. The proposed framework is evaluated on two public cross-modality datasets including a cardiac dataset and an abdominal dataset. Extensive experiments show that the proposed framework significantly outperforms the state of the arts.
Dual Collaborative Visual-Semantic Mapping for Multi-Label Zero-Shot Image Recognition

Yunqing Hu,Xuan Jin,Xi Chen,Yin Zhang

DOI: https://doi.org/10.1109/icassp49357.2023.10095788

2023-01-01

Abstract:Multi-label zero-shot learning (ML-ZSL), with the difficulty of both multi-label learning and zero-shot learning, aims to recognize various unseen objects that are not observed during training. Previous methods mainly use a single directional visual-semantic mapping to associate the visual and semantic embedding space, which is not sufficient to adequately realize knowledge transfer from seen to unseen classes. In this paper, we propose a novel dual collaborative visual-semantic mapping framework, constructing abundant connection relationships by exploring two aspects of mapping streams, i.e., the visual-to-semantic (V2S) mapping and the semantic-to-visual (S2V) mapping. Through the collaborative learning of these two effective mappings, our method achieves state-of-the-art performance on the MS-COCO and PASCAL-VOC, two benchmarks for ML-ZSL.
Towards Effective Deep Embedding for Zero-Shot Learning

Lei Zhang,Peng Wang,Lingqiao Liu,Chunhua Shen,Wei Wei,Yanning Zhang,Anton van den Hengel

DOI: https://doi.org/10.1109/tcsvt.2020.2984666

IF: 5.859

2020-09-01

IEEE Transactions on Circuits and Systems for Video Technology

Abstract:Zero-shot learning (ZSL) can be formulated as a cross-domain matching problem: after being projected into a joint embedding space, a visual sample will match against all candidate class-level semantic descriptions and be assigned to the nearest class. In this process, the embedding space underpins the success of such matching and is crucial for ZSL. In this paper, we conduct an in-depth study on the construction of embedding space for ZSL and posit that an ideal embedding space should satisfy two criteria: intra-class compactness and inter-class separability. While the former encourages the embeddings of visual samples of one class to distribute tightly close to the semantic description embedding of this class, the latter requires embeddings from different classes to be well separated from each other. Towards this goal, we present a simple but effective two-branch network to simultaneously map semantic descriptions and visual samples into a joint space, on which visual embeddings are forced to regress to their class-level semantic embeddings and the embeddings crossing classes are required to be distinguishable by a trainable classifier. Furthermore, we extend our method to a transductive setting to better handle the model bias problem in ZSL (i.e., samples from unseen classes tend to be categorized into seen classes) with minimal extra supervision. Specifically, we propose a pseudo labeling strategy to progressively incorporate the testing samples into the training process and thus balance the model between seen and unseen classes. Experimental results on five standard ZSL datasets show the superior performance of the proposed method and its transductive extension.

engineering, electrical & electronic
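
The matching step described in the abstract of "Towards Effective Deep Embedding for Zero-Shot Learning" (a projected visual sample is compared against every class-level semantic embedding and assigned to the nearest class) can be sketched as follows. This is a minimal illustration, not the authors' network: the projection into the joint space is assumed to have happened already, cosine similarity is an assumed choice of nearness measure, and the names `assign_nearest_class`, `class_embeddings`, and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_nearest_class(visual_embedding, class_embeddings):
    """Return the class name whose embedding is most similar to the sample."""
    return max(
        class_embeddings,
        key=lambda name: cosine_similarity(visual_embedding, class_embeddings[name]),
    )


# Hypothetical toy data: two classes already embedded in a 3-d joint space.
class_embeddings = {
    "zebra": [1.0, 0.0, 1.0],
    "whale": [0.0, 1.0, 0.0],
}
sample = [0.9, 0.1, 0.8]  # assumed output of the visual branch
predicted = assign_nearest_class(sample, class_embeddings)
print(predicted)
```

Because the candidate set is just a dictionary of class embeddings, unseen classes can be added at test time without retraining, which is the property that makes embedding-based matching usable for zero-shot recognition.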
Manifold Regularized Cross-Modal Embedding for Zero-Shot Learning

Zhong Ji,Yunlong Yu,Yanwei Pang,Jichang Guo,Zhongfei Zhang

DOI: https://doi.org/10.1016/j.ins.2016.10.025

IF: 8.1

2017-01-01

Information Sciences

Abstract:Zero-Shot Learning (ZSL) aims at classifying previously unseen class samples and has gained its popularity in applications where samples of some categories are scarce for training. The basic idea to address this issue is transferring knowledge from the seen classes to the unseen classes through mapping the visual feature to an embedding space spanned by class semantic information. The class semantic information can be obtained from human-labeled attributes or text corpus in an unsupervised fashion. Therefore, the embedding function from visual space to the embedding space is extremely important. However, the existing embedding approaches to ZSL mainly focus on aligning pairwise semantic consistency from heterogeneous spaces but ignore the intrinsic structure of the locally homogeneous isomorph. In order to preserve the locally visual structure in the embedding process, this paper proposes a Manifold regularized Cross-Modal Embedding (MCME) approach for ZSL by formulating the manifold constraint for intrinsic structure of the visual features as well as aligning pairwise consistency. The linear, closed-form solution makes MCME efficient to compute. Furthermore, rather than applying the embedding function learned from the seen classes directly, we also propose a new domain adaptation strategy to overcome the domain-shift problem during the knowledge transfer process. The MCME with the domain adaptation method is called MCME-DA. Extensive experiments on the benchmark datasets of AwA and CUB validate the superiority and promise of MCME and MCME-DA.
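
To illustrate what a "linear, closed-form solution" with a manifold constraint can look like, the following is a generic graph-Laplacian-regularized least-squares problem. It is a sketch under stated assumptions and not the MCME objective itself, which the abstract does not give: $X$ (visual features, one row per sample), $S$ (the corresponding class semantic vectors), $L$ (a symmetric graph Laplacian built on the visual features), and the weights $\lambda, \gamma$ are all assumed symbols.

```latex
% Generic manifold-regularized linear embedding (assumed form, not taken from the paper)
\min_{W}\; \lVert XW - S \rVert_F^2
  + \lambda \lVert W \rVert_F^2
  + \gamma \, \operatorname{tr}\!\left( W^\top X^\top L X W \right)

% Setting the gradient with respect to W to zero gives the closed-form solution
W^{*} = \left( X^\top X + \lambda I + \gamma X^\top L X \right)^{-1} X^\top S
```

The first term aligns each visual sample with its semantic vector, while the trace term penalizes mappings that pull apart samples that are neighbours in the visual space. A single linear solve replaces iterative training, which is the sense in which such a formulation is cheap to compute.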
Meta-Transfer Networks for Zero-Shot Learning

Yunlong Yu,Zhongfei Zhang,Jungong Han

2019-01-01

Abstract:Zero-Shot Learning (ZSL) aims at recognizing unseen categories using some class semantics of the categories. The existing studies mostly leverage the seen categories to learn a visual-semantic interaction model to infer the unseen categories. However, the disjointness between the seen and unseen categories cannot ensure that the models trained on the seen categories generalize well to the unseen categories. In this work, we propose an episode-based approach to accumulate experiences on addressing disjointness issue by mimicking extensive classification scenarios where training classes and test classes are disjoint. In each episode, a visual-semantic interaction model is first trained on a subset of seen categories as a learner that provides an initial prediction for the rest disjoint seen categories and then a meta-learner fine-tunes the learner by minimizing the differences between the prediction and the ground-truth labels in a pre-defined space. By training extensive episodes on the seen categories, the model is trained to be an expert in predicting the mimetic unseen categories, which will generalize well to the real unseen categories. Extensive experiments on four datasets under both the traditional ZSL and generalized ZSL tasks show that our framework outperforms the state-of-the-art approaches by large margins.
Multi-modal Generative Adversarial Network for Zero-Shot Learning

Zhong Ji,Kexin Chen,Junyue Wang,Yunlong Yu,Zhongfei Zhang

DOI: https://doi.org/10.1016/j.knosys.2020.105847

IF: 8.139

2020-01-01

Knowledge-Based Systems

Abstract:In this paper, we propose a novel approach for Zero-Shot Learning (ZSL), where the test instances are from the novel categories that no visual data are available during training. The existing approaches typically address ZSL by embedding the visual features into a category-shared semantic space. However, these embedding-based approaches easily suffer from the “heterogeneity gap” issue since a single type of class semantic prototype cannot characterize the categories well. To alleviate this issue, we assume that different class semantics reflect different views of the corresponding class, and thus fuse various types of class semantic prototypes resided in different semantic spaces with a feature fusion network to generate pseudo visual features. Through the adversarial mechanism of the real visual features and the fused pseudo visual features, the complementary semantics in various spaces are effectively captured. Experimental results on three benchmark datasets demonstrate that the proposed approach achieves impressive performances on both traditional ZSL and generalized ZSL tasks.
Zero-Shot Learning With Manifold Embedding

Yunlong Yu,Zhong Ji,Yanwei Pang

DOI: https://doi.org/10.1007/978-3-030-02698-1_12

2018-01-01

Abstract:Zero-Shot Learning (ZSL) has gained its popularity recently owing to its promising characteristic that requires no training data to recognize new visual classes. One key technique is to transfer knowledge from the seen classes to the new unseen classes in an intermediate embedding space for both visual and textual modalities. Therefore, the construction of the embedding space is extremely important. Manifold embedding is able to well capture the intrinsic structure of the embedding space. To this end, with the assumption that the distribution of the semantic categories in the word vector space has an intrinsic manifold structure, this paper proposes a Manifold Embedding based ZSL (ME-ZSL) approach by formulating the manifold structure for the visual to textual embedding with the intra-class compactness, the inter-class separability, and the locality preservation. The linear, closed-form solution makes ME-ZSL efficient to compute. Extensive experiments on the popular AwA and CUB datasets validate the effectiveness of ME-ZSL.
Semantic Consistent Embedding for Domain Adaptive Zero-Shot Learning

Jianyang Zhang,Guowu Yang,Ping Hu,Guosheng Lin,Fengmao Lv

DOI: https://doi.org/10.1109/tip.2023.3293769

IF: 10.6

2023-07-22

IEEE Transactions on Image Processing

Abstract:Unsupervised domain adaptation has limitations when encountering label discrepancy between the source and target domains. While open-set domain adaptation approaches can address situations when the target domain has additional categories, these methods can only detect them but not further classify them. In this paper, we focus on a more challenging setting dubbed Domain Adaptive Zero-Shot Learning (DAZSL), which uses semantic embeddings of class tags as the bridge between seen and unseen classes to learn the classifier for recognizing all categories in the target domain when only the supervision of seen categories in the source domain is available. The main challenge of DAZSL is to perform knowledge transfer across categories and domain styles simultaneously. To this end, we propose a novel end-to-end learning mechanism dubbed Three-way Semantic Consistent Embedding (TSCE) to embed the source domain, target domain, and semantic space into a shared space. Specifically, TSCE learns domain-irrelevant categorical prototypes from the semantic embedding of class tags and uses them as the pivots of the shared space. The source domain features are aligned with the prototypes via their supervised information. On the other hand, the mutual information maximization mechanism is introduced to push the target domain features and prototypes towards each other. By this way, our approach can align domain differences between source and target images, as well as promote knowledge transfer towards unseen classes. Moreover, as there is no supervision in the target domain, the shared space may suffer from the catastrophic forgetting problem. Hence, we further propose a ranking-based embedding alignment mechanism to maintain the consistency between the semantic space and the shared space. Experimental results on both I2AwA and I2WebV clearly validate the effectiveness of our method. Code is available at https://github.com/tiggers23/TSCE-Domain-Adaptive-Zero-Shot-Learning.

computer science, artificial intelligence,engineering, electrical & electronic
Learning discriminative visual semantic embedding for zero-shot recognition

Yurui Xie,Tiecheng Song,Jianying Yuan

DOI: https://doi.org/10.1016/j.image.2023.116955

2023-03-15

Abstract:We present a novel zero-shot learning (ZSL) method that concentrates on strengthening the discriminative visual information of the semantic embedding space for recognizing object classes. To address the ZSL problem, many previous works strive to learn a transformation to bridge the visual features and semantic representations, while ignoring that the discriminative property of the semantic embedding space can benefit zero-shot prediction tasks. Among these existing approaches, human-defined attributes are typically employed to build up the mid-level semantics. However, the discriminative capability and completeness of manually defined attributes are hard to guarantee, which may easily cause semantic ambiguity. To alleviate this issue, we propose a discriminative visual semantic embedding (DVSE) model that formulates the ZSL problem as a supervised dictionary learning framework. The proposed method is capable of exploring a set of discriminative visual attributes and ensures knowledge transfer across categories. Moreover, a unified objective is introduced to generate an augmented semantic embedding space where these learned visual attributes and human-defined attributes are incorporated jointly for consolidating the visual cues of feature representations. Finally, we treat the DVSE model as an optimization problem and further propose an iterative solver. Extensive experiments on several challenging benchmark datasets demonstrate that the proposed method achieves favorable performances compared with state-of-the-art ZSL approaches.

engineering, electrical & electronic
Zero-Knowledge Zero-Shot Learning for Novel Visual Category Discovery

Zhaonan Li,Hongfu Liu

DOI: https://doi.org/10.48550/arXiv.2302.04427

2023-02-09

Abstract:Generalized Zero-Shot Learning (GZSL) and Open-Set Recognition (OSR) are two mainstream settings that greatly extend conventional visual object recognition. However, the limitations of their problem settings are not negligible. The novel categories in GZSL require pre-defined semantic labels, making the problem setting less realistic; the oversimplified unknown class in OSR fails to explore the innate fine-grained and mixed structures of novel categories. In light of this, we are motivated to consider a new problem setting named Zero-Knowledge Zero-Shot Learning (ZK-ZSL) that assumes no prior knowledge of novel classes and aims to classify seen and unseen samples and recover semantic attributes of the fine-grained novel categories for further interpretation. To achieve this, we propose a novel framework that recovers the clustering structures of both seen and unseen categories where the seen class structures are guided by source labels. In addition, a structural alignment loss is designed to aid the semantic learning of unseen categories with their recovered structures. Experimental results demonstrate our method's superior performance in classification and semantic recovery on four benchmark datasets.

Computer Vision and Pattern Recognition,Machine Learning
Zero-Shot Learning With Attentive Region Embedding and Enhanced Semantics

Yang Liu,Yuhao Dang,Xinbo Gao,Jungong Han,Ling Shao

DOI: https://doi.org/10.1109/tnnls.2022.3202014

IF: 14.255

2022-01-01

IEEE Transactions on Neural Networks and Learning Systems

Abstract:The performance of zero-shot learning (ZSL) can be improved progressively by learning better features and generating pseudosamples for unseen classes. Existing ZSL works typically learn feature extractors and generators independently, which may shift the unseen samples away from their real distribution and suffers from the domain bias problem. In this article, to tackle this challenge, we propose a variational autoencoder (VAE)-based framework, that is, joint Attentive Region Embedding with Enhanced Semantics (AREES), which is tailored to advance the zero-shot recognition. Specifically, AREES is end-to-end trainable and consists of three network branches: 1) attentive region embedding is used to learn the semantic-guided visual features by the attention mechanism (AM); 2) a decomposition structure and a semantic pivot regularization are used to extract enhanced semantics; and 3) a multimodal VAE (mVAE) with the cross-reconstruction loss and the distribution alignment loss is used to obtain a shared latent embedding space of visual features and semantics. Finally, features' extraction and features' generation are optimized together in AREES to address the domain shift problem to a large extent. The comprehensive evaluations on six benchmarks, including the ImageNet, demonstrate the superiority of the proposed model over its state-of-the-art counterparts.

computer science, artificial intelligence, theory & methods,engineering, electrical & electronic, hardware & architecture
Zero-Shot Embedding via Regularization-Based Recollection and Residual Familiarity Processes

Mengyao Lyu,Hu Han,Xiangzhi Bai

DOI: https://doi.org/10.1109/TSMC.2021.3102834

2022-01-01

IEEE Transactions on Systems, Man, and Cybernetics: Systems

Abstract:The goal of zero-shot learning (ZSL) is to transfer knowledge learned from seen classes during training to unseen classes for testing, with the help of auxiliary information, such as attributes and descriptions. Most of the existing methods view ZSL as a label-embedding problem, in which class and image representations are embedded to a common space. However, many methods either show a bias toward seen classes caused by the projection domain-shift problem, or sacrifice the performance of seen classes to generalize to unseen ones. In this article, we present an embedding approach for ZSL, which is motivated by human recognition memory, namely, recollection and familiarity (R&F). We propose a decoder to regularize the nonlinear mapping between the semantic space and the visual space, which represents the reasonable recollection process, and use a residual block to refine the recognition ability for seen classes, which indicates the familiarity process. R&F can generalize well to unseen classes, while retaining the discriminative ability for the seen classes. Extensive experiments are conducted on Animals with Attributes (AwA1), Animals with Attributes 2 (AwA2), Attribute Pascal&Yahoo (aPY), SUN Attribute (SUN), Caltech-UCSD-Birds 200-2011 (CUB), and ImageNet databases. As qualitative and quantitative results show, the proposed approach outperforms state-of-the-art embedding-based methods by a large margin and significantly alleviates the projection domain-shift problem.
Zero-Shot Learning via Discriminative Dual Semantic Auto-Encoder

Nan Xing,Yang Liu,Hong Zhu,Jing Wang,Jungong Han

DOI: https://doi.org/10.1109/access.2020.3046573

IF: 3.9

2021-01-01

IEEE Access

Abstract:Zero-shot learning (ZSL) is an effective method to perform the recognition task without any training samples of specific classes. Most existing ZSL models put emphasis on learning an embedding between visual space and semantic space directly. However, few ZSL models research whether the human-designed semantic features are discriminative enough to recognize different classes. Moreover, one-way mapping suffers from the projection domain shift problem. In this article, we propose to learn a Discriminative Dual Semantic Auto-encoder (DDSA) based on the encoder-decoder paradigm to solve this problem. DDSA attempts to construct two bidirectional embeddings to connect the visual space and the semantic space with the help of the learned aligned space which includes discriminative information of the visual features and semantic features. Based on the DDSA, we additionally propose a Deep DDSA to capture deep aligned features that are more conducive to zero-shot classification. The key to the proposed framework is that it implicitly extracts the principal information from visual space and semantic space to construct aligned features, which are not only semantic-preserving but also discriminative. Extensive experiments on five benchmarks (SUN, CUB, AWA1, AWA2 and aPY) demonstrate the effectiveness of the proposed framework with state-of-the-art performance obtained on both conventional ZSL and generalized ZSL settings.

computer science, information systems,telecommunications,engineering, electrical & electronic
Transductive Unbiased Embedding for Zero-Shot Learning

Jie Song,Chengchao Shen,Yezhou Yang,Yang Liu,Mingli Song

DOI: https://doi.org/10.1109/cvpr.2018.00113

2018-01-01

Abstract:Most existing Zero-Shot Learning (ZSL) methods have the strong bias problem, in which instances of unseen (target) classes tend to be categorized as one of the seen (source) classes. So they yield poor performance after being deployed in the generalized ZSL settings. In this paper, we propose a straightforward yet effective method named Quasi-Fully Supervised Learning (QFSL) to alleviate the bias problem. Our method follows the way of transductive learning, which assumes that both the labeled source images and unlabeled target images are available for training. In the semantic embedding space, the labeled source images are mapped to several fixed points specified by the source categories, and the unlabeled target images are forced to be mapped to other points specified by the target categories. Experiments conducted on AwA2, CUB and SUN datasets demonstrate that our method outperforms existing state-of-the-art approaches by a huge margin of 9.3 ~ 24.5% following generalized ZSL settings, and by a large margin of 0.2 ~ 16.2% following conventional ZSL settings.
OntoZSL: Ontology-enhanced Zero-shot Learning

Yuxia Geng,Jiaoyan Chen,Zhuo Chen,Jeff Z. Pan,Zhiquan Ye,Zonggang Yuan,Yantao Jia,Huajun Chen

DOI: https://doi.org/10.1145/3442381.3450042

2021-01-01

Abstract:Zero-shot Learning (ZSL), which aims to predict for those classes that have never appeared in the training data, has attracted strong research interest. The key to implementing ZSL is to leverage the prior knowledge of classes which builds the semantic relationship between classes and enables the transfer of the learned models (e.g., features) from training classes (i.e., seen classes) to unseen classes. However, the priors adopted by the existing methods are relatively limited with incomplete semantics. In this paper, we explore richer and more competitive prior knowledge to model the inter-class relationship for ZSL via ontology-based knowledge representation and semantic embedding. Meanwhile, to address the data imbalance between seen classes and unseen classes, we developed a generative ZSL framework with Generative Adversarial Networks (GANs). Our main findings include: (i) an ontology-enhanced ZSL framework that can be applied to different domains, such as image classification (IMGC) and knowledge graph completion (KGC); (ii) a comprehensive evaluation with multiple zero-shot datasets from different domains, where our method often achieves better performance than the state-of-the-art models. In particular, on four representative ZSL baselines of IMGC, the ontology-based class semantics outperform the previous priors e.g., the word embeddings of classes by an average of 12.4 accuracy points in the standard ZSL across two example datasets (see Figure 4).
Visual-guided attentive attributes embedding for zero-shot learning

Rui Zhang,Qi Zhu,Xiangyu Xu,Daoqiang Zhang,Sheng-Jun Huang

DOI: https://doi.org/10.1016/j.neunet.2021.07.031

IF: 7.8

2021-11-01

Neural Networks

Abstract:Zero-shot learning (ZSL) aims to learn a classifier for unseen classes by exploiting both training data from seen classes and external knowledge. In many visual tasks such as image classification, a set of high-level attributes that describe the semantic properties of classes are used as the external knowledge to bridge seen and unseen classes. While the attributes are usually treated equally by previous ZSL studies, we observe that the contribution of different attributes varies significantly over model training. To adaptively exploit the discriminative information embedded in different attributes, we propose a novel encoder-decoder framework with attention mechanism on the attribute level for zero-shot learning. Specifically, by mapping the visual features into a semantic space, the more discriminative attributes are emphasized with larger attention weights. Further, the attentive attributes and the class prototypes are simultaneously decoded to the visual space so that the hubness problem can be eased. Finally, the labels are predicted in the visual space. Extensive experiments on multiple benchmark datasets demonstrate that our proposed model achieves a significant boost over several state-of-the-art methods for ZSL task and comparative results for generalized ZSL task.

computer science, artificial intelligence,neurosciences
Epsilon: Exploring Comprehensive Visual-Semantic Projection for Multi-Label Zero-Shot Learning

Ziming Liu,Jingcai Guo,Song Guo,Xiaocheng Lu

2024-08-25

Abstract:This paper investigates a challenging problem of zero-shot learning in the multi-label scenario (MLZSL), wherein the model is trained to recognize multiple unseen classes within a sample (e.g., an image) based on seen classes and auxiliary knowledge, e.g., semantic information. Existing methods usually resort to analyzing the relationship of various seen classes residing in a sample from the dimension of spatial or semantic characteristics and transferring the learned model to unseen ones. However, they neglect the integrity of local and global features. Although the use of the attention structure will accurately locate local features, especially objects, it will significantly lose its integrity, and the relationship between classes will also be affected. Rough processing of global features will also directly affect comprehensiveness. This neglect will make the model lose its grasp of the main components of the image. Relying only on the local existence of seen classes during the inference stage introduces unavoidable bias. In this paper, we propose a novel and comprehensive visual-semantic framework for MLZSL, dubbed Epsilon, to fully make use of such properties and enable a more accurate and robust visual-semantic projection. In terms of spatial information, we achieve effective refinement by group aggregating image features into several semantic prompts. It can aggregate semantic information rather than class information, preserving the correlation between semantics. In terms of global semantics, we use global forward propagation to collect as much information as possible to ensure that semantics are not omitted. Experiments on large-scale MLZSL benchmark datasets NUS-Wide and Open-Images-v4 demonstrate that the proposed Epsilon outperforms other state-of-the-art methods with large margins.

Computer Vision and Pattern Recognition
Learning a Deep Embedding Model for Zero-Shot Learning

Li Zhang,Tao Xiang,Shaogang Gong

DOI: https://doi.org/10.1109/cvpr.2017.321

2017-01-01

Abstract:Zero-shot learning (ZSL) models rely on learning a joint embedding space where both textual/semantic description of object classes and visual representation of object images can be projected to for nearest neighbour search. Despite the success of deep neural networks that learn an end-to-end model between text and images in other vision problems such as image captioning, very few deep ZSL models exist and they show little advantage over ZSL models that utilise deep feature representations but do not learn an end-to-end embedding. In this paper we argue that the key to making deep ZSL models succeed is to choose the right embedding space. Instead of embedding into a semantic space or an intermediate space, we propose to use the visual space as the embedding space. This is because, in this space, the subsequent nearest neighbour search would suffer much less from the hubness problem and thus become more effective. This model design also provides a natural mechanism for multiple semantic modalities (e.g., attributes and sentence descriptions) to be fused and optimised jointly in an end-to-end manner. Extensive experiments on four benchmarks show that our model significantly outperforms the existing models.
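
The hubness problem mentioned in the abstract of "Learning a Deep Embedding Model for Zero-Shot Learning" refers to a few points becoming the nearest neighbour of a disproportionate share of queries. A simple way to see it is to count how often each class prototype wins the nearest-neighbour search. The sketch below is only an illustration of that count, not the paper's evaluation: squared Euclidean distance is an assumed metric, and `nearest_prototype_counts` and the toy data are hypothetical.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_prototype_counts(queries, prototypes):
    """Count how many queries have each prototype as their nearest neighbour."""
    counts = Counter({name: 0 for name in prototypes})
    for query in queries:
        nearest = min(
            prototypes,
            key=lambda name: squared_distance(query, prototypes[name]),
        )
        counts[nearest] += 1
    return counts


# Hypothetical toy data: prototype "b" sits where most queries fall.
prototypes = {"a": [0.0, 0.0], "b": [0.1, 0.1], "c": [5.0, 5.0]}
queries = [[0.2, 0.2], [0.3, 0.1], [1.0, 1.0], [0.1, 0.3], [4.0, 4.0]]
counts = nearest_prototype_counts(queries, prototypes)
print(counts.most_common())
```

A heavily skewed count, with one prototype collecting most queries while others collect none, is the symptom of hubness; a roughly even spread suggests the embedding space suffers less from it.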
Domain-Oriented Semantic Embedding for Zero-Shot Learning

Shaobo Min,Hantao Yao,Hongtao Xie,Zheng-Jun Zha,Yongdong Zhang

DOI: https://doi.org/10.1109/tmm.2020.3033124

IF: 7.3

2021-01-01

IEEE Transactions on Multimedia

Abstract:Zero-Shot Learning (ZSL) targets to recognize images from new classes. Existing methods focus on learning a projection function to associate the visual features and category descriptions in the seen domain, which is directly transferred to the unseen domain. However, due to the inherent domain shift, a single shared projection cannot fully capture the domain difference and similarity, thereby making the unseen samples tend to be recognized as seen categories. In this paper, we propose a novel Domain-Oriented Semantic Embedding (DOSE) network that learns specific projections for different domains to better capture the domain characteristics for unbiased ZSL. Besides a domain-shared projection, DOSE learns two auxiliary domain-specific sub-projections to model the semantic-visual association in respective seen and unseen domains. Specifically, the domain-specific projections are learned in a cycle consistency way to capture domain characteristics, and a domain division constraint is developed to penalize the margin between two domain embeddings. Furthermore, to boost semantic-visual association, a semantic-visual dual attention module is designed to automatically remove trivial information in both visual and semantic embeddings under a co-guidance learning manner. Experiments on four public benchmarks prove that the proposed DOSE is robust to the domain shift problem in ZSL and obtains an averaged 5.6% improvement in terms of harmonic mean.

computer science, information systems,telecommunications, software engineering
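
The "harmonic mean" reported in the abstract of "Domain-Oriented Semantic Embedding for Zero-Shot Learning" is, under the usual generalized ZSL convention (assumed here, since the abstract does not define it), the harmonic mean of the accuracy on seen classes and the accuracy on unseen classes. A minimal sketch, with a hypothetical function name and made-up accuracies:

```python
def harmonic_mean(seen_acc, unseen_acc):
    """Harmonic mean of seen-class and unseen-class accuracy (both in [0, 1])."""
    if seen_acc + unseen_acc == 0:
        return 0.0  # avoid division by zero when both accuracies are zero
    return 2 * seen_acc * unseen_acc / (seen_acc + unseen_acc)


# Made-up accuracies: a model biased toward seen classes vs. a balanced one.
h_biased = harmonic_mean(0.9, 0.1)
h_balanced = harmonic_mean(0.5, 0.5)
print(h_biased, h_balanced)
```

Both toy models have the same arithmetic mean accuracy, but the harmonic mean is much lower for the biased one. That is why the metric is used to expose models that classify unseen samples as seen categories.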
