Abstract: Cross-Modal Zero-Shot Hashing (CMZSH) is an important technique for image retrieval tasks such as text-based image retrieval. Most existing CMZSH methods use semantic attributes as guidance to generate hash codes for both the images and texts of seen and unseen categories. However, these methods learn only global attribute vectors and hash codes for images, which mixes complex semantic information with background clutter and thus impedes retrieval performance. To solve this issue, we propose an Attribute-Guided Multiple Instance Hashing (AG-MIH) network for CMZSH, where each instance represents one image region. Instead of generating global image hash codes, AG-MIH learns instance-level hash codes based on instance attributes. To improve attribute learning for instances, AG-MIH exploits a novel 2-D Category-Attribute Relation (CAR) layer, which uses different matching templates to model the relationships between each instance and the attributes of different categories. Under the guidance of semantic attributes, AG-MIH learns hash codes for each visual instance and for texts through a Multi-stream Instance Hashing Refinement (MIHR) procedure. In MIHR, the pseudo supervision for the instance-level attributes and hash codes in each stream comes from its preceding stream. Empirical studies on benchmark datasets show that AG-MIH achieves state-of-the-art performance on both cross-modal and single-modal zero-shot image retrieval tasks.

Attribute-Guided Multiple Instance Hashing Network for Cross-Modal Zero-Shot Hashing