Abstract: Owing to their low computational cost, compact storage, and efficient retrieval, unsupervised deep cross-modal hashing methods have received extensive attention. However, existing unsupervised methods still face two challenges: (1) Because label semantics are unavailable, the neighborhood structure information within and across modalities may not be fully integrated, so the deep semantic similarity interactions between instances are ignored. (2) Unsupervised hash codes can neither effectively preserve the semantic consistency of the original features of modal instances nor bridge the gap between heterogeneous modalities in the hash codes. To address these issues, we propose a new unsupervised deep cross-modal hashing method called Multi-Perspective Fusing Semantic Alignment Hashing (MPFSAH), which consists of two main components. First, to enhance inter-modal communication, we construct a Multi-level Semantic Similarity Interactive Measure (MSSIM). By fusing the neighborhood structures of different modalities and increasing the distance between instances within a modality, MSSIM deeply mines semantic interaction similarity to obtain discriminative semantic information. Second, we propose a novel Multi-Perspective Semantic Alignment Mechanism (MPSAM), which learns inter-modal similarity consistency by minimizing the consistency quantization error of the elements in the multi-perspective similarity. MPSAM includes similarity consistency alignment, structural-semantic alignment, and ranking alignment. By achieving structural-semantic consistency, it ensures that cross-modal data similarities are effectively connected and bridges the modality gap during hash code learning. Experiments on three cross-modal retrieval datasets demonstrate the effectiveness of the proposed method, which outperforms several state-of-the-art methods.
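To make the general idea of "fusing the neighborhood structure of different modalities" and "similarity consistency alignment" more concrete, here is a minimal pure-Python sketch. It is not the MPFSAH formulation, which the abstract does not specify. The fusion weights `beta` and `eta`, the second-order term S·Sᵀ/n (a common joint-similarity construction in earlier unsupervised cross-modal hashing work), the mean-squared consistency loss, and all function names are assumptions chosen for illustration.

```python
from __future__ import annotations

import math


def cosine_similarity(features: list[list[float]]) -> list[list[float]]:
    """Pairwise cosine similarity between the rows of one modality's feature matrix."""
    # Zero-norm rows fall back to norm 1.0, which yields similarity 0 for them.
    norms = [math.sqrt(sum(v * v for v in row)) or 1.0 for row in features]
    n = len(features)
    return [
        [sum(a * b for a, b in zip(features[i], features[j])) / (norms[i] * norms[j])
         for j in range(n)]
        for i in range(n)
    ]


def fused_similarity(img_feats, txt_feats, beta=0.5, eta=0.4):
    """Hypothetical fused cross-modal similarity (assumed weights beta, eta)."""
    s_img = cosine_similarity(img_feats)
    s_txt = cosine_similarity(txt_feats)
    n = len(s_img)
    # First-order fusion: weighted average of the two intra-modal neighborhood structures.
    s = [[beta * s_img[i][j] + (1 - beta) * s_txt[i][j] for j in range(n)] for i in range(n)]
    # Second-order term S S^T / n: two instances score high if they share neighbors.
    second = [[sum(s[i][k] * s[j][k] for k in range(n)) / n for j in range(n)] for i in range(n)]
    return [[(1 - eta) * s[i][j] + eta * second[i][j] for j in range(n)] for i in range(n)]


def sign_codes(real_codes):
    """Binarize real-valued network outputs into {-1, +1} hash codes."""
    return [[1 if v >= 0 else -1 for v in row] for row in real_codes]


def similarity_consistency_loss(codes_x, codes_y, target):
    """Mean squared gap between normalized code inner products and the target similarity."""
    n, k = len(codes_x), len(codes_x[0])
    total = 0.0
    for i in range(n):
        for j in range(n):
            code_sim = sum(a * b for a, b in zip(codes_x[i], codes_y[j])) / k
            total += (code_sim - target[i][j]) ** 2
    return total / (n * n)


# Toy example: three image-text pairs; pairs 0 and 1 are semantically close.
img = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.3], [0.0, 1.0, 0.0]]
txt = [[1.0, 0.1], [0.8, 0.2], [0.1, 1.0]]
S = fused_similarity(img, txt)
B_x = sign_codes([[0.3, -0.2, 0.5, 0.1], [0.2, -0.1, 0.4, 0.3], [-0.5, 0.6, -0.2, -0.4]])
B_y = sign_codes([[0.4, -0.3, 0.2, 0.2], [0.1, -0.2, 0.6, 0.1], [-0.3, 0.5, -0.1, -0.2]])
loss = similarity_consistency_loss(B_x, B_y, S)
print(f"similarity-consistency loss: {loss:.4f}")
```

In a real model the codes would come from relaxed (e.g. tanh) network outputs so the loss is differentiable; the hard `sign_codes` step is used here only to keep the sketch short.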

Unsupervised multi-perspective fusing semantic alignment for cross-modal hashing retrieval
