Abstract: Multi-modal hashing can encode large-scale social geo-media multimedia data from multiple sources into a common discrete hash space, in which the heterogeneous correlations across modalities can be explored and preserved in the target semantic-consistent hash codes. Current research on multi-modal hashing mainly focuses on common data reconstruction, but it fails to effectively distill the intrinsic and consensus structures of multi-modal data or to fully exploit the inherent semantic knowledge needed to capture semantic-consistent information across modalities, leading to unsatisfactory retrieval performance. To address this problem and develop an efficient multi-modal geographical retrieval method, in this article we propose a discriminative multi-modal hashing framework named Cognitive Multi-modal Consistent Hashing (CMCH), which progressively pursues structure consensus over heterogeneous multi-modal data while simultaneously exploring informative transformed semantics. Specifically, we construct a parameter-free collaborative multi-modal fusion module to incorporate and excavate the underlying common components of multi-source data. In particular, our formulation seeks joint compatibility among multiple modalities in a self-adaptive weighting manner, which takes full advantage of their complementary properties. Moreover, a cognitive self-paced learning policy is leveraged to conduct progressive feature aggregation, which coalesces multi-modal data onto the established common latent space in a curriculum learning mode. Furthermore, deep semantic transform learning is developed to generate flexible semantics that interactively guide collaborative hash code learning. An efficient discrete learning algorithm is devised to solve the resulting optimization problem, and it obtains stable solutions on large-scale multi-modal retrieval tasks.
Extensive experiments on four large-scale multi-modal datasets demonstrate the encouraging performance of the proposed CMCH method in comparison with state-of-the-art methods, in terms of both multi-modal retrieval accuracy and computational efficiency. The source code of this work is available at https://github.com/JunfengAn1998a/CMCH.
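
As a rough illustration of the retrieval setting the abstract describes, the sketch below fuses per-modality feature vectors with fixed weights, binarizes the result into a hash code, and ranks database items by Hamming distance. It is not the CMCH algorithm: the function names (`fuse`, `binarize`, `hamming`, `rank`), the hand-picked weights, and the toy feature values are all assumptions made for illustration, whereas CMCH is described as learning its modality weights adaptively and solving a discrete optimization problem.

```python
# Generic multi-modal hashing retrieval sketch (NOT the CMCH method).
# All names, weights, and feature values here are hypothetical.

def fuse(modalities, weights):
    """Weighted sum of same-length per-modality feature vectors.

    The weights are assumed to be given; in CMCH they are described
    as being determined in a self-adaptive manner.
    """
    dim = len(modalities[0])
    return [sum(w * m[i] for w, m in zip(weights, modalities)) for i in range(dim)]


def binarize(vec):
    """Map a real-valued vector to a {-1, +1} hash code by sign."""
    return [1 if v >= 0 else -1 for v in vec]


def hamming(a, b):
    """Number of positions at which two hash codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank(query_code, database_codes):
    """Return database indices sorted by Hamming distance to the query."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))


# Toy query with two modalities (e.g. image and text features).
image_feat = [0.9, -0.2, 0.4, -0.7]
text_feat = [0.5, 0.1, -0.1, -0.6]
query_code = binarize(fuse([image_feat, text_feat], [0.6, 0.4]))

# Toy database of precomputed hash codes.
database = [
    [1, -1, 1, -1],
    [1, 1, 1, -1],
    [-1, 1, -1, 1],
]
ranking = rank(query_code, database)
print(query_code, ranking)
```

Because codes are compared with Hamming distance rather than with floating-point similarity, ranking a large database reduces to cheap bit-level comparisons, which is the efficiency argument usually made for hashing-based retrieval.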

A Multi-Modal Hashing Learning Framework for Automatic Image Annotation

Multi-Modal Multi-Label Semantic Indexing of Images Using Unlabeled Data

Multi-Modal Multi-Label Semantic Indexing Of Images Based On Hybrid Ensemble Learning

Multi-modal Multi-Concept-based Deep Neural Network for Automatic Image Annotation

Deep Multi-Label Hashing For Large-Scale Visual Search Based On Semantic Graph

Deep Semantic Multimodal Hashing Network for Scalable Image-Text and Video-Text Retrievals

Automatic Image Annotation Based on Multi-Auxiliary Information

Deep Cross-Modal Hashing with Multi-Task Latent Space Learning

Semantic-Driven Interpretable Deep Multi-Modal Hashing for Large-Scale Multimedia Retrieval

Labeling images by integrating sparse multiple distance learning and semantic context modeling

Deep Multi-Similarity Hashing for Multi-label Image Retrieval

Cognitive multi-modal consistent hashing with flexible semantic transformation

Improved Deep Unsupervised Hashing with Fine-grained Semantic Similarity Mining for Multi-Label Image Retrieval

Unsupervised Multi-modal Hashing for Cross-Modal Retrieval

MAFH: Multilabel Aware Framework for Bit-Scalable Cross-Modal Hashing

Deep Multi-Similarity Hashing with semantic-aware preservation for multi-label image retrieval

Efficient Semi-Supervised Multimodal Hashing With Importance Differentiation Regression

A hybrid hierarchical framework for automatic image annotation

A Framework Of Hashing For Multi-Instance Multi-Label Learning

Adaptive Hypergraph Embedded Semi-Supervised Multi-Label Image Annotation

Partial Multi-Modal Hashing via Neighbor-aware Completion Learning