Abstract: Multi-modal hashing encodes large-scale social geo-media data from multiple sources into a common discrete hash space, in which the heterogeneous correlations among modalities can be explored and preserved in semantically consistent hash codes. Current research on multi-modal hashing mainly focuses on common data reconstruction, but it fails to effectively distill the intrinsic and consensus structures of multi-modal data and to fully exploit the inherent semantic knowledge needed to capture semantically consistent information across modalities, leading to unsatisfactory retrieval performance. To address this problem and develop an efficient multi-modal geographical retrieval method, in this article we propose a discriminative multi-modal hashing framework named Cognitive Multi-modal Consistent Hashing (CMCH), which progressively pursues structure consensus over heterogeneous multi-modal data while simultaneously exploring informative transformed semantics. Specifically, we construct a parameter-free collaborative multi-modal fusion module to integrate and extract the underlying common components from multi-source data. In particular, our formulation seeks joint compatibility among multiple modalities in a self-adaptive weighting manner, which takes full advantage of their complementary properties. Moreover, a cognitive self-paced learning policy is leveraged to conduct progressive feature aggregation, which coalesces multi-modal data onto the established common latent space in a curriculum learning mode. Furthermore, deep semantic transform learning is developed to generate flexible semantics that interactively guide collaborative hash code learning. An efficient discrete learning algorithm is devised to solve the resulting optimization problem, and it obtains stable solutions on large-scale multi-modal retrieval tasks. Extensive experiments on four large-scale multi-modal datasets demonstrate the encouraging performance of the proposed CMCH method in comparison with state-of-the-art methods in both multi-modal information retrieval and computational efficiency. The source code of this work is available at https://github.com/JunfengAn1998a/CMCH.
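
The abstract names three generic building blocks: self-adaptive modality weighting, self-paced sample selection, and discrete hash codes compared in Hamming space. The following is a minimal illustrative sketch of those ideas only, not the CMCH algorithm itself. All function names are hypothetical, the inverse-square-root weighting rule and the hard loss threshold are common choices assumed here rather than taken from the paper, and each modality is assumed to be already projected to the same dimension.

```python
import math


def adaptive_modality_weights(errors):
    """Assumed parameter-free rule: weight each modality by 1 / (2 * sqrt(error)),
    then normalize so the weights sum to 1. Lower error gives a larger weight."""
    raw = [1.0 / (2.0 * math.sqrt(e + 1e-12)) for e in errors]
    total = sum(raw)
    return [r / total for r in raw]


def self_paced_mask(losses, threshold):
    """Assumed hard self-paced rule: keep only samples whose loss is below the
    threshold. Raising the threshold over time admits harder samples."""
    return [1 if loss < threshold else 0 for loss in losses]


def fuse(features_per_modality, weights):
    """Weighted sum of per-modality feature vectors of equal length."""
    dim = len(features_per_modality[0])
    return [
        sum(w * f[i] for w, f in zip(weights, features_per_modality))
        for i in range(dim)
    ]


def to_hash_code(vector):
    """Binarize a real-valued vector into a {-1, +1} code with the sign function."""
    return [1 if v >= 0 else -1 for v in vector]


def hamming(a, b):
    """Number of positions at which two codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Toy usage: two modalities, the first with lower reconstruction error.
weights = adaptive_modality_weights([0.04, 0.16])
fused = fuse([[1.0, -2.0], [-1.0, 1.0]], weights)
code = to_hash_code(fused)

# Curriculum: a small threshold keeps easy samples, a larger one keeps all.
early = self_paced_mask([0.1, 0.9, 0.4], threshold=0.5)
late = self_paced_mask([0.1, 0.9, 0.4], threshold=1.0)
```

In this toy setup the lower-error modality receives the larger weight, and retrieval would rank database codes by `hamming` distance to the query code. A real system would learn the projections and codes jointly rather than apply a fixed sign function.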


Cognitive multi-modal consistent hashing with flexible semantic transformation