Abstract: In recent years, Cross-Modal Hashing (CMH) has attracted much attention due to its fast query speed and efficient storage. Previous works have achieved promising results for Cross-Modal Retrieval (CMR) by learning discriminative hash codes and modality-specific hash functions. Nonetheless, most existing CMR works are subject to two restrictions. First, they assume that data of different modalities are fully paired, which is impractical in real applications because of missing samples and false data alignment. Second, binary regression targets, including the label matrix and binary codes, are too rigid to effectively learn semantic-preserving hash codes and hash functions. To address these problems, this paper proposes an Adaptive Marginalized Semantic Hashing (AMSH) method, which not only enhances the discrimination of latent representations and hash codes through adaptive margins, but can also be used for both paired and unpaired CMR. AMSH is a two-step method. In the first step, it generates semantic-aware modality-specific latent representations with adaptively marginalized labels, which enlarge the distances between different classes, and it exploits the labels to preserve the inter-modal and intra-modal semantic similarities in the latent representations and hash codes. In the second step, adaptive margin matrices are embedded into the hash codes to enlarge the gaps between positive and negative bits, which improves the discrimination and robustness of the hash functions. On this basis, AMSH generates similarity-preserving hash codes and robust hash functions without requiring strict one-to-one data correspondence. Experiments on several benchmark datasets demonstrate the superiority and flexibility of AMSH over several state-of-the-art CMR methods.
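The second step can be pictured as an alternating least-squares problem. The following is a minimal NumPy sketch, not the paper's implementation. It assumes a linear hash function \( W \) per modality, features \( X \) of shape \( d \times n \), codes \( B \in \{-1,+1\}^{r \times n} \) taken from the first step, and a hypothetical ridge weight `gamma`; the function names are also illustrative. Because the sign of each bit already marks its positive or negative direction, the margin-enlarged target is \( B + B \odot E \) with \( E \geq 0 \). Updating \( W \) by ridge regression and \( E \) in closed form never increases the objective.

```python
import numpy as np


def fit_margin_hash(X, B, gamma=1.0, iters=10):
    """Alternately minimize ||B + B*E - W X||_F^2 + gamma*||W||_F^2 over W and E >= 0.

    X: (d, n) features of one modality; B: (r, n) codes in {-1, +1}.
    The linear W and the ridge weight gamma are illustrative assumptions.
    """
    d = X.shape[0]
    E = np.zeros_like(B, dtype=float)       # margins start at zero (plain binary target)
    A = X @ X.T + gamma * np.eye(d)         # fixed normal-equation matrix
    history = []
    for _ in range(iters):
        T = B + B * E                       # margin-enlarged regression target
        W = np.linalg.solve(A, X @ T.T).T   # ridge solution W = T X^T (X X^T + gamma I)^{-1}
        WX = W @ X
        E = np.maximum(B * (WX - B), 0.0)   # closed-form margin update, using B * B == 1
        loss = (np.linalg.norm(B + B * E - WX, "fro") ** 2
                + gamma * np.linalg.norm(W, "fro") ** 2)
        history.append(loss)                # objective after each full W/E sweep
    return W, E, history


def hash_codes(W, X):
    # Out-of-sample hash function: sign of the linear projection (zeros mapped to +1).
    return np.where(W @ X >= 0, 1.0, -1.0)


# Toy usage with random data (illustrative only).
rng = np.random.default_rng(1)
d, n, r = 6, 40, 4
X = rng.standard_normal((d, n))
B = np.where(rng.standard_normal((r, n)) >= 0, 1.0, -1.0)
W, E, history = fit_margin_hash(X, B, gamma=0.1, iters=5)
codes = hash_codes(W, X)
```

Every entry of the target only moves away from zero (\( |B + B \odot E| \geq 1 \)), so a projection that already lands beyond \( \pm 1 \) on the correct side incurs no residual. This is one way to read the claim that adaptive margins relax the rigid binary target while widening the gap between positive and negative bits.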

What problem does this paper attempt to address?

This paper focuses on two key challenges in cross-modal retrieval (CMR):

1. **Unpaired Cross-Modal Retrieval (UCMR)**: Most existing CMR methods assume that data in different modalities are fully paired, that is, each image has a corresponding text description and vice versa. In practice, missing samples and incorrect data alignment often make this assumption unrealistic, so performing effective cross-modal retrieval on unpaired data is an urgent open problem.
2. **Learning semantic-preserving hash codes and hash functions**: Existing methods usually regress onto strict binary targets (such as label matrices and binary codes). These rigid targets make it difficult for the learned hash codes and hash functions to preserve semantic information, and they reduce the robustness and discrimination of the model. Specifically, such targets ignore the distances between different classes, which weakens class separation.

To solve these problems, the authors propose Adaptive Marginalized Semantic Hashing (AMSH). AMSH enhances the discrimination of latent representations and hash codes by introducing adaptive margin matrices, and it can perform cross-modal retrieval on unpaired data because the inter-modal and intra-modal similarities are derived from labels rather than from one-to-one pairs. AMSH is a two-step method: it first generates semantic-aware modality-specific latent representations, then embeds an adaptive margin matrix into the hash codes to widen the gap between positive and negative bits, which improves the discrimination and robustness of the hash functions.

### Specific solutions

1. **Adaptive Marginalized Regression**: An adaptive margin matrix \( E^{(i)} \) is introduced to adjust the regression target, which increases the distances between different classes and thus improves the discrimination of the features.
   The objective is:

   \[ \min_{P^{(i)}, V^{(i)}, E^{(i)}} \sum_{i=1}^{m} \left\| L^{(i)} + R^{(i)} \odot E^{(i)} - P^{(i)} V^{(i)} \right\|_F^2 \quad \text{s.t.} \quad E^{(i)} \geq 0 \]

   where \( m \) is the number of modalities, \( L^{(i)} \) is the label matrix, \( R^{(i)} \) is an index matrix that marks the positive and negative directions, \( E^{(i)} \) is a non-negative margin matrix that adjusts the value of each element, and \( \odot \) denotes the element-wise product.

2. **Semantic Similarity Embedding**: Semantic information is enhanced by preserving the similarity between modalities and the compactness within each modality:

   \[ \min_{V^{(i)}} \sum_{i \neq j} \left\| V^{(j)T} V^{(i)} - r S^{(ji)} \right\|_F^2 + \sum_{i=1}^{m} \left\| V^{(i)T} V^{(i)} - r S^{(ii)} \right\|_F^2 \]

   where \( S^{(ij)} \) is the similarity matrix between modalities (\( i \neq j \)) or within a modality (\( i = j \)), and \( r \) is the hash code length.

3. **Overall optimization objective**: Combining the above terms with the hash-code learning terms gives the final objective:

   \[ \begin{aligned} &\min \sum_{i=1}^{m} \left( \left\| L^{(i)} + R^{(i)} \odot E^{(i)} - P^{(i)} V^{(i)} \right\|_F^2 + \eta \left\| B^{(i)} - V^{(i)} \right\|_F^2 + \lambda \left\| B^{(i)T} V^{(i)} - r S^{(ii)} \right\|_F^2 \right) \\ &\quad + \beta \cdots \end{aligned} \]

   where \( B^{(i)} \) denotes the binary hash codes of modality \( i \), and \( \eta \), \( \lambda \), and \( \beta \) are trade-off parameters.
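The per-modality terms above can be illustrated with a minimal NumPy sketch. It assumes labels \( L^{(i)} \in \{0,1\}^{c \times n_i} \), takes \( R^{(i)} = 2L^{(i)} - 1 \) (+1 where a label is present, -1 where it is absent, as in epsilon-dragging least-squares regression), and uses a hypothetical similarity \( S^{(ji)} \) that is +1 when two samples share a label and -1 otherwise; the paper's exact constructions may differ, and all function names are illustrative. With \( P^{(i)} \) and \( V^{(i)} \) fixed, the \( E^{(i)} \)-subproblem separates per entry and, since every entry of \( R^{(i)} \) squares to 1, has the closed form \( E^{(i)} = \max\big(R^{(i)} \odot (P^{(i)} V^{(i)} - L^{(i)}), 0\big) \).

```python
import numpy as np


def sign_index(L):
    # Assumed R: +1 where L == 1 (push the target up), -1 where L == 0 (push it down).
    return 2.0 * L - 1.0


def update_margin(L, R, PV):
    # With P and V fixed, ||L + R*E - PV||_F^2 s.t. E >= 0 splits per entry;
    # since R**2 == 1, the minimizer is E = max(R * (PV - L), 0).
    return np.maximum(R * (PV - L), 0.0)


def regression_loss(L, R, E, PV):
    return np.linalg.norm(L + R * E - PV, "fro") ** 2


def label_similarity(La, Lb):
    # Hypothetical S^{(ab)}: +1 if two samples share a label, -1 otherwise.
    # It depends only on labels, so modalities a and b may hold different samples.
    return np.where(La.T @ Lb > 0, 1.0, -1.0)


def similarity_loss(Vs, Ls, r):
    # Sum over all ordered pairs (i, j), including i == j (the intra-modal term),
    # of ||V^{(j)T} V^{(i)} - r S^{(ji)}||_F^2.
    total = 0.0
    for Vi, Li in zip(Vs, Ls):
        for Vj, Lj in zip(Vs, Ls):
            total += np.linalg.norm(Vj.T @ Vi - r * label_similarity(Lj, Li), "fro") ** 2
    return total


# Toy usage: two unpaired modalities with different sample counts (illustrative only).
rng = np.random.default_rng(0)
c, r = 3, 8                      # number of classes, code length
sizes = [5, 7]                   # e.g. 5 images and 7 texts, no one-to-one pairing
Ls = []
for n in sizes:
    L = np.zeros((c, n))
    L[rng.integers(0, c, n), np.arange(n)] = 1.0   # random one-hot labels
    Ls.append(L)
Vs = [rng.standard_normal((r, n)) for n in sizes]
P = rng.standard_normal((c, r))

L0, V0 = Ls[0], Vs[0]
R0 = sign_index(L0)
PV0 = P @ V0
E0 = update_margin(L0, R0, PV0)
print(regression_loss(L0, R0, np.zeros_like(L0), PV0) >= regression_loss(L0, R0, E0, PV0))  # True
```

The marginalized target \( L^{(i)} + R^{(i)} \odot E^{(i)} \) stays at or above 1 for present labels and at or below 0 for absent ones, which is how the margins enlarge the distances between classes. Because \( S^{(ji)} \) is built from labels alone, the two toy modalities can hold 5 and 7 samples without any pairing, mirroring the unpaired setting the method targets.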

Adaptive Marginalized Semantic Hashing for Unpaired Cross-Modal Retrieval
