Abstract: Cross-modal similarity search, which aims to answer queries efficiently and accurately across multiple domains, has become an important research topic. Composite quantization, a compact coding approach that outperforms hashing techniques, has proven effective for similarity search. However, most existing works that apply composite quantization to multi-domain search consider either pairwise similarity information or class label information across domains, but not both, and therefore cannot handle the semi-supervised setting. In this paper, we address semi-supervised composite quantization by considering three kinds of data: (i) cross-domain data with pairwise similarity information but no class labels, which captures intra-document relations; (ii) cross-domain data with class labels, which helps capture inter-document relations; and (iii) cross-domain data with neither pairwise similarity nor class labels, which makes full use of abundant unlabeled information. To the best of our knowledge, this is the first work to consider both supervised information (pairwise similarity and class labels) and unsupervised information (neither pairwise similarity nor class labels) simultaneously in composite quantization. This raises a challenging question: how can these three kinds of information across multiple domains be handled jointly and efficiently? To answer it, we propose a novel semi-supervised deep quantization (SSDQ) model that incorporates all three kinds of information into a single framework and uses composite quantization for accurate and efficient queries across domains.
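Composite quantization approximates each vector by the sum of M codewords, one drawn from each of M dictionaries, so an item is stored as M small indices and query distances can be assembled from per-query lookup tables. The NumPy sketch below illustrates only that general idea, under explicit assumptions: the dictionaries are random rather than learned, encoding is a simple greedy residual pass (not the optimization used by composite quantization or SSDQ), and because random dictionaries do not satisfy composite quantization's constraint that the inter-dictionary inner-product term be constant, the sketch stores that cross term as one extra scalar per item. All function names are hypothetical.

```python
import numpy as np

def greedy_encode(x, dictionaries):
    """Hypothetical greedy encoder: one codeword per dictionary, fit to the residual.

    dictionaries has shape (M, K, D): M dictionaries of K codewords each.
    """
    residual = x.copy()
    codes = np.empty(len(dictionaries), dtype=np.int64)
    for m, C in enumerate(dictionaries):
        # Pick the codeword closest to the part of x not yet explained.
        k = int(np.argmin(((residual - C) ** 2).sum(axis=1)))
        codes[m] = k
        residual = residual - C[k]
    return codes

def reconstruct(codes, dictionaries):
    """Composite approximation: the sum of the selected codewords."""
    return sum(dictionaries[m][k] for m, k in enumerate(codes))

def cross_term(codes, dictionaries):
    """sum_{i != j} c_i . c_j; composite quantization constrains this to a constant."""
    cws = np.stack([dictionaries[m][k] for m, k in enumerate(codes)])
    gram = cws @ cws.T
    return float(gram.sum() - np.trace(gram))

def adc_distances(q, codes_db, cross_db, dictionaries):
    """Squared distances from query q to every encoded item via lookup tables."""
    # tables[m, k] = ||C_m[k]||^2 - 2 q . C_m[k], computed once per query.
    tables = (dictionaries ** 2).sum(axis=2) - 2.0 * (dictionaries @ q)
    M = dictionaries.shape[0]
    # For each item n, sum tables[m, codes_db[n, m]] over m: M lookups per item.
    lookups = tables[np.arange(M), codes_db].sum(axis=1)
    return float(q @ q) + lookups + cross_db

# Usage with random data: lookup-table distances match direct computation.
rng = np.random.default_rng(0)
M, K, D, N = 4, 16, 8, 100
dictionaries = rng.normal(size=(M, K, D))
database = rng.normal(size=(N, D))
codes_db = np.stack([greedy_encode(x, dictionaries) for x in database])
cross_db = np.array([cross_term(c, dictionaries) for c in codes_db])
q = rng.normal(size=D)
d = adc_distances(q, codes_db, cross_db, dictionaries)
exact = np.array([((q - reconstruct(c, dictionaries)) ** 2).sum() for c in codes_db])
print(np.allclose(d, exact))  # True
```

When the cross term is held constant, as composite quantization enforces during training, it can be dropped for ranking, so each query-to-item distance costs only M table lookups and additions.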
More specifically, we employ a modified deep autoencoder to learn better latent representations, and we formulate a pairwise similarity loss, a supervised quantization loss, and an unsupervised distribution-matching loss to handle the three types of information. Extensive experiments on various datasets demonstrate that SSDQ significantly outperforms several state-of-the-art methods.
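The abstract names the three loss terms but not their formulas. The NumPy sketch below shows one way such a combined objective could be assembled, and every concrete form in it is an assumption: the pairwise term is a logistic pair likelihood of the kind used in deep cross-modal hashing, the supervised quantization term is a hypothetical quantization error plus a pull toward per-class prototypes, and the distribution-matching term is swapped in as a kernel maximum mean discrepancy (MMD) between unlabeled latents of the two domains. Random arrays stand in for the autoencoder's latent codes, and the weights are arbitrary; this is not SSDQ's actual objective.

```python
import numpy as np

def pairwise_similarity_loss(zx, zy, S):
    """Assumed logistic pair likelihood: S[i, j] = 1 if zx[i] and zy[j] form a pair."""
    theta = 0.5 * zx @ zy.T
    # -log p(S | theta) = log(1 + exp(theta)) - S * theta, computed stably.
    return float(np.mean(np.logaddexp(0.0, theta) - S * theta))

def supervised_quantization_loss(z, labels, quantized, prototypes):
    """Hypothetical term: quantization error plus pull toward a per-class prototype."""
    quant_err = np.sum((z - quantized) ** 2, axis=1)
    class_err = np.sum((quantized - prototypes[labels]) ** 2, axis=1)
    return float(np.mean(quant_err + class_err))

def rbf_mmd2(a, b, gamma=1.0):
    """Biased squared maximum mean discrepancy between samples a and b (RBF kernel)."""
    def kernel_mean(u, v):
        d2 = ((u[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-gamma * d2).mean()
    return float(kernel_mean(a, a) + kernel_mean(b, b) - 2.0 * kernel_mean(a, b))

def total_loss(parts, weights):
    """Weighted sum of the individual loss terms."""
    return float(sum(w * p for w, p in zip(weights, parts)))

# Random stand-ins for autoencoder latents of two domains (e.g. image and text).
rng = np.random.default_rng(0)
D = 8
zx, zy = rng.normal(size=(6, D)), rng.normal(size=(6, D))
S = np.eye(6)                                     # item i in zx pairs with item i in zy
labels = rng.integers(0, 3, size=6)               # class labels for the labeled subset
prototypes = rng.normal(size=(3, D))              # one hypothetical prototype per class
quantized = zx + 0.1 * rng.normal(size=zx.shape)  # pretend quantizer output
unlab_x = rng.normal(size=(20, D))                # unlabeled latents, domain x
unlab_y = rng.normal(0.5, 1.0, size=(20, D))      # unlabeled latents, domain y (shifted)
parts = (
    pairwise_similarity_loss(zx, zy, S),
    supervised_quantization_loss(zx, labels, quantized, prototypes),
    rbf_mmd2(unlab_x, unlab_y, gamma=0.1),
)
print(total_loss(parts, weights=(1.0, 0.5, 0.1)))
```

In a real training loop these terms would be written in an automatic-differentiation framework so that gradients reach the encoders and codebooks; NumPy is used here only to keep the sketch short and dependency-light.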

Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval

A Predictive VQ Based Video Compression Scheme

Contrastive Transformer Hashing for Compact Video Representation

Matching-oriented Embedding Quantization for Ad-hoc Retrieval

Learnable Central Similarity Quantization for Efficient Image and Video Retrieval

Collective Deep Quantization for Efficient Cross-Modal Retrieval

Composite Correlation Quantization for Efficient Multimodal Retrieval

Semi-supervised Deep Quantization for Cross-modal Search

Robust video question answering via contrastive cross-modality representation learning

Compositional Correlation Quantization for Large-Scale Multimodal Search

Asymmetric Correlation Quantization Hashing for Cross-modal Retrieval

Deep Visual-Semantic Quantization For Efficient Image Retrieval

Efficient Cross-Modal Retrieval via Deep Binary Hashing and Quantization

Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval

Text–video retrieval re-ranking via multi-grained cross attention and frozen image encoders

Contrastive Quantization with Code Memory for Unsupervised Image Retrieval

A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging

Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method

Contrastive Quant

Online Residual Quantization Via Streaming Data Correlation Preserving

Joint Optimization of Multi-vector Representation with Product Quantization