Abstract: Image-text retrieval is a fundamental task in bridging the semantics of vision and language. The key challenge lies in accurately and efficiently learning the semantic alignment between two heterogeneous modalities. Existing image-text retrieval approaches can be roughly classified into two paradigms. The first, the independent-embedding paradigm, learns global embeddings of the two modalities; it achieves efficient retrieval but fails to capture the fine-grained cross-modal interaction between images and texts. The second, the interactive-embedding paradigm, learns fine-grained alignment between regions and words; it achieves accurate retrieval but sacrifices retrieval efficiency. In this paper, we propose a novel Independent Memory-Enhanced emBedding learning framework (IMEB), which introduces a lightweight middleware, i.e., a memory network, into independent-embedding approaches to exploit the complementary strengths of both paradigms. Specifically, in the training stage, we first propose a novel cross-modal association graph to learn fine-grained cross-modal interaction information. We then carefully design a memory-assisted embedding learning network that stores the prototypical features obtained after interaction as agents, and we update the memory network via two learning strategies. Finally, in the inference stage, we interact directly with these agent-level prototypical features from the memory bank, thus efficiently obtaining cross-modal memory-enhanced embeddings. In this way, our model not only learns cross-modal interaction information effectively, but also maintains retrieval efficiency. Extensive experimental results on two benchmarks, i.e., Flickr30K and MS-COCO, demonstrate that IMEB performs favorably against state-of-the-art methods.
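
The inference-stage idea described in the abstract, where each modality's embedding is enhanced independently by reading from a shared memory bank, can be illustrated with a minimal sketch. This is not the IMEB implementation: the abstract does not specify the read operation, so the sketch assumes softmax attention over memory slots followed by a residual addition and cosine-similarity ranking. The names `memory_enhance`, `rank_texts`, and `temperature`, as well as the toy vectors, are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_enhance(embedding, memory_bank, temperature=0.1):
    """Assumed read operation: attend over memory slots, then add the
    attended prototype back to the (normalized) input embedding."""
    query = l2_normalize(embedding)
    weights = softmax(
        [dot(query, l2_normalize(slot)) / temperature for slot in memory_bank]
    )
    read = [
        sum(w * slot[i] for w, slot in zip(weights, memory_bank))
        for i in range(len(query))
    ]
    return l2_normalize([q + r for q, r in zip(query, read)])


def rank_texts(image_embedding, text_embeddings, memory_bank):
    """Rank texts for one image. Each embedding only touches the memory
    bank, never the other modality, so text embeddings can be precomputed."""
    img = memory_enhance(image_embedding, memory_bank)
    scored = [
        (dot(img, memory_enhance(t, memory_bank)), i)
        for i, t in enumerate(text_embeddings)
    ]
    return [i for _, i in sorted(scored, reverse=True)]


# Toy data (hypothetical): two memory prototypes in a 3-d embedding space.
memory_bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image = [0.9, 0.1, 0.0]
texts = [[0.0, 1.0, 0.2], [1.0, 0.0, 0.1]]
ranking = rank_texts(image, texts, memory_bank)
```

Because each embedding interacts only with the fixed-size memory bank, the cost per query in this sketch grows with the number of memory slots rather than with the number of region-word pairs, which is the efficiency argument the abstract makes for the independent-embedding side.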

A memory learning framework for effective image retrieval.

Learning a Semantic Space from User's Relevance Feedback for Image Retrieval

Semantic Image Retrieval Based on Multiple-Instance Learning

A Memorization Learning Model For Image Retrieval

Improving image retrieval performance by integrating long-term learning with short-term learning

An Efficient and Effective Region-Based Image Retrieval Framework

An Image Retrieval System with Color Emotion Query

A unified framework for image retrieval using keyword and visual features

Deep Learning for Content-Based Image Retrieval: A Comprehensive Study

Image Retrieval Based on Fuzzy Semantic Relevance Matrix

Fast, Accurate, and Lightweight Memory-Enhanced Embedding Learning Framework for Image-Text Retrieval

Image Retrieval Framework Driven by Association Feedback with Feature Elements Evaluation Built In

An active feedback framework for image retrieval

Optimal Adaptive Learning For Image Retrieval

A new image retrieval system supporting query by semantics and example

Image Retrieval Model Providing Semantics and Visual-Features-based Query for Users

A Relevance Feedback Framework for Image Retrieval Based on Ant Colony Algorithm

Web-Based Image Retrieval: A Hybrid Approach

Learning Semantic Concepts from User Feedback Log for Image Retrieval

Learning to Combine Ad-hoc Ranking Functions for Image Retrieval

A novel learning-based method for image retrieval