Abstract: Multimedia data, especially image and video data, have recently become one of the dominant data types on the Internet. Given user-experience expectations and real application requirements, multimedia data typically must be processed in real time. As a result, the sheer volume of such data makes retrieving useful information from it not only data-intensive but also computation-intensive, which poses significant challenges to current system and architecture designs. Unfortunately, most prior studies focus only on text-based retrieval systems or traditional multimedia processing applications. As far as we know, there is no systematic study of the characteristics of multimedia retrieval applications and how they might impact system and architecture designs. In this paper, we make the first attempt to construct a multimedia retrieval benchmark suite (called MMR Bench) to evaluate the corresponding system and architecture designs. To represent diverse multimedia retrieval applications, we collect eight state-of-the-art multimedia retrieval algorithms that cover all retrieval stages: feature extraction, feature matching, and spatial verification. To satisfy diverse evaluation purposes, we implement multiple versions of each algorithm: a sequential version, a pthread version for multi-core evaluation, and a data-parallel (i.e., Map-reduce) version for data-center evaluation. Moreover, MMR Bench provides flexible interfaces across retrieval stages, as well as a tool to adjust parameters and generate reasonable inputs at different scales. With such a flexible design, the algorithms in MMR Bench are not only suitable for individual kernel-level evaluation, but can also be integrated into a complete infrastructure for system-level evaluation.
Based on MMR Bench, we further analyze inherent architectural characteristics, such as input size sensitivity and workload balance, which provide some insights into system and architecture design for multimedia retrieval applications.
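
To make the idea of offering the same kernel in a sequential and a data-parallel (Map-reduce style) version more concrete, the following is a minimal sketch of a toy feature-matching kernel written both ways. It is not taken from MMR Bench: the function names (`match_sequential`, `match_map_reduce`, `map_shard`), the squared-L2 distance, the sharding scheme, and the sample data are all assumptions made for illustration, and the map step runs in-process rather than on a cluster.

```python
from functools import reduce


def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_sequential(query, database):
    """Sequential version: scan every feature, return (distance, index) of the nearest."""
    return min((l2_sq(query, feat), idx) for idx, feat in enumerate(database))


def map_shard(query, shard):
    """Map step: best (distance, global index) inside one shard of the database."""
    offset, feats = shard
    return min((l2_sq(query, f), offset + i) for i, f in enumerate(feats))


def match_map_reduce(query, database, num_shards=2):
    """Data-parallel version: split into shards, map each shard, reduce with min.

    Assumes a non-empty database. The map calls are independent, so they could
    be distributed across workers; here they run in-process for simplicity.
    """
    size = -(-len(database) // num_shards)  # ceiling division
    shards = [(start, database[start:start + size])
              for start in range(0, len(database), size)]
    partials = map(lambda s: map_shard(query, s), shards)
    return reduce(min, partials)


# Hypothetical toy data: four 2-D feature vectors and one query.
database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
query = [1.0, 1.1]

seq_result = match_sequential(query, database)
par_result = match_map_reduce(query, database, num_shards=2)
```

Both versions should return the same nearest neighbour, so either can be swapped into a larger pipeline depending on whether a single-node or a data-center setup is being evaluated.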

Mixer: Efficiently Understanding and Retrieving Visual Content at Web-Scale.

Learning User Interest with Improved Triplet Deep Ranking and Web-Image Priors for Topic-Related Video Summarization.

Mixer is more than just a model

Video and Audio are Images: A Cross-Modal Mixer for Original Data on Video-Audio Retrieval

Multimedia Content-Based Visual Retrieval

ChebMixer: Efficient Graph Representation Learning with MLP Mixer

DynaMixer: A Vision MLP Architecture with Dynamic Mixing.

TCAMixer: A lightweight Mixer based on a novel triple concepts attention mechanism for NLP

CAMixerSR: Only Details Need More "Attention"

MixerSR: A New Feature Extraction Paradigm for Single Image Super-Resolution

Characterizing Multi-media Retrieval Applications

SMMix: Self-Motivated Image Mixing for Vision Transformers

Characterizing Multimedia Retrieval Applications

AMixer: Adaptive Weight Mixing for Self-attention Free Vision Transformers.

MagicMix: Semantic Mixing with Diffusion Models

Efficient Indexing, Browsing and Retrieval of Image/video Content

AdaMixer: A Fast-Converging Query-Based Object Detector

Multi-Scale MLP-Mixer for image classification

MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection

Fast And Accurate Content-Based Semantic Search In 100m Internet Videos

MixFormer: Mixing Features across Windows and Dimensions