Adversarial Query-by-Image Video Retrieval Based on Attention Mechanism

Ruicong Xu, Li Niu, Liqing Zhang
DOI: https://doi.org/10.1007/978-3-030-37731-1_63
2019-01-01
Abstract: Query-by-image video retrieval (QBIVR) is a difficult cross-modal feature-matching task. A growing number of retrieval tasks require indexing the videos that contain the activity shown in a query image, which makes extracting meaningful spatio-temporal video features crucial. In this paper, we propose an approach based on adversarial learning, termed the Adversarial Image-to-Video (AIV) approach. To capture the temporal patterns of videos, we use fully-convolutional 3D ConvNet features to locate temporal regions likely to contain activities, and then obtain video bag features by 3D RoI pooling. To resolve the mismatch between these video bags and the vector features of images, and to identify which information in a video matters most, we add a Multiple Instance Learning (MIL) module that assigns a different weight to each temporal region in a video bag. Moreover, we use a triplet loss to distinguish different semantic categories while supporting intra-class variability of images and videos. Specifically, AIV introduces a modality loss that acts as an adversary to the triplet loss during adversarial learning. The interplay between the two losses bridges the domain gap between the modalities. Extensive experiments on two widely used datasets verify the effectiveness of the proposed method compared with other methods.
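To make the MIL weighting and the two competing losses more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the softmax attention over a linear scorer, the squared-Euclidean triplet hinge, the logistic modality classifier, and the names `scorer`, `disc`, `lam` and `margin` are all assumptions chosen for illustration, loosely following the general pattern of adversarial cross-modal retrieval.

```python
import math
from typing import List, Sequence

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vec:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mil_attention_pool(bag: Sequence[Sequence[float]], scorer: Sequence[float]) -> Vec:
    """Weight each temporal-region feature in a video bag and sum them.

    `scorer` is a hypothetical linear attention vector (an assumption);
    the paper's MIL module may score instances differently.
    """
    weights = softmax([dot(scorer, x) for x in bag])
    dim = len(bag[0])
    return [sum(w * x[d] for w, x in zip(weights, bag)) for d in range(dim)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    # Hinge: the same-category item must be closer to the anchor than the
    # other-category item by at least `margin` (hypothetical default).
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def modality_loss(embeddings: Sequence[Sequence[float]],
                  is_video: Sequence[bool], disc: Sequence[float]) -> float:
    """Mean binary cross-entropy of a logistic modality classifier.

    `disc` is a hypothetical weight vector that predicts P(video | embedding).
    """
    total = 0.0
    for e, y in zip(embeddings, is_video):
        p = 1.0 / (1.0 + math.exp(-dot(disc, e)))
        total -= math.log(p) if y else math.log(1.0 - p)
    return total / len(embeddings)


def embedding_objective(l_triplet: float, l_modality: float, lam: float = 0.1) -> float:
    # The embedding side lowers the triplet loss while raising the classifier's
    # modality loss; the classifier itself separately minimizes l_modality.
    return l_triplet - lam * l_modality


if __name__ == "__main__":
    bag = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]      # toy temporal-region features
    video = mil_attention_pool(bag, scorer=[2.0, 0.0])
    image, other = [0.9, 0.1], [-1.0, 0.0]           # toy image embeddings
    l_trip = triplet_loss(image, video, other)
    l_mod = modality_loss([image, video], [False, True], disc=[0.5, -0.5])
    print(video, l_trip, l_mod, embedding_objective(l_trip, l_mod))
```

The sign flip in `embedding_objective` is what makes the modality loss an adversary in this sketch: the classifier is trained to minimize it, while the feature extractors are updated to maximize it, pushing image and video embeddings toward a shared distribution. Alternating updates and a gradient-reversal layer are two common ways to realize this min-max game; which one AIV uses is not stated here.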
What problem does this paper attempt to address?