Unsupervised Teacher-Student Model for Large-Scale Video Retrieval.

Dong Liang,Lanfen Lin,Rui Wang,Jie Shao,Changhu Wang,Yen-Wei Chen
DOI: https://doi.org/10.1109/iccvw.2019.00232
2019-01-01
Abstract:With the growth of video-sharing platforms and social media applications, video retrieval plays an import role in many aspects, such as copyright infringement detection, event classification, personalized recommendation, and etc. The content-based video retrieval presents the following two main challenges: (i) Distribution inconsistency for feature representation from the source domain to the target domain. (ii) Difficulty of video aggregation by sufficiently incorporating frame-based information. In this paper, we propose an unsupervised teacher-student model (UTS Net) to improve the performance of the content-based video retrieval tasks: (i) A teacher-student model maintaining the global consistency for feature representation from different domains and retaining the local inconsistency within the intra-batch data; (ii) A simple but effective video retrieval pipeline integrating the frame-level binarized feature. Our proposed framework experimentally outperforms the state-of-the-art approach on the DSVR, CSVR, and ISVR tasks in the FIVR datasets, and achieves a mean average precision of 76%, 72%, and 61%, respectively.
What problem does this paper attempt to address?