Mental Visual Indexing

Richang Hong, Jun He, Hanwang Zhang, Tat-Seng Chua
DOI: https://doi.org/10.1145/2964284.2967296
2016-01-01
Abstract: Video browsing is an interactive process in which a user tries to find a target shot in a long video. A video browsing system must therefore be fast and accurate while demanding minimal user effort. In sharp contrast to traditional Relevance Feedback (RF), we propose a novel paradigm for fast video browsing dubbed Mental Visual Indexing (MVI). In each interactive round, the user only needs to select the one displayed shot that is most visually similar to her mental target, and this choice further tailors the search toward the target. Updating the search model after a user's feedback requires only vector inner products, which makes MVI highly responsive. MVI is underpinned by a sequence model, a Recurrent Neural Network (RNN), trained on shot sequences that are automatically generated by a rigorous Bayesian framework simulating the user feedback process. Experiments conducted with real users on three 3-hour movies demonstrate the effectiveness of the proposed approach.
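The interactive round described in the abstract can be illustrated with a minimal sketch. It is not the paper's RNN or Bayesian model: it substitutes a simple additive state update (a stand-in chosen purely for illustration), scores shots by inner product with that state, and simulates the user as always picking the displayed shot most similar to the target. All names here (`browse`, `make_shots`, `state`, `k`) and the synthetic features are assumptions, not taken from the paper.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def make_shots(n=40, dim=6, seed=0):
    """Hypothetical shot features: n random vectors of length dim."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]


def browse(shots, target_idx, k=4, max_rounds=100):
    """Return the 1-based round in which the target is displayed, else None.

    Assumption: the search state is a single vector, and candidate shots
    are ranked by their inner product with it. This stands in for the
    learned sequence model mentioned in the abstract.
    """
    dim = len(shots[0])
    state = [0.0] * dim
    unseen = set(range(len(shots)))
    for rnd in range(1, max_rounds + 1):
        if not unseen:
            return None
        # Rank unseen shots by score; break ties by index for determinism.
        ranked = sorted(unseen, key=lambda i: (-dot(state, shots[i]), i))
        shown = ranked[:k]
        if target_idx in shown:
            return rnd
        # Simulated user: pick the displayed shot closest to the target.
        pick = max(shown, key=lambda i: dot(shots[i], shots[target_idx]))
        # Illustrative update: move toward the pick, away from the display mean.
        mean = [sum(shots[i][d] for i in shown) / len(shown) for d in range(dim)]
        state = [s + p - m for s, p, m in zip(state, shots[pick], mean)]
        # Never redisplay a shot, so the loop is guaranteed to terminate.
        unseen.difference_update(shown)
    return None


if __name__ == "__main__":
    shots = make_shots()
    print("target displayed in round", browse(shots, target_idx=17))
```

Because shown shots are never redisplayed, the sketch always reaches the target within ceil(n / k) rounds; how much faster a learned model gets there is exactly what the paper's experiments measure.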