Selectively Aggregated Fisher Vectors of Query Video for Mobile Visual Search

Xiaohe Zhang, Yitong Wang, Zhaoliang Liu, Ling-Yu Duan
DOI: https://doi.org/10.1109/bigmm.2016.45
2016-01-01
Abstract: Mobile visual search has developed rapidly and made considerable progress in recent years, thanks to the ever-growing computational power of mobile devices. Most visual search methods take a single image as the query and generate an image-level representation for image retrieval. Fisher vectors (FV) have shown clear advantages in both discriminability and computational efficiency for forming a compact and discriminative representation of the query image. However, single-image visual search sometimes performs unsatisfactorily, because the query image may suffer from quality degradations such as a limited view, uneven lighting, blur, or occlusion. A video clip used as the query could overcome these shortcomings and provide richer visual information, leading to better retrieval performance. Towards a compact yet discriminative query representation for mobile visual search, we propose a temporal-spatial based Fisher Vector (TSFV) for the query video that has the same length as an image-based FV. TSFV introduces a selective local feature aggregation scheme: it combines interframe feature matching (temporal cues) with intraframe feature attributes (spatial cues) to evaluate the discriminability of video features, and aggregates only the discriminative ones. Evaluated on a diverse dataset, the proposed TSFV for query video achieves a significant performance improvement over a typical image-based FV, with no additional transmission load or query latency.
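The abstract does not spell out the exact TSFV scoring, so the following is only a minimal sketch of the general idea under stated assumptions: local descriptors from all frames are scored, only the top-scoring ones are kept, and the kept set is aggregated into a single Fisher vector. The FV itself follows the standard improved-FV formulation (gradients with respect to the means and standard deviations of a diagonal GMM, followed by signed square-root and L2 normalization). Everything else is hypothetical and chosen for illustration: the temporal score is the fraction of other frames in which a descriptor has a nearest-neighbour match passing a ratio test, the spatial score is a caller-supplied per-feature value in [0, 1] (for example a normalized detector response), and the names `match_count`, `selective_video_fv`, `alpha`, `keep`, and `ratio` are not from the paper.

```python
import numpy as np


def fisher_vector(desc, weights, means, sigmas):
    """Improved Fisher vector of local descriptors under a diagonal GMM.
    Output length is 2*K*D no matter how many descriptors are aggregated."""
    desc = np.atleast_2d(desc)                                        # (N, D)
    n = desc.shape[0]
    diff = (desc[:, None, :] - means[None, :, :]) / sigmas[None]     # (N, K, D)
    # Log-likelihood per component (the constant -D/2*log(2*pi) cancels).
    log_p = (np.log(weights)[None, :]
             - np.sum(np.log(sigmas), axis=1)[None, :]
             - 0.5 * np.sum(diff ** 2, axis=2))                       # (N, K)
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                         # soft assignments
    # Gradients w.r.t. component means and standard deviations.
    g_mu = np.einsum('nk,nkd->kd', gamma, diff) / (n * np.sqrt(weights)[:, None])
    g_sig = np.einsum('nk,nkd->kd', gamma, diff ** 2 - 1) / (n * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sig.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                            # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)                          # L2 normalization


def match_count(frames, ratio=0.8):
    """HYPOTHETICAL temporal cue: for each descriptor, count the other frames
    in which its nearest neighbour passes a ratio test."""
    counts = []
    for i, fi in enumerate(frames):
        c = np.zeros(len(fi), dtype=int)
        for j, fj in enumerate(frames):
            if i == j or len(fj) < 2:
                continue
            d = np.linalg.norm(fi[:, None, :] - fj[None, :, :], axis=2)  # (Ni, Nj)
            two = np.sort(d, axis=1)[:, :2]                                 # 1st and 2nd NN
            c += (two[:, 0] < ratio * two[:, 1]).astype(int)
        counts.append(c)
    return counts


def selective_video_fv(frames, spatial_scores, gmm, keep=0.5, alpha=0.7, ratio=0.8):
    """HYPOTHETICAL selective aggregation: blend temporal and spatial scores,
    keep the top `keep` fraction of descriptors, and build one FV from them."""
    weights, means, sigmas = gmm
    t = max(len(frames) - 1, 1)
    temporal = np.concatenate(match_count(frames, ratio)) / t
    spatial = np.concatenate(spatial_scores)
    score = alpha * temporal + (1 - alpha) * spatial
    desc = np.vstack(frames)
    k = max(1, int(round(keep * len(desc))))
    chosen = np.argsort(-score)[:k]
    return fisher_vector(desc[chosen], weights, means, sigmas)


# Toy usage with a random GMM and five noisy "frames" of the same scene.
rng = np.random.default_rng(0)
K, D = 4, 8
gmm = (np.full(K, 1.0 / K), rng.normal(size=(K, D)), np.ones((K, D)))
base = rng.normal(size=(30, D))
frames = [base + 0.05 * rng.normal(size=base.shape) for _ in range(5)]
spatial = [rng.uniform(size=len(f)) for f in frames]
video_fv = selective_video_fv(frames, spatial, gmm)
image_fv = fisher_vector(frames[0], *gmm)
print(video_fv.shape, image_fv.shape)  # (64,) (64,)
```

The toy run illustrates the property the abstract emphasizes: because the FV length depends only on the GMM size (2*K*D), the video-level vector is exactly as long as a single-image FV, so selecting and pooling features from many frames adds no transmission cost. A real system would train the GMM on actual local descriptors (for example SIFT) and, in practice, often reduces descriptor dimensionality with PCA first.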