Action Recognition with Uncertain VLAD

Xianzhong Wang, Hongtao Lu
DOI: https://doi.org/10.1109/ISCID.2014.238
2014-01-01
Abstract: Recognizing human actions in video has attracted growing attention in the computer vision community. However, it faces many practical challenges caused by background clutter, viewpoint changes, and variation in actors' appearance. These challenges reflect the difficulty of obtaining a clean and discriminative video representation for classification. Recently, VLAD (Vector of Locally Aggregated Descriptors) has been shown to be a simple and efficient encoding scheme for obtaining discriminative video representations. However, VLAD aggregates each local descriptor using only its nearest visual word in the codebook, whether or not that assignment is appropriate. Inspired by visual word ambiguity and salience encoding in image classification, we propose the Uncertain VLAD (UVLAD) encoding scheme, which aggregates each local descriptor by considering multiple nearest visual words. The proposed UVLAD scheme ensures that each descriptor is aggregated or discarded appropriately. We evaluate our method on two benchmark datasets, KTH and YouTube. Experimental results show that our encoding scheme outperforms state-of-the-art methods in most cases.
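To make the contrast in the abstract concrete, the following is a minimal sketch rather than the authors' UVLAD formulation, which the abstract does not spell out. `vlad` implements the standard encoding, in which each descriptor adds its residual to its single nearest visual word. `soft_vlad` is a hypothetical illustration of aggregating over multiple nearest words. Its Gaussian-style weights, the `n_nearest` and `beta` parameters, and the `max_sq_dist` discard threshold are all assumptions introduced here for illustration.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def vlad(descriptors, codebook):
    """Standard VLAD: each descriptor contributes its residual to its nearest word only."""
    dim = len(codebook[0])
    blocks = [[0.0] * dim for _ in codebook]
    for x in descriptors:
        k = min(range(len(codebook)), key=lambda i: sq_dist(x, codebook[i]))
        for d in range(dim):
            blocks[k][d] += x[d] - codebook[k][d]
    return l2_normalize([v for block in blocks for v in block])


def soft_vlad(descriptors, codebook, n_nearest=2, beta=1.0, max_sq_dist=None):
    """Hypothetical multi-word variant (NOT the paper's UVLAD definition).

    Assumptions made for illustration only:
      * residuals to the n_nearest words are weighted by exp(-beta * squared distance),
        normalized so the weights of one descriptor sum to 1;
      * a descriptor whose nearest word is farther than max_sq_dist is discarded.
    """
    dim = len(codebook[0])
    blocks = [[0.0] * dim for _ in codebook]
    for x in descriptors:
        d2 = [sq_dist(x, c) for c in codebook]
        nearest = sorted(range(len(codebook)), key=lambda i: d2[i])[:n_nearest]
        d_min = d2[nearest[0]]
        if max_sq_dist is not None and d_min > max_sq_dist:
            continue  # assumed discard rule: no visual word is close enough
        # Subtract d_min before exponentiating for numerical stability.
        raw = [math.exp(-beta * (d2[i] - d_min)) for i in nearest]
        total = sum(raw)
        for i, r in zip(nearest, raw):
            w = r / total
            for d in range(dim):
                blocks[i][d] += w * (x[d] - codebook[i][d])
    return l2_normalize([v for block in blocks for v in block])


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [10.0, 0.0]]   # two toy visual words
    descriptors = [[5.0, 0.0]]             # equidistant from both words
    print(vlad(descriptors, codebook))
    print(soft_vlad(descriptors, codebook))
```

In this toy case the descriptor lies midway between the two words. Standard VLAD assigns it entirely to one of them, while the soft variant splits its contribution between both, which is the kind of ambiguity the abstract refers to.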