Zero-shot Action Recognition Via Empirical Maximum Mean Discrepancy

Yi Tian, Qiuqi Ruan, Gaoyun An
DOI: https://doi.org/10.1109/icsp.2018.8652306
2018-01-01
Abstract: In recent years, with the rapid development of video capture devices (Kinect, etc.) and online video providers (YouTube, Youku, etc.), the amount of video data, the number of action categories, and the complexity of video content have been increasing explosively. For traditional supervised human action recognition, manually annotating such a huge number of videos is therefore expensive and intractable. In light of this, researchers have sought to predict the label of an unseen video using training videos from disjoint classes, a setting referred to as zero-shot learning for human action recognition. In this paper, we propose a novel zero-shot action recognition method based on a linear regression model, which aims to learn an appropriate visual-to-semantic mapping that projects unseen videos into proper semantic representations. On the one hand, we take advantage of both training and testing videos and preserve their structural information simultaneously. On the other hand, we introduce an empirical maximum mean discrepancy (MMD) regularization term into our objective function, which reduces the discrepancy between the learned semantic representations and the known action prototypes and thereby alleviates the domain shift problem. Experiments on the HMDB51 dataset demonstrate the effectiveness of the proposed method.
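The abstract does not state the exact objective, so the following is only a minimal sketch of how its ingredients could fit together, not the authors' formulation. It assumes a squared-loss linear regression from visual features to semantic vectors, a ridge penalty, and a Gaussian (RBF) kernel for the biased empirical MMD estimate between projected test videos and unseen-class prototypes. The kernel choice, the bandwidth `sigma`, the weights `lam` and `gamma`, and every function name are hypothetical. The structure-preserving term mentioned above is omitted because its form is not given here. Classification uses nearest-prototype matching in the semantic space, a common zero-shot choice assumed here.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # visual_dim x semantic_dim


def rbf_kernel(u: Sequence[float], v: Sequence[float], sigma: float = 1.0) -> float:
    """Gaussian kernel; the kernel actually used in the paper is an assumption here."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def empirical_mmd2(xs: List[Vector], ys: List[Vector], sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between sample sets xs and ys."""
    n, m = len(xs), len(ys)
    k_xx = sum(rbf_kernel(a, b, sigma) for a in xs for b in xs) / (n * n)
    k_yy = sum(rbf_kernel(a, b, sigma) for a in ys for b in ys) / (m * m)
    k_xy = sum(rbf_kernel(a, b, sigma) for a in xs for b in ys) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


def project(x: Vector, W: Matrix) -> Vector:
    """Linear visual-to-semantic mapping s = W^T x."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def objective(W: Matrix, X_train: List[Vector], S_train: List[Vector],
              X_test: List[Vector], prototypes: List[Vector],
              lam: float, gamma: float, sigma: float = 1.0) -> float:
    """Hypothetical objective: regression loss + ridge penalty + gamma * MMD^2."""
    # Squared regression error on seen-class training videos.
    loss = sum(
        sum((p - t) ** 2 for p, t in zip(project(x, W), s))
        for x, s in zip(X_train, S_train)
    )
    # Ridge (squared Frobenius norm) penalty on the mapping.
    reg = lam * sum(w ** 2 for row in W for w in row)
    # Empirical MMD between projected unseen videos and unseen-class prototypes.
    projected_test = [project(x, W) for x in X_test]
    mmd = gamma * empirical_mmd2(projected_test, prototypes, sigma)
    return loss + reg + mmd


def predict(x: Vector, W: Matrix, prototypes: List[Vector]) -> int:
    """Index of the prototype closest (squared Euclidean) to the projected video."""
    s = project(x, W)
    dists = [sum((a - b) ** 2 for a, b in zip(s, p)) for p in prototypes]
    return min(range(len(prototypes)), key=dists.__getitem__)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]           # toy identity mapping
    prototypes = [[0.0, 0.0], [5.0, 5.0]]  # toy unseen-class prototypes
    print(predict([4.0, 4.0], W, prototypes))  # 1
    print(empirical_mmd2([[0.0, 0.0]], [[3.0, 3.0]]))
```

Minimizing `objective` over `W` (for example by gradient descent) would pull the distribution of projected test videos toward the prototype distribution, which is the intuition the abstract gives for how the MMD term alleviates domain shift.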