Instantly Telling What Happens in a Video Sequence Using Simple Features

Liang Wang, Yizhou Wang, Tingting Jiang, Wen Gao
DOI: https://doi.org/10.1109/cvpr.2011.5995377
2011-01-01
Abstract: This paper presents an efficient method for telling what happens in a video sequence (e.g., recognizing actions) from only a couple of frames, in real time. To achieve this immediacy, we employ two types of computationally efficient but perceptually important features, optical flow and edges, to capture motion and shape/structure information in video sequences. These two feature types are known to be non-sparse and can be unreliable or ambiguous in some parts of a video. To endow them with strong discriminative power, we extend an efficient contrast set mining technique, Emerging Pattern (EP) mining, to learn joint features from videos that differentiate action classes. Experimental results show that combining the two feature types differentiates actions better than using either type alone. The learned features are discriminative, statistically significant (reliable), and exhibit semantically meaningful shape-motion structures of human actions. Beyond instant action recognition, we also extend the proposed approach to anomaly detection and sequential event detection, where experiments show encouraging results.
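To make the idea of an emerging pattern concrete, the following is a minimal sketch of the standard growth-rate criterion: an itemset is emerging for a target class when its support there exceeds its support in a background class by a chosen ratio. This is a brute-force illustration, not the extended mining method of the paper. The token names (`flow_up`, `edge_vert`, and so on), the two toy classes, and the thresholds are all hypothetical, and it assumes each frame has already been discretized into a set of motion and edge tokens.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def growth_rate(itemset, target, background):
    """Ratio of the itemset's support in target to its support in background."""
    s_target = support(itemset, target)
    s_background = support(itemset, background)
    if s_background == 0.0:
        # Absent from the background: infinite growth if present in the target.
        return float("inf") if s_target > 0.0 else 0.0
    return s_target / s_background


def mine_emerging_patterns(target, background, min_growth=2.0,
                           min_support=0.5, max_size=2):
    """Enumerate small itemsets and keep those whose growth rate is high enough."""
    items = sorted(set().union(*target)) if target else []
    patterns = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            if support(itemset, target) < min_support:
                continue  # too rare in the target class to be reliable
            rate = growth_rate(itemset, target, background)
            if rate >= min_growth:
                patterns.append((itemset, rate))
    return patterns


# Hypothetical toy data: each frame is a set of discretized feature tokens.
wave = [
    frozenset({"flow_up", "edge_vert"}),
    frozenset({"flow_up", "edge_vert", "edge_horiz"}),
    frozenset({"flow_up", "edge_horiz"}),
]
walk = [
    frozenset({"flow_right", "edge_vert"}),
    frozenset({"flow_right", "edge_vert", "edge_horiz"}),
    frozenset({"flow_up", "edge_horiz"}),
]

mined = dict(mine_emerging_patterns(wave, walk))
for itemset, rate in mined.items():
    print(sorted(itemset), rate)
```

In this toy data the joint motion-and-edge itemset `{flow_up, edge_vert}` never occurs in the background class, so its growth rate is infinite, while `flow_up` alone has a finite growth rate and `edge_vert` alone does not discriminate at all. That is the sense in which a joint feature can be more discriminative than either single feature.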