Action classification by exploring directional co-occurrence of weighted STIPs

Liu Mengyuan, Liu Hong, Sun Qianru
DOI: https://doi.org/10.1109/ICIP.2014.7025292
2014-01-01
Abstract: Human action recognition is challenging mainly because of intra-class variety, inter-class ambiguity, and cluttered backgrounds in real videos. The bag-of-visual-words model uses spatio-temporal interest points (STIPs) and represents an action by the distribution of those points, which ignores the visual context among them. To add contextual information, we propose a method that encodes the spatio-temporal distribution of weighted pairwise points. First, STIPs are extracted from an action sequence and clustered into visual words. Then, each word is weighted in both the temporal and spatial domains to capture its relationships with other words. Finally, the directional relationships between co-occurring pairs of words are used to encode visual context. We report state-of-the-art results on the Rochester and UT-Interaction datasets, which validate that our method classifies human actions with high accuracy. © 2014 IEEE.
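The idea of a directional co-occurrence of visual words can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the weighting scheme or how direction is defined, so the function name `directional_cooccurrence`, the `radius` neighbourhood threshold, the use of temporal order as the "direction", and the omission of per-word weights are all assumptions made for illustration. Each point is taken to be an `(x, y, t, word)` tuple whose `word` index comes from a prior clustering step.

```python
from itertools import combinations
from math import dist


def directional_cooccurrence(points, num_words, radius):
    """Count ordered word pairs among nearby interest points.

    points    : list of (x, y, t, word) tuples; `word` is a cluster index.
    num_words : size of the visual vocabulary.
    radius    : assumed neighbourhood threshold, applied to the Euclidean
                distance in (x, y, t) space (mixing units is a simplification).

    Returns a num_words x num_words list of lists H, where H[a][b] is the
    number of nearby pairs in which a point of word `a` occurs strictly
    earlier in time than a point of word `b`.
    """
    hist = [[0] * num_words for _ in range(num_words)]
    for p, q in combinations(points, 2):
        # Skip pairs that are not close enough to count as co-occurring.
        if dist(p[:3], q[:3]) > radius:
            continue
        # Pairs at the same time step have no temporal direction here.
        if p[2] == q[2]:
            continue
        # Assumed direction: from the earlier point to the later point.
        first, second = (p, q) if p[2] < q[2] else (q, p)
        hist[first[3]][second[3]] += 1
    return hist
```

Unlike a plain bag-of-words histogram, the resulting matrix is not symmetric in general, so it distinguishes "word a followed by word b" from the reverse. Flattening it gives a fixed-length descriptor that could be passed to any standard classifier.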