Abstract: Real-time human action classification in complex scenes has applications in various domains such as visual surveillance, video retrieval and human-robot interaction. However, the task is challenging because of the need for computational efficiency, cluttered backgrounds and large intra-class variability among instances of the same action. Spatio-temporal interest point (STIP) based methods have shown promising results for classifying human actions in complex scenes efficiently. However, state-of-the-art works typically use the bag-of-visual-words (BoVW) model, which only captures the distribution of visual words over STIPs and ignores the distinctive structure among those words. In this paper, the distribution of STIPs is organized into a salient directed graph, which reflects salient motions and can be divided into a time salient directed graph and a space salient directed graph, with the aim of adding spatio-temporal discriminative information to BoVW. Both salient directed graphs are constructed from pairs of labeled STIPs. Specifically, time saliency is represented by the "directional co-occurrence" property of differently labeled pairwise STIPs in the same frame, while space saliency is reflected by the "geometric relationships" between identically labeled pairwise STIPs across different frames. Two new statistical features, the Time Salient Pairwise feature (TSP) and the Space Salient Pairwise feature (SSP), are then designed to describe the two salient directed graphs, respectively. Experiments are carried out with a homogeneous kernel SVM classifier on four challenging datasets, including KTH, ADL and UT-Interaction. The results confirm that TSP and SSP are complementary, and that our multi-cue representation TSP + SSP + BoVW can properly describe human actions with large intra-class variability in real time. Copyright (C) 2016, Chongqing University of Technology. Production and hosting by Elsevier B.V.
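The contrast between a plain word histogram and pairwise word structure can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the authors' TSP/SSP definitions. The `STIP` record, the left-to-right edge rule used for "directional co-occurrence", and the angular binning used for "geometric relationships" are hypothetical choices. The codebook that assigns word labels is also assumed to exist already. The sketch only evaluates an additive chi-squared kernel, which is one homogeneous kernel suited to histograms. It does not build the homogeneous kernel map or train an SVM.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, NamedTuple


class STIP(NamedTuple):
    """A spatio-temporal interest point already assigned to a visual word."""
    x: float    # horizontal image coordinate
    y: float    # vertical image coordinate
    t: int      # frame index
    label: int  # visual-word index in [0, k), e.g. from a k-means codebook (assumed)


def l1_normalize(v: List[float]) -> List[float]:
    s = sum(v)
    return [a / s for a in v] if s > 0 else list(v)


def bovw(points: List[STIP], k: int) -> List[float]:
    """Plain bag-of-visual-words histogram: word distribution only, no structure."""
    counts = Counter(p.label for p in points)
    return l1_normalize([float(counts[w]) for w in range(k)])


def tsp_like(points: List[STIP], k: int) -> List[float]:
    """Directed co-occurrence of differently labeled pairs in the same frame.

    Hypothetical direction rule: each edge goes from the left point to the
    right point (ties broken by smaller y), counted in a k x k matrix.
    """
    m = [0.0] * (k * k)
    for p, q in combinations(points, 2):
        if p.t != q.t or p.label == q.label:
            continue
        src, dst = (p, q) if (p.x, p.y) <= (q.x, q.y) else (q, p)
        m[src.label * k + dst.label] += 1.0
    return l1_normalize(m)


def ssp_like(points: List[STIP], k: int, bins: int = 8) -> List[float]:
    """Geometric relation of identically labeled pairs across different frames.

    Hypothetical geometry: direction of the displacement from the earlier
    point to the later one, quantized into `bins` angular bins per word.
    """
    h = [0.0] * (k * bins)
    for p, q in combinations(points, 2):
        if p.label != q.label or p.t == q.t:
            continue
        a, b = (p, q) if p.t < q.t else (q, p)
        angle = math.atan2(b.y - a.y, b.x - a.x) % (2 * math.pi)
        h[a.label * bins + int(angle / (2 * math.pi) * bins) % bins] += 1.0
    return l1_normalize(h)


def multi_cue(points: List[STIP], k: int, bins: int = 8) -> List[float]:
    """Concatenate the three cues in the order TSP + SSP + BoVW."""
    return tsp_like(points, k) + ssp_like(points, k, bins) + bovw(points, k)


def chi2_kernel(u: List[float], v: List[float]) -> float:
    """Additive chi-squared kernel, a homogeneous kernel for histograms."""
    return sum(2 * a * b / (a + b) for a, b in zip(u, v) if a + b > 0)


# Usage: two words observed in two consecutive frames.
pts = [
    STIP(10, 20, 0, 0), STIP(30, 20, 0, 1),  # frame 0: words 0 and 1
    STIP(12, 22, 1, 0), STIP(33, 25, 1, 1),  # frame 1: same words, moved
]
feat = multi_cue(pts, k=2, bins=4)
print(len(feat))                # 4 + 8 + 2 = 14
print(chi2_kernel(feat, feat))  # 3.0 (each of the three cues sums to 1)
```

Two clips with the same word counts but different spatial arrangements of those words would share the BoVW part and differ only in the pairwise parts. That difference is the extra discriminative information the pairwise cues are meant to supply.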

Projection Transform on Spatio-Temporal Context for Action Recognition

Human Action Recognition with Contextual Constraints Using a RGB-D Sensor

Action Recognition Using Context and Appearance Distribution Features

3D R Transform on Spatio-temporal Interest Points for Action Recognition

Spatio-temporal Semantic Features for Human Action Recognition

Fusing $\mathcal{R}$ Features and Local Features with Context-Aware Kernels for Action Recognition

Spatio-Temporal Proximity Distribution Kernels for Action Recognition

Robust Human Action Recognition Based on Spatio-Temporal Descriptors and Motion Temporal Templates

Action recognition using hybrid spatio-temporal bag-of-features

Exploring Hybrid Spatio-Temporal Convolutional Networks for Human Action Recognition

Extracting Hierarchical Spatial and Temporal Features for Human Action Recognition

Efficient Spatialtemporal Context Modeling for Action Recognition

Contextual Fisher Kernels for Human Action Recognition

Research on Local Spatio-Temporal Features for Action Recognition

Action recognition via restricted dense trajectories and spatio-temporal co-occurrence feature

Salient Pairwise Spatio-Temporal Interest Points for Real-Time Activity Recognition

Action Recognition with Spatial-Temporal Representation Analysis Across Grassmannian Manifold and Euclidean Space

Attentive Action and Context Factorization

Action recognition based on space-time interest points and topic model

A Hierarchical Spatio-Temporal Model for Human Activity Recognition

Fusing Appearance and Distribution Information of Interest Points for Action Recognition