Abstract: Real-time human action classification in complex scenes has applications in domains such as visual surveillance, video retrieval and human-robot interaction. The task remains challenging because of computational efficiency requirements, cluttered backgrounds and intra-class variability among actions of the same type. Spatio-temporal interest point (STIP) based methods have shown promising results for classifying human actions in complex scenes efficiently. However, state-of-the-art works typically use the bag-of-visual-words (BoVW) model, which focuses only on the word distribution of STIPs and ignores the distinctive character of word structure. In this paper, the distribution of STIPs is organized into a salient directed graph, which reflects salient motions and can be divided into a time salient directed graph and a space salient directed graph, with the aim of adding spatio-temporal discriminative power to BoVW. Both salient directed graphs are constructed from pairs of labeled STIPs. Specifically, the "directional co-occurrence" property of pairs of differently labeled STIPs in the same frame is used to represent time saliency, and space saliency is reflected by the "geometric relationships" between pairs of identically labeled STIPs across different frames. New statistical features, namely the Time Salient Pairwise feature (TSP) and the Space Salient Pairwise feature (SSP), are then designed to describe the two salient directed graphs, respectively. Experiments are carried out with a homogeneous kernel SVM classifier on four challenging datasets: KTH, ADL and UT-Interaction. The results confirm the complementarity of TSP and SSP, and show that our multi-cue representation TSP + SSP + BoVW can properly describe human actions with large intra-class variability in real time. Copyright (C) 2016, Chongqing University of Technology. Production and hosting by Elsevier B.V.

Action classification by exploring directional co-occurrence of weighted STIPs

Learning Directional Co-Occurrence for Human Action Classification

Time-ordered Spatial-Temporal Interest Points for Human Action Classification.

Salient Pairwise Spatio-Temporal Interest Points for Real-Time Activity Recognition.

Learning spatio-temporal co-occurrence correlograms for efficient human action classification

Human Action Recognition Using Multi-Velocity STIPs and Motion Energy Orientation Histogram.

Human action classification based on sequential bag-of-words model

Recognizing human actions using a new descriptor based on spatial-temporal interest points and weighted-output classifier.

Robust 3D Action Recognition Through Sampling Local Appearances and Global Distributions.

Action Recognition Via Cumulative Histogram of Multiple Features

An Attentional Spatial Temporal Graph Convolutional Network with Co-Occurrence Feature Learning for Action Recognition

Sequential Bag-of-Words Model for Human Action Classification.

Human Action Recognition with Contextual Constraints Using a RGB-D Sensor

Action disambiguation analysis using normalized Google-like distance correlogram

Reassessing Hierarchical Representation for Action Recognition in Still Images

Spatio-Temporal Proximity Distribution Kernels for Action Recognition

Machine Vision-Based Human Action Recognition Using Spatio-Temporal Motion Features (STMF) with Difference Intensity Distance Group Pattern (DIDGP)

Multimodal human action recognition based on spatio-temporal action representation recognition model

Making Full Use of Spatial-Temporal Interest Points: an AdaBoost Approach for Action Recognition

Human Activity Recognition based on Dynamic Spatio-Temporal Relations

Modeling Geometric-Temporal Context with Directional Pyramid Co-Occurrence for Action Recognition