Abstract: Recent advances in imaging devices have created new opportunities for video content analysis and understanding. Next-generation cameras, such as depth or binocular cameras, capture diverse information that complements conventional 2D RGB cameras, so analyzing the resulting multimodal videos generally benefits related applications. However, the limitations of these emerging cameras, such as short effective distances, high costs, or long response times, reduce their applicability and currently make them unavailable for online use in practice. In this paper, we propose an alternative scenario to address this problem and illustrate it with the task of recognizing human actions. In particular, we aim to improve the accuracy of action recognition in RGB videos with the aid of one additional RGB-D camera. Since RGB-D cameras, such as Kinect, are typically not applicable in a surveillance system because of their short effective distance, we instead collect a database offline in which not only the RGB videos but also the depth maps and the skeleton data of actions are jointly available. The proposed approach adapts to inter-database variations and enables the borrowing of visual knowledge across different video modalities. Each action to be recognized in its RGB representation is then augmented with the borrowed depth and skeleton features. Our approach is comprehensively evaluated on five benchmark data sets for action recognition. The promising results show that the borrowed information leads to a remarkable boost in recognition accuracy.

A Large-scale RGB-D Database for Arbitrary-view Human Action Recognition

A Large-scale Varying-view RGB-D Action Dataset for Arbitrary-view Human Action Recognition

Arbitrary-View Human Action Recognition: A Varying-View RGB-D Action Dataset

View-invariant Human Action Recognition Via Robust Locally Adaptive Multi-View Learning

A Large Scale RGB-D Dataset for Action Recognition

Online Robust Action Recognition Based on a Hierarchical Model

Robust action recognition via borrowing information across video modalities

View-invariant action recognition: a survey

Arbitrary-view human action recognition via novel-view action generation

RGB-D-based Action Recognition Datasets: A Survey

A Multi-viewpoint Outdoor Dataset for Human Action Recognition

Human Action Recognition with Contextual Constraints Using a RGB-D Sensor

HMDB: A large video database for human motion recognition

View-Robust Neural Networks for Unseen Human Action Recognition in Videos

Collecting Public RGB-D Datasets for Human Daily Activity Recognition

RGBD-HuDaAct: A color-depth video database for human daily activity recognition

Multi-View Region Adaptive Multi-temporal DMM and RGB Action Recognition

NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data

Action Recognition in RGB-D Egocentric Videos

CAS-YNU Multi-modal Cross-view Human Action Dataset