Abstract: Action recognition has received increasing attention from the computer vision and machine learning communities in the last decade. Although many action recognition algorithms have been proposed, they often require similar environmental conditions in the training and testing stages, which limits their application. To advance the generalization of action recognition, this paper explores the cross-domain action recognition problem from three aspects: 1) feature learning: hand-crafted features and deep learning features are extracted, and their generalization ability is assessed and discussed in controlled and uncontrolled environments; 2) unsupervised cross-domain learning: since labeled samples are difficult to obtain in the target domain, unsupervised cross-domain learning methods can be borrowed, and to determine which is suitable for the open domain action recognition problem, three kinds of unsupervised cross-domain learning methods are assessed on an open domain action recognition dataset; 3) supervised cross-domain learning: when some labeled samples are available in the target domain but their number is very limited, supervised cross-domain learning should be a good choice, but it is unclear which method to choose, so these methods are also evaluated on the same dataset. Moreover, we contribute a novel multi-view and multi-modality human action recognition dataset (abbreviated as "MMA"). It consists of 7,080 action samples from 25 action categories, including 15 single-subject actions and 10 double-subject interactive actions, captured in three views of two different scenarios, and can be used to simultaneously explore single-view learning, multi-view learning, multi-modality learning, and cross-domain learning problems. We further explore the same learning problems on the MMA dataset.
Extensive experimental results on two different datasets show that deep feature learning has much better generalization ability than hand-crafted features, such as improved dense trajectories, provided there are enough labeled samples in the training dataset to fine-tune the network. Both unsupervised and supervised cross-domain learning methods improve performance, but the latter yields a much larger improvement; in other words, labeled samples in the target domain are very helpful. Finally, we also participated in the open domain action recognition challenge held at the CVPR 2017 workshop, where our supervised cross-domain learning scheme obtained the best performance among all teams.

Dual many-to-one-encoder-based transfer learning for cross-dataset human action recognition

Exploring the Cross-Domain Action Recognition Problem by Deep Feature Learning and Cross-Domain Learning

Cross-View Action Recognition over Heterogeneous Feature Spaces

Internal Transfer Learning for Improving Performance in Human Action Recognition for Small Datasets

Transferable Feature Representation for Visible-to-Infrared Cross-Dataset Human Action Recognition

Multi-Task Learning of Generalizable Representations for Video Action Recognition

Integrating Dual-Stream Cross Fusion and Ambiguous Exclude Contrastive Learning for Enhanced Human Action Recognition

Cross-View Action Recognition Via Dual-Codebook and Hierarchical Transfer Framework

Semi-supervised human action recognition via dual-stream cross-fusion and class-aware memory bank

Cross-view Action Recognition Via Transductive Transfer Learning

Cross-Domain Human Action Recognition

Multi-dataset Training of Transformers for Robust Action Recognition

Dynamic Video Mix-Up for Cross-Domain Action Recognition

Convolutional non-local spatial-temporal learning for multi-modality action recognition

Encoding Multi-resolution Two-Stream CNNs for Action Recognition

A Cross-Modal Learning Approach for Recognizing Human Actions

CDFi: Cross-Domain Action Recognition using WiFi Signals

Transfer subspace learning for cross-dataset facial expression recognition

Action Recognition Using Co-trained Deep Convolutional Neural Networks

Harnessing Lab Knowledge for Real-World Action Recognition

Multi-kernel learning of deep convolutional features for action recognition