Abstract: Activity recognition in video is an important field of computer vision. Many works on activity recognition have achieved acceptable results in recent years. However, their training is entirely static: all classes are taught to the system in a single training step, and the system can recognize only those classes. The main disadvantage of this kind of training is that when new classes must be added, the system has to be retrained from scratch on all classes. This requirement raises several challenges, such as storing and retaining the old data and paying the training cost again. We propose an approach for training a video action recognition system that can learn new classes without access to the previous data. We present an incremental learning algorithm for class recognition tasks in video data. To prevent catastrophic forgetting, the algorithm combines two approaches for preserving the network's knowledge: network sharing and network knowledge distillation. We introduce a neural network architecture for action recognition to understand and represent the video data. We propose distilling network knowledge at the classification level and at the feature level, where the feature-level distillation is divided into spatial and temporal parts. We also suggest initializing the new classifiers from the previous classifiers. The proposed algorithm is evaluated on the UCF101, HMDB51, and Kinetics-400 datasets. We consider factors such as the amount of distilled knowledge, the number of new classes, and the number of incremental learning stages, and examine their impact on the final recognition system. Finally, we show that the proposed algorithm can teach new classes to the recognition system without forgetting the previous classes and without requiring the previous data or exemplar data.
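
The abstract names three ingredients without giving formulas: distillation at the classification level, distillation at the feature level split into spatial and temporal parts, and initialization of new classifiers from previous ones. The following is a minimal pure-Python sketch of what such terms could look like, not the paper's actual method. Everything specific is an assumption made for illustration: the temperature-scaled KL divergence for the classification term, the pooling-then-squared-error form of the spatial and temporal terms, the mean-of-old-weights initialization, and all function names.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def classification_distillation(old_logits, new_logits, temperature=2.0):
    """KL(old || new) restricted to the old classes (assumed loss form).

    old_logits: outputs of the frozen previous network (old classes only).
    new_logits: outputs of the updated network (old classes first, then new).
    """
    n_old = len(old_logits)
    p = softmax(old_logits, temperature)
    q = softmax(new_logits[:n_old], temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def feature_distillation(old_feat, new_feat):
    """Return (spatial_term, temporal_term) for features indexed [frame][location].

    Assumed form: the spatial term compares features averaged over time,
    the temporal term compares features averaged over space.
    """
    n_frames, n_locs = len(old_feat), len(old_feat[0])

    def pool_over_time(feat):
        return [sum(feat[t][s] for t in range(n_frames)) / n_frames
                for s in range(n_locs)]

    def pool_over_space(feat):
        return [sum(frame) / n_locs for frame in feat]

    spatial = mean_squared_error(pool_over_time(old_feat), pool_over_time(new_feat))
    temporal = mean_squared_error(pool_over_space(old_feat), pool_over_space(new_feat))
    return spatial, temporal


def init_new_classifier(old_weights):
    """Initialize one new class weight vector as the mean of the old ones
    (a hypothetical choice; the abstract does not say how this is done)."""
    n_classes, dim = len(old_weights), len(old_weights[0])
    return [sum(w[i] for w in old_weights) / n_classes for i in range(dim)]
```

In a training loop these terms would typically be added, with weighting coefficients, to the ordinary cross-entropy loss on the new classes; the weights correspond to the "amount of distilled knowledge" that the abstract lists as a studied factor.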

Learning and Distillating the Internal Relationship of Motion Features in Action Recognition.

Learning SpatioTemporal and Motion Features in a Unified 2D Network for Action Recognition

A simulated two-stream network via multilevel distillation of reviewed features and decoupled logits for video action recognition

Learning Discriminative Features for Fast Frame-Based Action Recognition.

Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition

Task-Agnostic Self-Distillation for Few-Shot Action Recognition

Temporal Distinct Representation Learning for Action Recognition

Cross-modality Online Distillation for Multi-View Action Recognition

Learning Comprehensive Motion Representation for Action Recognition

Modality Distillation with Multiple Stream Networks for Action Recognition

Human Action Recognition Based on Motion Feature and Manifold Learning

Class-Incremental Learning on Video-Based Action Recognition by Distillation of Various Knowledge

Real-Time Action Recognition with Enhanced Motion Vector CNNs

Human Action Recognition Based on Three-Stream Network with Frame Sequence Features

Lite-MKD: A Multi-modal Knowledge Distillation Framework for Lightweight Few-shot Action Recognition

Joint Feature Optimization and Fusion for Compressed Action Recognition

3-Stream Convolutional Networks for Video Action Recognition with Hybrid Motion Field

Im2Flow: Motion Hallucination from Static Images for Action Recognition

DL-KDD: Dual-Light Knowledge Distillation for Action Recognition in the Dark

Multi-Task Learning of Generalizable Representations for Video Action Recognition

Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition