Abstract: Activity recognition in video is an important area of computer vision. Many activity recognition methods have been proposed in recent years and achieve acceptable results. However, their training is entirely static: all classes are taught to the system in a single training step, and the system can then recognize only those classes. The main disadvantage of this kind of training is that whenever new classes must be added, the system has to be retrained from scratch on all classes. This requirement raises several challenges, such as storing and retaining the old data and paying the training cost again. We propose an approach for training a video action recognition system that can learn new classes without access to the previous data. Specifically, we present an incremental learning algorithm for class recognition in video data. To prevent catastrophic forgetting, the algorithm combines two approaches for preserving the network's knowledge: network sharing and network knowledge distillation. We introduce a neural network architecture for action recognition that understands and represents video data. We propose distilling network knowledge at both the classification level and the feature level, where feature-level distillation is divided into spatial and temporal parts. We also suggest initializing new classifiers from the previous classifiers. The proposed algorithm is evaluated on the UCF101, HMDB51, and Kinetics-400 datasets. We examine factors such as the amount of distilled knowledge, the number of new classes, and the number of incremental learning stages, and their impact on the final recognition system. Finally, we show that the proposed algorithm can teach new classes to the recognition system without forgetting the previous classes and without requiring the previous data or exemplar data.
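
As a rough illustration of the classification-level distillation mentioned in the abstract, the sketch below combines a standard cross-entropy term on the current label with a temperature-softened distillation term that keeps the new network's outputs on the old classes close to those of the previous network. This is a generic formulation under stated assumptions, not the paper's exact loss: the function names, the temperature, the weight `lam`, and the mean-based classifier initialization are all hypothetical, and the feature-level (spatial and temporal) distillation and network sharing are not shown.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(old_logits, new_logits, temperature=2.0):
    """Cross-entropy between the softened outputs of the previous network
    and the new network, computed over the OLD classes only."""
    n_old = len(old_logits)
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits[:n_old], temperature)
    return -sum(p * math.log(q) for p, q in zip(p_old, p_new))


def cross_entropy(logits, label):
    """Standard classification loss over all (old + new) classes."""
    return -math.log(softmax(logits)[label])


def incremental_loss(old_logits, new_logits, label, lam=1.0, temperature=2.0):
    """Classification loss plus a weighted distillation term.
    `lam` (assumed name) controls how strongly old knowledge is preserved."""
    return cross_entropy(new_logits, label) + lam * distillation_loss(
        old_logits, new_logits, temperature
    )


def init_new_classifier(old_weights):
    """Hypothetical initialization of a new class's weight vector as the
    mean of the previous classifiers' weight vectors (an assumption; the
    abstract does not say how previous classifiers are used)."""
    dim = len(old_weights[0])
    return [sum(w[i] for w in old_weights) / len(old_weights) for i in range(dim)]
```

In this sketch the distillation term is smallest when the new network reproduces the previous network's logits on the old classes, so raising `lam` trades plasticity on new classes for stability on old ones.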

Video Domain Incremental Learning for Human Action Recognition in Home Environments

RECL: Responsive Resource-Efficient Continuous Learning for Video Analytics

Knowledge-guided Pre-Training and Fine-Tuning: Video Representation Learning for Action Recognition

Continual Learning in Human Activity Recognition: an Empirical Analysis of Regularization

Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition

View-invariant Human Action Recognition Via Robust Locally Adaptive Multi-View Learning

General Incremental Learning with Domain-aware Categorical Representations

Visualization As Intermediate Representations (VLAIR) for Human Activity Recognition

Continual learning in sensor-based human activity recognition: An empirical benchmark analysis

A baseline on continual learning methods for video action recognition

Cross-domain video action recognition via adaptive gradual learning

Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey

Class-Incremental Learning for Action Recognition in Videos

Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models

Gradual Divergence for Seamless Adaptation: A Novel Domain Incremental Learning Method

Class-Incremental Learning on Video-Based Action Recognition by Distillation of Various Knowledge

Versatile Incremental Learning: Towards Class and Domain-Agnostic Incremental Learning

Recognizing Video Activities in the Wild Via View-to-Scene Joint Learning

The Stimulatory Effect of Clonidine on Locus Coeruleus Noradrenergic Neurons through Imidazoline Receptors Is Modulated by Excitatory Amino Acids
