Abstract: Knowledge distillation, the process of transferring knowledge learned by a heavy network (the teacher) to a lightweight network (the student), has emerged as an effective technique for compressing neural networks. To avoid the need to train a large teacher network, this paper leverages the recent self-knowledge distillation approach, training a student network progressively by distilling its own knowledge without a pre-trained teacher network. Unlike existing self-knowledge distillation methods, which mainly focus on still images, our proposed Teaching Yourself is a self-knowledge distillation technique that targets videos for human action recognition. Teaching Yourself is designed to be both an effective lightweight network and a model with high generalization capability. In our approach, the network updates itself using the best past model, termed the preceding model, which guides the training process that updates the present model. Inspired by consistency training in state-of-the-art semi-supervised learning methods, we also introduce an effective augmentation strategy that increases data diversity and improves both network generalization and prediction consistency. We benchmark our approach on the 3D ResNet-18 and 3D ResNet-50 backbone networks and evaluate it on the standard UCF101, HMDB51, and Kinetics400 datasets. The experimental results show that Teaching Yourself significantly improves action recognition accuracy compared to existing supervised learning and knowledge distillation methods. We also conduct an extensive ablation study demonstrating that our approach mitigates overconfident predictions on dark knowledge and generates more consistent predictions across input variations of the same data point.
The code is available at https://github.com/vdquang1991/-elf-KD.
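
The idea of a preceding model guiding the present model can be illustrated with a minimal sketch. The abstract does not state the exact loss, so the code below assumes the common temperature-scaled distillation objective (cross-entropy on the label plus a KL term toward the preceding model's softened predictions). The function names, `alpha`, and `temperature` are hypothetical choices for illustration and are not taken from the paper or its repository.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-likelihood of the ground-truth class."""
    return -math.log(probs[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def self_distillation_loss(present_logits, preceding_logits, label,
                           alpha=0.5, temperature=4.0):
    """Assumed objective: blend of the supervised loss and a distillation term.

    present_logits   -- outputs of the model currently being trained
    preceding_logits -- outputs of the best past checkpoint (kept frozen)
    alpha            -- hypothetical weight on the distillation term
    """
    hard = cross_entropy(softmax(present_logits), label)
    soft_target = softmax(preceding_logits, temperature)
    soft_pred = softmax(present_logits, temperature)
    # The T^2 factor is the usual rescaling for temperature-softened targets.
    soft = kl_divergence(soft_target, soft_pred) * temperature ** 2
    return (1 - alpha) * hard + alpha * soft

def should_replace_preceding(best_accuracy, present_accuracy):
    """Assumed rule: promote the present model when it beats the best past one."""
    return present_accuracy > best_accuracy
```

In a training loop built on this sketch, the preceding model would be evaluated without gradient updates on each batch, and `should_replace_preceding` would be checked after each validation pass to decide whether the present weights become the new guide.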

Lite-MKD: A Multi-modal Knowledge Distillation Framework for Lightweight Few-shot Action Recognition

Learning and Distillating the Internal Relationship of Motion Features in Action Recognition

DL-KDD: Dual-Light Knowledge Distillation for Action Recognition in the Dark

Task-Agnostic Self-Distillation for Few-Shot Action Recognition

Cross-modality Online Distillation for Multi-View Action Recognition

SMTDKD: A Semantic-Aware Multimodal Transformer Fusion Decoupled Knowledge Distillation Method for Action Recognition

Multi-view Distillation based on Multi-modal Fusion for Few-shot Action Recognition (CLIP-$\mathrm{M^2}$DF)

MAWKDN: A Multimodal Fusion Wavelet Knowledge Distillation Approach Based on Cross-View Attention for Action Recognition

Teaching Yourself: A Self-Knowledge Distillation Approach to Action Recognition

Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition

Multiscale knowledge distillation with attention based fusion for robust human activity recognition

Knowledge Distillation in Video-Based Human Action Recognition: An Intuitive Approach to Efficient and Flexible Model Training

Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition

DMSD-CDFSAR: Distillation from Mixed-Source Domain for Cross-Domain Few-shot Action Recognition

Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection

Modality Distillation with Multiple Stream Networks for Action Recognition

Enhancing Action Recognition from Low-Quality Skeleton Data via Part-Level Knowledge Distillation

Enhancing Few-Shot Learning in Lightweight Models via Dual-Faceted Knowledge Distillation

A 3DCNN-Based Knowledge Distillation Framework for Human Activity Recognition

Lightweight Self-Knowledge Distillation with Multi-source Information Fusion

CMD: Self-supervised 3D Action Representation Learning with Cross-Modal Mutual Distillation