Abstract: Human action recognition in dark videos is a challenging task in computer vision. Recent research has focused on applying dark enhancement methods to improve video visibility. However, such processing results in the loss of critical information present in the original (un-enhanced) video. Conversely, traditional two-stream methods can learn from both the original and the processed videos, but they significantly increase the computational cost of inference in video classification. To address these challenges, we propose a novel teacher-student video classification framework, named Dual-Light KnowleDge Distillation for Action Recognition in the Dark (DL-KDD). This framework enables the model to learn from both original and enhanced videos without introducing additional computational cost during inference. Specifically, DL-KDD applies knowledge distillation during training: the teacher model is trained on enhanced videos, and the student model is trained on both the original videos and the soft targets generated by the teacher model. This teacher-student design allows the student model to predict actions using only the original input video at inference time. In our experiments, the proposed DL-KDD framework outperforms state-of-the-art methods on the ARID, ARID V1.5, and Dark-48 datasets. We achieve the best performance on each dataset and up to a 4.18% improvement on Dark-48 while using only original video inputs, thus avoiding two-stream frameworks or enhancement modules at inference. We further validate the effectiveness of the distillation strategy through ablation experiments. The results highlight the advantages of our knowledge distillation framework for human action recognition in the dark.
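The abstract does not state the exact distillation objective. As an illustration only, the sketch below assumes a standard Hinton-style loss. The teacher sees an enhanced clip. The student sees the original dark clip and is trained on a weighted sum of two terms: cross-entropy with the ground-truth label, and a temperature-scaled KL divergence to the teacher's soft targets. The names `gamma_enhance`, `distillation_loss`, `predict_action`, `temperature` and `alpha` are hypothetical stand-ins rather than the DL-KDD implementation. Gamma correction is likewise only an assumed placeholder for the enhancement step. Logits are plain lists so the example runs without a deep-learning framework.

```python
import math
from typing import List, Sequence


def gamma_enhance(frame: Sequence[float], gamma: float = 0.5) -> List[float]:
    """Hypothetical dark enhancement: gamma correction on pixel values in [0, 1].

    A gamma below 1 brightens dark pixels; this stands in for whatever
    enhancement method produces the teacher's input.
    """
    return [p ** gamma for p in frame]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) between two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,
    alpha: float = 0.5,
) -> float:
    """Assumed Hinton-style KD objective for a single clip.

    student_logits: student scores on the ORIGINAL (dark) clip.
    teacher_logits: frozen-teacher scores on the ENHANCED clip.
    temperature, alpha: illustrative values, not taken from the paper.
    """
    # Hard-label term: student cross-entropy against the ground truth.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-target term: KL from the teacher's softened distribution to the
    # student's, scaled by T^2 so its gradient magnitude stays comparable.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft


def predict_action(student_logits: Sequence[float]) -> int:
    """Inference uses only the student's scores on the original clip."""
    return max(range(len(student_logits)), key=lambda i: student_logits[i])


if __name__ == "__main__":
    # Toy logits for 3 action classes (made-up numbers for illustration).
    teacher_logits = [2.0, 0.5, -1.0]
    student_logits = [1.0, 0.8, -0.5]
    print(gamma_enhance([0.0, 0.25, 1.0]))
    print(distillation_loss(student_logits, teacher_logits, label=0))
    print(predict_action(student_logits))
```

In this sketch the teacher and the enhancement step appear only in the training loss. `predict_action` needs nothing but the student's output on the original clip, which mirrors the abstract's point that inference adds no enhancement module or second stream.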

Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval

Collaborative spatial-temporal distillation for efficient video deraining

Dual Knowledge Distillation on Multiview Pseudo Labels for Unsupervised Person Re-Identification

Towards Better Entity Linking with Multi-View Enhanced Distillation

DistilVPR: Cross-Modal Knowledge Distillation for Visual Place Recognition

KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval

Learn from Unlabeled Videos for Near-duplicate Video Retrieval

Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models

DL-KDD: Dual-Light Knowledge Distillation for Action Recognition in the Dark

VideoDistill: Language-aware Vision Distillation for Video Question Answering

DistilDoc: Knowledge Distillation for Visually-Rich Document Applications

TeachText: CrossModal Generalized Distillation for Text-Video Retrieval

DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

Continual Vision-Language Retrieval Via Dynamic Knowledge Rectification

Language-aware Visual Semantic Distillation for Video Question Answering

ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder Via Self On-the-fly Distillation for Dense Passage Retrieval

Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification

VideoAdviser: Video Knowledge Distillation for Multimodal Transfer Learning

Multi-target Knowledge Distillation Via Student Self-reflection