Abstract: Balancing the trade-off between accuracy and speed, that is, obtaining higher performance without sacrificing inference time, is a challenging topic for the object detection task. Knowledge distillation, a model compression technique, provides a potential and feasible way to handle this efficiency and effectiveness issue by transferring the dark knowledge from a sophisticated teacher detector to a simple student one. Despite offering promising ways to balance accuracy and speed, current knowledge distillation methods for object detection still suffer from two limitations. Firstly, most of these methods are inherited or adapted from frameworks for the image classification task and adopt an implicit manner, imitating or constraining the features from the intermediate layers or the output predictions of the teacher and student models, while little consideration has been given to the intrinsic relevance between the classification and localization predictions in the object detection task. Secondly, these methods fail to investigate the relationship between the detection and distillation tasks in the knowledge distillation pipeline, and they train the whole network by simply combining the losses from these two different tasks through hand-crafted weighting parameters. To address the aforementioned issues, we propose a novel Relation Knowledge Distillation by Auxiliary Learning for Object Detection (ReAL) method in this paper. Specifically, we first design a prediction relation distillation module, which makes the student model directly mimic the output predictions of the teacher model, and formulate self and mutual relation distillation losses to exploit the relation information between the teacher and student models.
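The abstract does not give the loss formulas, so the following is only a minimal sketch under stated assumptions, not the ReAL implementation. It assumes per-box classification logits from teacher and student, softened with a hypothetical `temperature=2.0`; direct mimicking is a KL divergence; the self relation compares each model's pairwise cosine-similarity matrix over its own predictions; and the mutual relation compares student-to-teacher cross similarities with the teacher's self-similarity matrix. All function names are hypothetical, and localization outputs are omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened class probabilities for one box."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cosine(u, v, eps=1e-12):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def relation_matrix(rows_a, rows_b):
    """Pairwise cosine similarity between every row of rows_a and every row of rows_b."""
    return [[cosine(a, b) for b in rows_b] for a in rows_a]


def mse(m1, m2):
    diffs = [(x - y) ** 2 for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2)]
    return sum(diffs) / len(diffs)


def prediction_relation_losses(teacher_logits, student_logits, temperature=2.0):
    """Hypothetical mimic / self-relation / mutual-relation losses over N boxes."""
    t_prob = [softmax(x, temperature) for x in teacher_logits]
    s_prob = [softmax(x, temperature) for x in student_logits]
    # Direct mimicking: the student matches the teacher's softened distribution per box.
    mimic = sum(kl_divergence(t, s) for t, s in zip(t_prob, s_prob)) / len(t_prob)
    # Self relation: similarity structure among the student's own predictions
    # should match the teacher's structure among its own predictions.
    t_self = relation_matrix(t_prob, t_prob)
    s_self = relation_matrix(s_prob, s_prob)
    self_rel = mse(s_self, t_self)
    # Mutual relation: student-to-teacher cross similarities should match
    # the teacher's self-similarity structure.
    mutual_rel = mse(relation_matrix(s_prob, t_prob), t_self)
    return mimic, self_rel, mutual_rel


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]  # 2 boxes, 3 classes (toy values)
    student = [[1.0, 0.8, -0.2], [0.4, 0.9, 0.5]]
    print(prediction_relation_losses(teacher, student))
```

All three terms are zero when the student reproduces the teacher's logits exactly, so they act purely as distillation signals on top of the detection loss.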
Moreover, to better explore the relationship between the different tasks in the distillation pipeline, we introduce auxiliary learning into knowledge distillation for object detection and develop a dynamic weight adaptation strategy. By regarding the detection task as the primary task and treating the distillation task as an auxiliary task in the auxiliary learning framework, we dynamically adjust and regularize the corresponding loss weights for these tasks during training. Experiments on the MS COCO dataset with various teacher and student detector combinations show that the proposed ReAL achieves clear improvements across different distillation model configurations while performing favorably against state-of-the-art methods.
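The abstract does not specify how the loss weights are adjusted. As an illustration only, the sketch below swaps in a well-known auxiliary-learning heuristic, gradient-cosine-similarity weighting: the auxiliary (distillation) gradient is scaled by max(0, cos) of its angle with the primary (detection) gradient on shared parameters. The toy quadratic losses, targets, learning rate, and function names are hypothetical, chosen so the behaviour can be checked by hand.

```python
import math


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two flat gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def auxiliary_weight(grad_primary, grad_auxiliary):
    # Hypothetical rule (gradient-similarity weighting, not necessarily ReAL's):
    # keep the distillation gradient only when it agrees with the detection gradient.
    return max(0.0, cosine(grad_primary, grad_auxiliary))


def train(theta, primary_target, aux_target, lr=0.1, steps=50):
    """Gradient descent on toy losses L_p = ||theta - a||^2 and L_aux = ||theta - b||^2."""
    weights = []
    for _ in range(steps):
        g_primary = [2 * (t - a) for t, a in zip(theta, primary_target)]
        g_aux = [2 * (t - b) for t, b in zip(theta, aux_target)]
        w = auxiliary_weight(g_primary, g_aux)  # recomputed at every step
        theta = [t - lr * (gp + w * ga) for t, gp, ga in zip(theta, g_primary, g_aux)]
        weights.append(w)
    return theta, weights


if __name__ == "__main__":
    final_theta, weights = train([5.0, 5.0], primary_target=[0.0, 0.0], aux_target=[1.0, 0.0])
    print("first weight:", round(weights[0], 3), "last weight:", weights[-1])
    print("final parameters:", [round(x, 4) for x in final_theta])
```

Early in training both gradients point in roughly the same direction, so the distillation term is kept with a weight close to 1; once the two objectives conflict, the weight drops to 0 and the parameters converge to the primary (detection) optimum rather than to a compromise between the two losses.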

Classifier-adaptation knowledge distillation framework for relation extraction and event detection with imbalanced data

KICE: A Knowledge Consolidation and Expansion Framework for Relation Extraction

Exploiting the Ground-Truth: An Adversarial Imitation Based Knowledge Distillation Approach for Event Detection

Class-Adaptive Self-Training for Relation Extraction with Incompletely Annotated Training Data

MKDAT: Multi-Level Knowledge Distillation with Adaptive Temperature for Distantly Supervised Relation Extraction

Factorized and progressive knowledge distillation for CTC-based ASR models

CORSD: Class-Oriented Relational Self Distillation

Joint data augmentation and knowledge distillation for few-shot continual relation extraction

Phased progressive learning with coupling-regulation-imbalance loss for imbalanced data classification

CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection

Multiple Relations Classification using Imbalanced Predictions Adaptation

Adversarial Self-Supervised Data-Free Distillation for Text Classification

Adaptive Teaching with Shared Classifier for Knowledge Distillation

Relation Knowledge Distillation by Auxiliary Learning for Object Detection

Exploiting Entity BIO Tag Embeddings and Multi-task Learning for Relation Extraction with Imbalanced Data

Learn from Balance: Rectifying Knowledge Transfer for Long-Tailed Scenarios

Learning Efficient Detector with Semi-supervised Adaptive Distillation

An adaptive Bagging algorithm based on lightweight transformer for multi-class imbalance recognition

More than Classification: A Unified Framework for Event Temporal Relation Extraction

Relation classification via knowledge graph enhanced transformer encoder

Adaptive class augmented prototype network for few-shot relation extraction