Harmonizing Knowledge Transfer in Neural Network with Unified Distillation

Yaomin Huang, Zaomin Yan, Chaomin Shen, Faming Fang, Guixu Zhang
2024-09-27
Abstract: Knowledge distillation (KD), known for its ability to transfer knowledge from a cumbersome network (teacher) to a lightweight one (student) without altering the architecture, has been garnering increasing attention. Two primary categories emerge within KD methods: feature-based, focusing on intermediate layers' features, and logits-based, targeting the final layer's logits. This paper introduces a novel perspective by leveraging diverse knowledge sources within a unified KD framework. Specifically, we aggregate features from intermediate layers into a comprehensive representation, effectively gathering semantic information from different stages and scales. Subsequently, we predict the distribution parameters from this representation. These steps transform knowledge from the intermediate layers into corresponding distributive forms, thereby allowing for knowledge distillation through a unified distribution constraint at different stages of the network, ensuring the comprehensiveness and coherence of knowledge transfer. Numerous experiments were conducted to validate the effectiveness of the proposed method.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to unify and effectively transfer knowledge from different levels of a neural network during knowledge distillation. Existing knowledge distillation methods usually focus on a single type of knowledge (feature-based or logits-based), or directly mix the two types, but ignore the inconsistency between knowledge at different levels. This leaves the optimization objective unclear, making it difficult for the student network to reach the optimal solution. To solve this problem, the paper proposes a new framework named Unified Knowledge Distillation (UniKD). The main contributions of UniKD include:

1. **Unified knowledge distillation**: UniKD realizes unified knowledge distillation across different network layers by fusing features at different levels into a comprehensive representation and converting it into a distribution form. This ensures the comprehensiveness and coherence of knowledge transfer.
2. **Adaptive Feature Fusion module (AFF)**: The AFF module extracts features from intermediate layers and retains multi-scale information while simplifying the computation. Through a gating mechanism, it adaptively determines the importance of adjacent-layer features, retaining key information and eliminating redundant information.
3. **Feature Distribution Prediction module (FDP)**: The FDP module estimates the distribution parameters of intermediate-layer features and turns the distillation of feature knowledge into distribution-level constraints. In this way, distillation of the intermediate-layer features becomes consistent with distillation of the final-layer logits.
4. **Experimental verification**: The paper verifies the effectiveness of UniKD through extensive experiments on multiple datasets (such as CIFAR-100, ImageNet, and MS-COCO).

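The gating idea attributed to the AFF module above can be illustrated with a small sketch. This is not the paper's implementation: it assumes each stage's feature map has already been pooled and projected to a vector of the same length, and it uses a single scalar sigmoid gate per pair of adjacent stages. The names `gated_fuse`, `fuse_stages`, and the gate weights are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(
    shallow: Sequence[float],
    deep: Sequence[float],
    gate_weights: Sequence[float],
    gate_bias: float = 0.0,
) -> Tuple[List[float], float]:
    """Blend two same-length feature vectors with one scalar gate.

    Assumption (not from the paper): the gate is
    g = sigmoid(w . [shallow; deep] + b), and the fused feature is
    g * shallow + (1 - g) * deep.
    """
    if len(shallow) != len(deep):
        raise ValueError("features must have the same length")
    concat = list(shallow) + list(deep)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated length")
    score = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    g = sigmoid(score)
    fused = [g * s + (1.0 - g) * d for s, d in zip(shallow, deep)]
    return fused, g


def fuse_stages(
    stages: Sequence[Sequence[float]],
    gate_weights_per_pair: Sequence[Sequence[float]],
) -> List[float]:
    """Fold stage features, shallow to deep, into one representation."""
    fused = list(stages[0])
    for next_stage, weights in zip(stages[1:], gate_weights_per_pair):
        fused, _ = gated_fuse(fused, next_stage, weights)
    return fused
```

With all gate weights at zero the gate is exactly 0.5, so each fusion step reduces to a plain average of the two inputs; learned weights would shift that balance toward whichever stage carries more useful information.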
The experimental results show that UniKD performs well across different tasks and network architectures, especially when the teacher and student have heterogeneous architectures. In conclusion, this paper uses the UniKD framework to address the inconsistency between knowledge at different levels in existing knowledge distillation methods, aiming for more efficient and coherent knowledge transfer.
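One way to picture a distribution-level constraint applied to both intermediate features and final logits is sketched below. It rests on assumptions that the summary does not confirm: the feature distribution is taken to be a diagonal Gaussian described by a mean and a variance per dimension, the logit term is the usual temperature-scaled KL divergence from logits-based KD, and the two terms are simply added with a weight. The function names and the `feat_weight` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Gaussian = Tuple[Sequence[float], Sequence[float]]  # (means, variances)


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_categorical(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def kl_diag_gaussian(p: Gaussian, q: Gaussian) -> float:
    """KL(N(mu_p, var_p) || N(mu_q, var_q)) for diagonal Gaussians."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return sum(
        0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def unified_kd_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    teacher_feat: Gaussian,
    student_feat: Gaussian,
    temperature: float = 4.0,
    feat_weight: float = 1.0,  # hypothetical balancing weight
) -> float:
    """Sum of a logit-level and a feature-level divergence (illustrative)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # The T^2 factor is the common scaling in logits-based KD.
    logit_term = temperature ** 2 * kl_categorical(p_t, p_s)
    feat_term = kl_diag_gaussian(teacher_feat, student_feat)
    return logit_term + feat_weight * feat_term
```

Because both terms are divergences between distributions, the loss is zero when the student matches the teacher at both levels and grows as either the logits or the predicted feature distribution drift apart.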