Abstract: Compared with feature-based distillation methods, logits distillation relaxes the requirement that teacher and student networks have consistent feature dimensions, yet its performance is considered inferior in face recognition. One major challenge is that the lightweight student network, with its low model capacity, has difficulty fitting the target logits, a difficulty attributed to the large number of identities in face recognition. Therefore, we probe the target logits to extract the primary knowledge related to face identity and discard the rest, making distillation more achievable for the student network. Specifically, the prediction contains a tail group of near-zero values that carries only minor knowledge for distillation. To give a clear view of its impact, we first partition the logits into two groups, i.e., a Primary Group and a Secondary Group, according to the cumulative probability of the softened prediction. Then, we reorganize the Knowledge Distillation (KD) loss of the grouped logits into three parts, i.e., Primary-KD, Secondary-KD, and Binary-KD. Primary-KD distills the primary knowledge from the teacher, Secondary-KD aims to refine minor knowledge but increases the difficulty of distillation, and Binary-KD ensures the consistency of the knowledge distribution between teacher and student. We experimentally find that (1) Primary-KD and Binary-KD are indispensable for KD, and (2) Secondary-KD is the culprit that bottlenecks KD. Therefore, we propose Grouped Knowledge Distillation (GKD), which retains Primary-KD and Binary-KD but omits Secondary-KD from the final KD loss. Extensive experiments on popular face recognition benchmarks demonstrate the superiority of the proposed GKD over state-of-the-art methods.
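The grouping and three-part loss described above can be illustrated with a small pure-Python sketch. It is not the authors' implementation and rests on stated assumptions: the groups are taken from the teacher's temperature-softened prediction; the three terms follow the chain rule of KL divergence (the same style of split used in decoupled KD), so Binary-KD compares the group masses and Primary-KD and Secondary-KD compare the renormalized within-group distributions, each weighted by the teacher's group mass; and `tau` (cumulative-probability threshold), `temperature`, and all function names are hypothetical. The paper's exact weighting of each term may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-softened softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def partition_by_cumulative_prob(
    probs: Sequence[float], tau: float
) -> Tuple[List[int], List[int]]:
    """Primary group: highest-probability classes, taken in descending order until
    their cumulative probability reaches `tau`. Secondary group: the remaining tail."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    primary: List[int] = []
    cumulative = 0.0
    for i in order:
        primary.append(i)
        cumulative += probs[i]
        if cumulative >= tau:
            break
    chosen = set(primary)
    secondary = [i for i in range(len(probs)) if i not in chosen]
    return sorted(primary), secondary


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def within_group(probs: Sequence[float], idx: Sequence[int]) -> List[float]:
    """Renormalize the probabilities of the classes in `idx` so they sum to 1."""
    mass = sum(probs[i] for i in idx)
    return [probs[i] / mass for i in idx]


def grouped_kd_terms(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    tau: float = 0.9,          # hypothetical cumulative-probability threshold
    temperature: float = 4.0,  # hypothetical softening temperature
) -> Dict[str, float]:
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # Assumption: the groups are defined by the teacher's softened prediction.
    primary, secondary = partition_by_cumulative_prob(p_t, tau)

    # Binary distributions: probability mass on the primary vs. secondary group.
    b_t = [sum(p_t[i] for i in primary), sum(p_t[i] for i in secondary)]
    b_s = [sum(p_s[i] for i in primary), sum(p_s[i] for i in secondary)]

    binary_kd = kl(b_t, b_s)
    primary_kd = b_t[0] * kl(within_group(p_t, primary), within_group(p_s, primary))
    secondary_kd = 0.0
    if secondary and b_t[1] > 0 and b_s[1] > 0:
        secondary_kd = b_t[1] * kl(within_group(p_t, secondary), within_group(p_s, secondary))

    return {
        "binary_kd": binary_kd,
        "primary_kd": primary_kd,
        "secondary_kd": secondary_kd,
        # GKD-style loss under this sketch: keep Primary-KD and Binary-KD, drop Secondary-KD.
        "gkd": binary_kd + primary_kd,
    }


if __name__ == "__main__":
    teacher = [8.0, 6.0, 1.0, 0.5, 0.2, 0.1]
    student = [5.0, 4.0, 2.0, 1.5, 1.0, 0.5]
    print(grouped_kd_terms(teacher, student))
```

Under these assumptions the three parts sum exactly to the ordinary KD loss KL(p_T || p_S), which is why dropping Secondary-KD changes only the tail term and leaves the group-mass alignment (Binary-KD) intact. Many KD implementations also scale the loss by the squared temperature; that convention is omitted here.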

Triplet Distillation for Deep Face Recognition

DCCD: Reducing Neural Network Redundancy Via Distillation

Teacher–student training and triplet loss to reduce the effect of drastic face occlusion

Cross Architecture Distillation for Face Recognition

CoupleFace: Relation Matters for Face Recognition Distillation

Triplet Knowledge Distillation

Training Deep Face Recognition for Efficient Inference by Distillation and Mutual Learning

Triplet Knowledge Distillation Networks for Model Compression

MassFace: an efficient implementation using triplet loss for face recognition

Depth map guided triplet network for deepfake face detection

Improving Face Recognition from Hard Samples Via Distribution Distillation Loss

DeepVisage: Making face recognition simple yet with powerful generalization skills

Low-Resolution Face Recognition via Adaptable Instance-Relation Distillation

Grouped Knowledge Distillation for Deep Face Recognition

AdaDistill: Adaptive Knowledge Distillation for Deep Face Recognition

MT4MTL-KD: A Multi-Teacher Knowledge Distillation Framework for Triplet Recognition

Multitask learning and CNN for application of face recognition

Distillation of a CNN for a high accuracy mobile face recognition system

Compact Triplet Loss for Person Re-Identification in Camera Sensor Networks

Person Re-Identification with Triplet Focal Loss

Feature Map Distillation of Thin Nets for Low-Resolution Object Recognition