Wasserstein Contrastive Representation Distillation: Supplementary Material

Liqun Chen, Dong Wang, Zhe Gan, Jingjing Liu, Ricardo Henao, L. Carin
2021-01-01
Abstract:
Algorithm 1: The proposed WCoRD algorithm.
1: Input: A mini-batch of data samples {x_i, y_i}_{i=1}.
2: Extract features h^T and h^S from the teacher and student networks, respectively.
3: Construct a memory buffer B to store previously computed features.
4: Global contrastive knowledge transfer:
5:   Maximize the GCKT loss in Eqn. (11) over θ_S and φ.
6: Local contrastive knowledge transfer:
7:   Minimize the LCKT loss in Eqn. (13) over θ_S.
8: Minimize the task-specific loss over θ_S.
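The control flow of Algorithm 1 can be sketched in plain Python as below. This is only an illustrative skeleton, not the authors' implementation: the GCKT and LCKT losses of Eqns. (11) and (13) are not reproduced here, so the three update steps are passed in as caller-supplied callables. The names `FeatureBuffer`, `wcord_step`, `gckt_ascent`, `lckt_descent`, and `task_descent` are hypothetical, as are the choices to store teacher features in the buffer and to hand the buffered features to both contrastive updates.

```python
from collections import deque
from typing import Any, Callable, List, Sequence, Tuple


class FeatureBuffer:
    """Fixed-capacity FIFO store for previously computed features (step 3)."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen silently drops the oldest entries when full.
        self._items: deque = deque(maxlen=capacity)

    def extend(self, feats: Sequence[Any]) -> None:
        self._items.extend(feats)

    def snapshot(self) -> List[Any]:
        return list(self._items)

    def __len__(self) -> int:
        return len(self._items)


def wcord_step(
    batch: Sequence[Tuple[Any, Any]],
    teacher: Callable[[Any], Any],
    student: Callable[[Any], Any],
    buffer: FeatureBuffer,
    gckt_ascent: Callable[[List[Any], List[Any], List[Any]], None],
    lckt_descent: Callable[[List[Any], List[Any], List[Any]], None],
    task_descent: Callable[[List[Any], List[Any]], None],
) -> Tuple[List[Any], List[Any]]:
    """Run one pass over steps 1-8 of Algorithm 1 for a single mini-batch."""
    # Step 1: split the mini-batch into inputs and labels.
    xs = [x for x, _ in batch]
    ys = [y for _, y in batch]

    # Step 2: extract teacher and student features.
    h_t = [teacher(x) for x in xs]
    h_s = [student(x) for x in xs]

    # Step 3: read features stored from earlier mini-batches.
    # Assumption: both contrastive updates may use them.
    stored = buffer.snapshot()

    # Steps 4-5: global contrastive knowledge transfer
    # (maximize the GCKT loss over the student parameters and phi).
    gckt_ascent(h_t, h_s, stored)

    # Steps 6-7: local contrastive knowledge transfer
    # (minimize the LCKT loss over the student parameters).
    lckt_descent(h_t, h_s, stored)

    # Step 8: minimize the task-specific loss over the student parameters.
    task_descent(h_s, ys)

    # Assumption: the buffer keeps teacher features for later mini-batches.
    buffer.extend(h_t)
    return h_t, h_s
```

In a real training loop the three callables would wrap optimizer updates on the corresponding losses; here they only mark where each update occurs and in what order.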