Representation Distillation for Efficient Self-Supervised Learning

Xin Liu, Yali Li, Shengjin Wang
DOI: https://doi.org/10.1109/icme57554.2024.10687639
2024-01-01
Abstract: Siamese self-supervised learning, which relies on Siamese networks with identical encoders in the two branches, has recently shown significant progress. However, because of this inherent design, the overall model capacity is constrained mainly by the encoder of interest, which creates a representation bottleneck during pre-training. To address this limitation, we propose Distill Your Own Latent (DYOL), a method that performs self-supervised learning between branches with different architectures. As a result, a larger target network can be employed to provide stronger self-supervision. We first decouple the update process of the target network from that of the online network to prevent shortcut learning. We then distill the representation directly from the target network into the online network by enforcing view consistency between the two networks. Extensive experiments on various downstream tasks validate the effectiveness of our method. Importantly, the results show that strong target networks are efficient self-supervised distillers: they enable small online networks to attain results similar to those of large target networks (parameter efficiency) and to achieve superior performance with far fewer pre-training epochs and samples (time and data efficiency).
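The abstract does not give DYOL's loss, so the following is only a minimal sketch of what "enforcing view consistency between networks" could look like, not the authors' code. It assumes a BYOL-style normalized mean-squared error (equivalently 2 - 2 cos(p, z)), applied symmetrically across two augmented views, with the larger target network's outputs treated as fixed constants (stop-gradient), so nothing from this loss updates the target. It also assumes the online branch ends in a small predictor that maps to the target's embedding dimension, since the two architectures differ. The function names are hypothetical.

```python
import math
from typing import Sequence


def l2_normalize(v: Sequence[float]) -> list[float]:
    """Scale a vector to unit L2 norm (a tiny fallback avoids dividing by zero)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1e-12
    return [x / norm for x in v]


def consistency_loss(online_pred: Sequence[float], target_repr: Sequence[float]) -> float:
    """Normalized MSE ||p/|p| - z/|z|||^2, which equals 2 - 2 * cos(p, z).

    Assumption: target_repr comes from a separately updated (e.g. frozen) target
    network and is treated as a constant, i.e. no gradient flows into it.
    """
    if len(online_pred) != len(target_repr):
        # Hypothetical design: the online predictor must output the target's dimension.
        raise ValueError("online prediction must match the target embedding dimension")
    p = l2_normalize(online_pred)
    z = l2_normalize(target_repr)
    return sum((a - b) ** 2 for a, b in zip(p, z))


def symmetric_distill_loss(online_preds, target_reprs) -> float:
    """Cross-view consistency: online view 1 matches target view 2, and vice versa."""
    (p1, p2), (z1, z2) = online_preds, target_reprs
    return consistency_loss(p1, z2) + consistency_loss(p2, z1)


if __name__ == "__main__":
    online = ([1.0, 0.0], [0.0, 1.0])   # online predictions for views 1 and 2
    target = ([0.0, 2.0], [3.0, 0.0])   # target representations for views 1 and 2
    print(symmetric_distill_loss(online, target))  # 0.0: directions agree across views
```

Because the target is never updated from the online branch (unlike BYOL's exponential moving average, which needs identical architectures), the online network can only lower this loss by moving toward the target's representation. That is one plausible reading of how decoupling guards against shortcut solutions.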