Contrastive Representation Learning with Trainable Augmentation Channel

Masanori Koyama, Kentaro Minami, Takeru Miyato, Yarin Gal
DOI: https://doi.org/10.48550/arXiv.2111.07679
2021-11-15
Abstract: In contrastive representation learning, data representation is trained so that it can classify the image instances even when the images are altered by augmentations. However, depending on the datasets, some augmentations can damage the information of the images beyond recognition, and such augmentations can result in collapsed representations. We present a partial solution to this problem by formalizing a stochastic encoding process in which there exists a tug-of-war between the data corruption introduced by the augmentations and the information preserved by the encoder. We show that, with the infoMax objective based on this framework, we can learn a data-dependent distribution of augmentations to avoid the collapse of the representation.
Machine Learning
What problem does this paper attempt to address?
This paper addresses a failure mode of Contrastive Representation Learning (CRL): some augmentations destroy so much of the image that the learned representation collapses. The authors point out that whether an augmentation is harmful depends on the dataset. On MNIST, for example, a crop that misses the digit entirely leaves nothing by which the image can be identified, and training on such views degrades the representation.

As a partial solution, the authors propose a framework with a trainable augmentation channel that adjusts the probability distribution over augmentations for each input. The augmentation is treated as a random variable \(T\) drawn from \(P(T|X)\), and the encoding process is framed as a "tug-of-war" between the data corruption introduced by the augmentation and the information preserved by the encoder. By maximizing the Mutual Information (MI) objective \(I(X;Z)\), where \(Z\) is the representation of an augmented version of the data \(X\), the authors show how to learn a data-dependent distribution of augmentations that avoids representation collapse.

The framework also offers a new perspective on existing CRL methods: SimCLR can be regarded as the special case in which \(P(T|X)\) is fixed to a uniform distribution. The work thus points to a direction for future research on how to select and optimize augmentations.
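To make the role of \(P(T|X)\) concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a small discrete set of K candidate augmentations and precomputed embeddings for every (image, augmentation) pair. It uses a simplified instance-classification loss as a stand-in for the paper's InfoMax objective. The names `instance_contrastive_loss`, `aug_logits`, and `prototypes` are hypothetical, and the augmentation scores are set by hand rather than trained. Augmentation 0 plays the part of a crop that misses the digit: it maps every image to the same embedding.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def instance_contrastive_loss(z, aug_logits, temperature=0.5):
    """Instance-classification loss weighted by a per-image augmentation distribution.

    z:          (N, K, D) embeddings of N images under K candidate augmentations.
    aug_logits: (N, K) unnormalized scores; softmax over K plays the role of P(T|X).
    This is an illustrative simplification, not the objective derived in the paper.
    """
    n = z.shape[0]
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)  # unit-norm embeddings
    p = softmax(aug_logits, axis=1)                    # (N, K), each row sums to 1

    # Expected embedding of each image under its own augmentation distribution.
    prototypes = np.einsum("nk,nkd->nd", p, z)         # (N, D)

    # Score every augmented view against every image prototype.
    logits = np.einsum("nkd,md->nkm", z, prototypes) / temperature  # (N, K, N)
    m = logits.max(axis=-1, keepdims=True)
    log_probs = logits - (m + np.log(np.exp(logits - m).sum(axis=-1, keepdims=True)))

    # Log-probability that view k of image i is assigned to image i itself.
    idx = np.arange(n)
    correct = log_probs[idx, :, idx]                   # (N, K)

    # Average over images of the P(T|X)-weighted negative log-likelihood.
    return float(-(p * correct).sum(axis=1).mean())


rng = np.random.default_rng(0)
N, K, D = 8, 3, 16
base = rng.normal(size=(N, D))

z = np.empty((N, K, D))
z[:, 0] = rng.normal(size=D)                     # destructive: same output for every image
z[:, 1] = base + 0.1 * rng.normal(size=(N, D))   # mild augmentation
z[:, 2] = base + 0.1 * rng.normal(size=(N, D))   # mild augmentation

# Uniform P(T|X): the fixed-distribution setting the text associates with SimCLR.
loss_uniform = instance_contrastive_loss(z, np.zeros((N, K)))

# Hand-set P(T|X) that puts almost no mass on the destructive augmentation.
skewed_logits = np.zeros((N, K))
skewed_logits[:, 0] = -10.0
loss_skewed = instance_contrastive_loss(z, skewed_logits)

# Fully collapsed representation: every view of every image is identical.
loss_collapsed = instance_contrastive_loss(np.ones((N, K, D)), np.zeros((N, K)))

print(f"uniform:   {loss_uniform:.3f}")
print(f"skewed:    {loss_skewed:.3f}")
print(f"collapsed: {loss_collapsed:.3f}  (log N = {np.log(N):.3f})")
```

In this toy setup, shifting probability mass away from the destructive augmentation lowers the loss, while a fully collapsed encoder can do no better than chance, which is log N. A trainable channel would treat `aug_logits` as the output of a learned function of the input and optimize it jointly with the encoder. The sketch only compares two fixed choices.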