ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning

Zhiwu Qing, Ziyuan Huang, Shiwei Zhang, Mingqian Tang, Changxin Gao, Rong Jin, Marcelo H. Ang Jr, Nong Sang
DOI: https://doi.org/10.1109/tmm.2023.3244126
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: The central idea of contrastive learning is to discriminate between different instances and to force different views of the same instance to share the same representation. To avoid trivial solutions, augmentation plays an important role in generating different views, among which random cropping has been shown to be effective for learning a generalized and robust representation. The commonly used random crop operation keeps the distribution of the difference between two views unchanged throughout training. In this work, we show that adaptively controlling the disparity between two augmented views during training enhances the quality of the learned representations. Specifically, we present a parametric cubic cropping operation, ParamCrop, for video contrastive learning, which automatically crops a 3D cubic region via differentiable 3D affine transformations. ParamCrop is trained simultaneously with the video backbone using an adversarial objective, so that it learns to increase the contrastive loss and thus gradually reduces the content shared between the two cropped views. Experiments show that this adaptive and gradual increase in disparity yielded by ParamCrop is beneficial for learning a strong and generalized representation for downstream tasks, and that it is effective across multiple contrastive learning frameworks and video backbones.
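To make the idea of an affine-parameterized cubic crop concrete, the following is a minimal sketch and not the authors' implementation. It assumes a crop is described only by a per-axis scale and center in normalized coordinates in [-1, 1] along time, height, and width; the abstract does not specify the parameterization, and the function names `crop_affine`, `apply_affine`, and `overlap_fraction` are hypothetical. The learned, adversarially trained part of ParamCrop is omitted; the sketch only shows how crop parameters define a 3D affine map and how the content shared by two views could be measured.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def crop_affine(scale: Sequence[float], center: Sequence[float]) -> Matrix:
    """Build a 3x4 affine matrix for an axis-aligned cubic crop.

    Assumption: the matrix maps normalized output coordinates in [-1, 1]^3
    (time, height, width) to normalized source coordinates. `scale` is the
    half-extent of the crop per axis and `center` is its midpoint.
    """
    return [
        [scale[0], 0.0, 0.0, center[0]],
        [0.0, scale[1], 0.0, center[1]],
        [0.0, 0.0, scale[2], center[2]],
    ]


def apply_affine(theta: Matrix, point: Sequence[float]) -> Tuple[float, ...]:
    """Map one output-grid point (t, y, x) to its source location."""
    t, y, x = point
    return tuple(r[0] * t + r[1] * y + r[2] * x + r[3] for r in theta)


def overlap_fraction(theta_a: Matrix, theta_b: Matrix) -> float:
    """Fraction of view A's volume that is also covered by view B.

    Valid only for the axis-aligned crops produced by `crop_affine`.
    """
    fraction = 1.0
    for axis in range(3):
        half_a, mid_a = theta_a[axis][axis], theta_a[axis][3]
        half_b, mid_b = theta_b[axis][axis], theta_b[axis][3]
        lo = max(mid_a - half_a, mid_b - half_b)
        hi = min(mid_a + half_a, mid_b + half_b)
        fraction *= max(0.0, hi - lo) / (2.0 * half_a)
    return fraction


# Two half-size views; the second is shifted along the temporal axis.
view_a = crop_affine(scale=(0.5, 0.5, 0.5), center=(0.0, 0.0, 0.0))
view_b = crop_affine(scale=(0.5, 0.5, 0.5), center=(0.5, 0.0, 0.0))
shared = overlap_fraction(view_a, view_b)
```

Under these assumptions, moving the two centers apart lowers `overlap_fraction`, which corresponds to the growing disparity between views that the abstract describes. In a differentiable setting the same matrix would typically drive a sampling grid so that gradients can reach the crop parameters.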