Sparse and Hierarchical Masked Modeling for Convolutional Representation Learning

Keyu Tian, Yi Jiang, Qishuai Diao, Chen Lin, Liwei Wang, Zehuan Yuan
2023-01-01
Abstract: This paper presents a simple yet powerful framework for pre-training convolutional networks (convnets) with Sparse masKed modeling (SparK). SparK addresses two key challenges in applying transformer-specialized masked modeling to convolutional models: (i) the convolution operation cannot handle irregular, randomly masked input; (ii) the single-scale nature of existing masked modeling is inconsistent with a convnet's hierarchical structure. For (i), we gather the unmasked pixels into a sparse image and use sparse convolution for encoding. For (ii), we develop a hierarchical encoder-decoder that reconstructs from multi-scale encoded features to fully exploit the advantage of hierarchy. As the first hierarchical masked modeling method designed for convnets, SparK exploits their untapped potential. On three downstream tasks, SparK surpasses both state-of-the-art contrastive learning and transformer-based masked modeling by similarly large margins (around +1.0%). Improvements on object detection and instance segmentation are more substantial (>1.0%), verifying the strong transferability of features learned by SparK. We also demonstrate SparK's favorable scaling behavior by observing larger gains on larger models. Taken together, these results give an initial indication of a promising future for generative pre-training on convnets. Code will be made publicly available.
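The masking and sparse-encoding idea in point (i) can be illustrated with a small sketch. This is not the authors' implementation: the function names (`random_patch_mask`, `upsample_mask`, `masked_conv3x3`) are hypothetical, the code uses plain Python lists instead of a deep learning library, and it assumes a submanifold-style sparse convolution, in which outputs are computed only at unmasked positions and only unmasked inputs contribute. A real sparse convolution library would also skip the masked positions for efficiency; here only the semantics are emulated.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio, seed=0):
    """Return a grid_h x grid_w grid of flags; 1 = patch kept, 0 = patch masked."""
    rng = random.Random(seed)
    n = grid_h * grid_w
    n_keep = n - int(round(n * mask_ratio))
    kept = set(rng.sample(range(n), n_keep))
    return [[1 if r * grid_w + c in kept else 0 for c in range(grid_w)]
            for r in range(grid_h)]


def upsample_mask(mask, factor):
    """Nearest-neighbour upsampling so the patch mask matches a finer feature map."""
    return [[v for v in row for _ in range(factor)]
            for row in mask for _ in range(factor)]


def masked_conv3x3(x, kernel, mask):
    """3x3 convolution (zero padding) restricted to kept positions.

    Assumption: submanifold-style behaviour. Masked positions are never
    written, and masked neighbours never contribute to a kept output.
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if not mask[i][j]:
                continue  # masked position: output stays empty (zero)
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w and mask[ii][jj]:
                        acc += kernel[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = acc
    return out


# Usage: mask half of a 4x4 patch grid, then encode an 8x8 feature map.
patch_mask = random_patch_mask(4, 4, mask_ratio=0.5)
pixel_mask = upsample_mask(patch_mask, factor=2)
image = [[1.0] * 8 for _ in range(8)]
kernel = [[1.0] * 3 for _ in range(3)]
features = masked_conv3x3(image, kernel, pixel_mask)
```

Because the same patch mask can be upsampled by different factors, one mask can be matched to feature maps at several resolutions, which is one plausible way to keep masking consistent across the scales of a hierarchical encoder as described in point (ii).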