Self-Supervised Learning With Data-Efficient Supervised Fine-Tuning for Crowd Counting

Rui Wang, Yixue Hao, Long Hu, Jincai Chen, Min Chen, Di Wu
DOI: https://doi.org/10.1109/tmm.2023.3251106
IF: 7.3
2023-05-09
IEEE Transactions on Multimedia
Abstract: Fully-supervised learning for crowd counting requires expensive and laborious annotation of labeled data, so a method that reduces the labeling burden is desirable. Unlabeled images in the wild are plentiful and far easier to obtain than labeled datasets. Exploiting the fact that a spatial transformation applied to an image applies consistently to its head annotations, this paper proposes a self-supervised learning framework that uses unlabeled data and limited labeled data to pre-train and fine-tune a crowd counting model (SSL-FT). The framework comprises an online network and a target network that receive the same images after random processing by two defined augmentation transformations. We leverage unlabeled data to pre-train the online network with a self-supervised loss, and small-scale labeled data to transfer the model to a specific domain with a fully-supervised loss. We demonstrate the effectiveness of SSL-FT on four public datasets (ShanghaiTech PartA, ShanghaiTech PartB, UCF-QNRF and WorldExpo'10) using a classical counting model. Experimental results show that our approach performs better than state-of-the-art semi-supervised methods.
computer science, information systems, telecommunications, software engineering
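
The consistency idea in the abstract (an online and a target branch see differently augmented views of the same image, and their outputs are compared once spatially aligned) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the helper names `hflip`, `toy_density` and `consistency_loss` are hypothetical, a horizontal flip stands in for the augmentation transformations, a single scalar weight stands in for each network, and mean squared error stands in for the self-supervised loss.

```python
import numpy as np


def hflip(a):
    """Horizontally flip a 2-D map (stand-in for one spatial augmentation)."""
    return a[:, ::-1]


def toy_density(image, weight):
    """Hypothetical stand-in for a counting network: scales pixels into a 'density map'."""
    return weight * image


def consistency_loss(image, w_online, w_target):
    """MSE between the two branches' density maps after spatial alignment.

    Assumption: the online branch sees the flipped image and the target
    branch sees the original; the paper's actual augmentations and loss
    may differ.
    """
    pred_online = toy_density(hflip(image), w_online)
    pred_target = toy_density(image, w_target)
    # Undo the flip so both maps live in the same coordinate frame.
    aligned = hflip(pred_online)
    return float(np.mean((aligned - pred_target) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((4, 6))
    # A flip rearranges pixels, so the total "count" (the sum) is unchanged.
    print(np.isclose(hflip(image).sum(), image.sum()))
    # The branches disagree while their weights differ, and agree once they match.
    print(consistency_loss(image, w_online=0.9, w_target=0.5) > 0.0)
    print(np.isclose(consistency_loss(image, w_online=0.5, w_target=0.5), 0.0))
```

In this sketch no labels are needed to compute the loss, which is the property that lets unlabeled images drive pre-training; a supervised loss on a small labeled set would then take over for fine-tuning.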