Domain adaptive crowd counting via dynamic scale aggregation network

Zhanqiang Huo, Yanan Wang, Yingxu Qiao, Jing Wang, Fen Luo
DOI: https://doi.org/10.1049/cvi2.12198
IF: 1.484
2023-04-15
IET Computer Vision
Abstract: The style transfer layer closes the gap in image appearance between the domains, and the dynamic scale aggregation (DSA) module closes the gap in cross‐domain head scale variations. Crowd counting is an important research topic in computer vision; its goal is to estimate the number of people in an image. In recent years, researchers have dramatically improved counting accuracy by regressing density maps. However, because of the inherent domain shift, a model trained on an expensive, manually labelled dataset (source domain) does not perform well on a dataset with scarce labels (target domain). To address this issue, a novel dynamic scale aggregation network (DSANet) is proposed to reduce the gaps in style and cross‐domain head scale variations. Specifically, a practical style transfer layer is introduced to reduce the appearance discrepancy between the source and target domains. Then, the translated source and target domain samples are encoded by a generator consisting of the VGG16 network and the DSA modules, which produces the corresponding density maps. The DSA module adaptively adjusts its parameters according to the input features and fuses multi‐scale information to overcome the cross‐domain head scale variations. Next, a discriminator judges whether the input density map comes from the source or the target domain. Finally, the domain distributions are aligned through adversarial training between the generator and the discriminator. Experiments show that the network outperforms the current state‐of‐the‐art methods and improves performance on the target domain without significantly degrading performance on the source domain.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
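
The abstract mentions two ideas that a small example may make more concrete: a count is read off a density map by summing it, and a scale aggregation step can weight several receptive-field sizes by the input itself. The following is a toy sketch only, not the paper's DSA module, whose design the abstract does not specify. All names (box_filter, dynamic_scale_fusion, gate_weights, count_from_density) are hypothetical, box filters stand in for learned convolution branches, and the gate is a fixed linear map of the mean input value rather than a learned layer.

```python
import math


def box_filter(x, k):
    """Average a 2D list over a k x k window (zero padded).

    Stand-in for a convolution branch with receptive field k (assumption).
    """
    h, w = len(x), len(x[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += x[ii][jj]
            out[i][j] = s / (k * k)
    return out


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_scale_fusion(x, kernel_sizes=(1, 3, 5), gate_weights=(1.0, 0.0, -1.0)):
    """Fuse multi-scale branches with weights computed from the input.

    The gate here is hypothetical: one scalar descriptor (the mean of x)
    is scaled per branch and passed through a softmax.
    """
    h, w = len(x), len(x[0])
    descriptor = sum(map(sum, x)) / (h * w)
    weights = softmax([g * descriptor for g in gate_weights])
    branches = [box_filter(x, k) for k in kernel_sizes]
    fused = [
        [sum(wt * b[i][j] for wt, b in zip(weights, branches)) for j in range(w)]
        for i in range(h)
    ]
    return fused, weights


def count_from_density(density):
    """Estimated count is the sum of all density-map values."""
    return sum(map(sum, density))
```

In this sketch, two different inputs generally yield different branch weights, which is the sense in which the fusion is "dynamic"; a real module would learn both the branches and the gate from data.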