CLRNet: A Cross Locality Relation Network for Crowd Counting in Videos
Li Dong, Haijun Zhang, Jianghong Ma, Xiaofei Xu, Yimin Yang, Q. M. Jonathan Wu
DOI: https://doi.org/10.1109/tnnls.2022.3209918
IF: 14.255
2022-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: In this article, we propose a new cross locality relation network (CLRNet) to generate high-quality crowd density maps for crowd counting in videos. Specifically, a cross locality relation module (CLRM) is proposed to enhance feature representations by modeling local dependencies of pixels between adjacent frames with an adapted local self-attention mechanism. First, unlike existing methods, which measure the similarity between pixels by a dot product, we introduce a new adaptive cosine similarity to measure the relationship between two positions. Second, traditional self-attention modules usually integrate the reconstructed features with the same weight for all positions. However, in real-life applications, crowd movement and background changes in a video sequence are uneven, so it is inappropriate to treat all positions in the reconstructed features equally. To address this issue, a scene consistency attention map (SCAM) is developed to make CLRM pay more attention to positions with strong correlations between adjacent frames. Furthermore, CLRM is incorporated into the network in a coarse-to-fine manner to further enhance the representational capability of the features. Experimental results on four public video datasets demonstrate the effectiveness of the proposed CLRNet in comparison with state-of-the-art methods. The code is available at: https://github.com/Amelie01/CLRNet.
Computer Science, Artificial Intelligence, Theory & Methods, Engineering, Electrical & Electronic, Hardware & Architecture
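The abstract describes CLRM only at a high level. The Python sketch below illustrates the general idea of cross-frame local attention with a per-position consistency weight, under assumptions that are not taken from the paper: features are tiny nested lists (H x W x C) rather than tensors; the "adaptive cosine similarity" is modeled as plain cosine similarity multiplied by a fixed scale `tau` (the paper's actual adaptive formulation is not given in the abstract); and the scene consistency map is a hypothetical weight obtained by mapping the co-located cosine similarity from [-1, 1] to [0, 1]. The function names `cross_frame_local_attention`, `scene_consistency_map`, and `clrm_sketch` are illustrative only and do not come from the authors' repository.

```python
import math
from typing import List

Vec = List[float]
FeatureMap = List[List[Vec]]  # H x W x C feature map stored as nested lists


def cosine(a: Vec, b: Vec, eps: float = 1e-8) -> float:
    """Cosine similarity between two feature vectors (eps avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_frame_local_attention(cur: FeatureMap, adj: FeatureMap,
                                radius: int = 1, tau: float = 10.0) -> FeatureMap:
    """Rebuild each position of `cur` from a (2r+1)x(2r+1) window of `adj`.

    Scores are cosine similarities scaled by `tau`, an assumed stand-in for the
    paper's adaptive cosine similarity, normalized with softmax.
    """
    h, w = len(cur), len(cur[0])
    out: FeatureMap = []
    for i in range(h):
        row = []
        for j in range(w):
            # Keys/values: neighbors of (i, j) in the adjacent frame, clipped at borders.
            keys = [adj[y][x]
                    for y in range(max(0, i - radius), min(h, i + radius + 1))
                    for x in range(max(0, j - radius), min(w, j + radius + 1))]
            weights = softmax([tau * cosine(cur[i][j], k) for k in keys])
            c = len(cur[i][j])
            row.append([sum(wt * k[d] for wt, k in zip(weights, keys)) for d in range(c)])
        out.append(row)
    return out


def scene_consistency_map(cur: FeatureMap, adj: FeatureMap) -> List[List[float]]:
    """Hypothetical SCAM: co-located cosine similarity mapped from [-1, 1] to [0, 1]."""
    return [[(cosine(cur[i][j], adj[i][j]) + 1.0) / 2.0 for j in range(len(cur[0]))]
            for i in range(len(cur))]


def clrm_sketch(cur: FeatureMap, adj: FeatureMap,
                radius: int = 1, tau: float = 10.0) -> FeatureMap:
    """Assumed residual fusion: cur + SCAM * reconstructed features, per position."""
    rec = cross_frame_local_attention(cur, adj, radius, tau)
    scam = scene_consistency_map(cur, adj)
    return [[[f + scam[i][j] * r for f, r in zip(cur[i][j], rec[i][j])]
             for j in range(len(cur[0]))] for i in range(len(cur))]


if __name__ == "__main__":
    frame_t = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.5, 0.2]]]
    frame_t1 = [[[1.0, 0.1], [0.0, 1.0]], [[0.9, 1.0], [0.4, 0.3]]]
    fused = clrm_sketch(frame_t, frame_t1)
    print(len(fused), len(fused[0]), len(fused[0][0]))  # 2 2 2
```

The residual form `cur + SCAM * reconstructed` is only one plausible way to let a consistency map down-weight positions whose content changes between frames; the coarse-to-fine placement of CLRM inside the network is not modeled here.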