Reparameterized dilated architecture: A wider field of view for pedestrian detection
Lixiong Gong, Xiao Huang, Jialin Chen, Miaoling Xiao, Yinkang Chao
DOI: https://doi.org/10.1007/s10489-023-05255-3
IF: 5.3
2024-01-11
Applied Intelligence
Abstract: With continuous advances in computer vision, state-of-the-art (SOTA) methods for pedestrian detection have reached new levels of performance. Despite this progress, most detectors have limited receptive fields, which makes it difficult to construct global information dependencies and context awareness. These constraints particularly affect the detection of edge and small pedestrian targets. Our proposed solution, reparameterized dilated convolution (RDConv), strategically employs sawtooth dilation rates to broaden the receptive field without increasing computational cost. RDConv keeps the cost of a small convolutional kernel while offering a larger receptive field, enabling comprehensive modeling of the relationship between pedestrians and their environment and thereby enhancing context awareness. To capture the pedestrian information dependencies that are crucial for edge and small-target detection, we introduce the group multihead self-attention (G-MSA) mechanism. To overcome the high computational cost and limited interaction of traditional self-attention schemes, we adopt deep separation and supplementary boundary feature computation. RDConv and G-MSA are integrated into a multibranch framework to assess information flow interactions. Because convolution and self-attention place different demands on the activation function, we propose the dynamic boundary (DB) activation function, which adaptively adjusts the nonlinearity and gradient of the information in each layer of the network to suit the hybrid structure that merges the two methods. Applied to YOLOv5s and evaluated on the CityPersons, Caltech Pedestrian, and PASCAL VOC datasets, our approach achieves 33.61 AP@0.5, 61.41 AP@0.5, and 92.08 mAP (with YOLOv5m), respectively. The results on all three datasets confirm the effectiveness of our method.
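The abstract does not state the exact dilation schedule used by RDConv, so the following is only a minimal sketch of why a sawtooth schedule can widen the receptive field at no extra cost. It assumes three stacked stride-1 3×3 convolutions with a repeating 1, 2, 3 pattern (the hybrid dilated convolution idea), analyzed per spatial axis; the function names are hypothetical and this is not the authors' implementation.

```python
def receptive_field(dilations, k=3):
    """Per-axis receptive field of stacked stride-1 k x k convolutions (odd k)."""
    return 1 + sum((k - 1) * d for d in dilations)


def covered_offsets(dilations, k=3):
    """Per-axis input offsets that actually feed one output pixel.

    Each layer with dilation d samples offsets {-d*h, ..., 0, ..., d*h}
    with h = k // 2; stacking layers adds these offsets together.
    """
    half = k // 2
    offsets = {0}
    for d in dilations:
        taps = [d * t for t in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return offsets


def weights_per_layer(c_in, c_out, k=3):
    """Dilation does not change the weight count of a k x k convolution."""
    return k * k * c_in * c_out


# Hypothetical schedules: plain 3x3 stack, constant dilation, sawtooth dilation.
schedules = [("dense", [1, 1, 1]), ("constant", [2, 2, 2]), ("sawtooth", [1, 2, 3])]

for name, rates in schedules:
    rf = receptive_field(rates)
    cov = len(covered_offsets(rates))
    params = len(rates) * weights_per_layer(64, 64)  # assumed 64 channels
    print(f"{name:8s} rates={rates} RF={rf} covered={cov}/{rf} params={params}")
```

All three stacks have the same weight count (and the same multiply-adds per output pixel). The dense stack spans only 7 pixels per axis; the constant-rate stack spans 13 but touches only every other input position (the gridding effect), leaving 7 of 13 covered; the sawtooth stack spans 13 and covers all of them.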
computer science, artificial intelligence