Abstract: With continued advances in computer vision, state-of-the-art (SOTA) pedestrian detectors have reached new levels of performance. Challenges nevertheless persist in building global information dependencies and context awareness, because most detectors have limited receptive fields. These constraints particularly affect the detection of edge and small pedestrian targets. Our proposed solution, reparameterized dilated convolution (RDConv), uses sawtooth dilation rates to broaden the receptive field without increasing computational cost. RDConv keeps the cost of a small convolutional kernel while offering a larger receptive field, which allows it to model the relationship between pedestrians and their environment more comprehensively and thereby improves context awareness. To capture the pedestrian information dependencies that edge and small-target detection requires, we introduce the group multihead self-attention (G-MSA) mechanism. To overcome the high computational cost and limited interaction of traditional self-attention schemes, we adopt deep separation and supplementary boundary feature computation. RDConv and G-MSA are integrated into a multibranch framework to assess information flow interactions. Because convolution and self-attention place different requirements on activation functions, we propose the dynamic boundary (DB) activation function, which adaptively adjusts the nonlinearity and gradient of the information from each layer in the network to accommodate the integrated structure of the two methods. Applied to YOLOv5s and tested on the CityPersons, Caltech Pedestrian, and PASCAL VOC datasets, our approach achieves 33.61 AP0.5, 61.41 AP0.5, and 92.08 mAP (YOLOv5m), respectively. The results across all three datasets support the effectiveness of our method.
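The claim that dilation widens the receptive field at the cost of a small kernel can be checked with simple arithmetic. The sketch below is illustrative only: the six-layer stack, the 3x3 kernel, the channel width, and the sawtooth schedule `[1, 2, 3, 1, 2, 3]` are assumed values, not the configuration used in the paper, and the code does not reproduce the reparameterization step of RDConv. It only compares a plain stack of stride-1 convolutions with a stack that uses a sawtooth dilation schedule.

```python
def receptive_field(kernel_size, dilations):
    """Side length of the receptive field of stacked stride-1 convolutions.

    Each layer with dilation d extends the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def weight_count(kernel_size, channels, num_layers):
    """Number of weights in the stack (bias ignored).

    Dilation spaces the kernel taps apart but adds no taps, so it does not
    appear in this count.
    """
    return num_layers * kernel_size * kernel_size * channels * channels


# Hypothetical settings chosen for illustration, not taken from the paper.
KERNEL = 3
CHANNELS = 64
plain = [1, 1, 1, 1, 1, 1]     # ordinary 3x3 convolutions
sawtooth = [1, 2, 3, 1, 2, 3]  # dilation rises, resets, rises again

rf_plain = receptive_field(KERNEL, plain)
rf_sawtooth = receptive_field(KERNEL, sawtooth)
weights_plain = weight_count(KERNEL, CHANNELS, len(plain))
weights_sawtooth = weight_count(KERNEL, CHANNELS, len(sawtooth))

print(f"plain:    receptive field {rf_plain}x{rf_plain}, weights {weights_plain}")
print(f"sawtooth: receptive field {rf_sawtooth}x{rf_sawtooth}, weights {weights_sawtooth}")
```

Under these assumptions both stacks hold the same number of weights, while the sawtooth stack covers a 25x25 window instead of 13x13. Resetting the rate periodically, rather than using one large constant rate, is a common way to keep the sampled positions from leaving gaps in the covered window.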

Reconciling global and local optimal label assignments for heavily occluded pedestrian detection

Towards Accurate Dense Pedestrian Detection via Occlusion-Prediction Aware Label Assignment and Hierarchical-NMS

Pedestrian Detection Method Based on Improved YOLOv5s for Densely Occluded Scenarios

A dynamic label assignment strategy for one-stage detectors

Taking a Look at Small-Scale Pedestrians and Occluded Pedestrians

Non-maximum Suppression Guided Label Assignment for Object Detection in Crowd Scenes

Mask-Guided Attention Network for Occluded Pedestrian Detection

Pedestrian Partial Occlusion Detection Method Based on Improved YOLO Network Structure

An Objective Method for Pedestrian Occlusion Level Classification

OccluTrack: Rethinking Awareness of Occlusion for Enhancing Multiple Pedestrian Tracking

Guided Attention in CNNs for Occluded Pedestrian Detection and Re-identification

CLAHR: Cascaded Label Assignment Head for High-Resolution Small Object Detection

Sparse Label Assignment for Oriented Object Detection in Aerial Images

LapNet: Automatic Balanced Loss and Optimal Assignment for Real-Time Dense Object Detection

Occluded Pedestrian Attention Network: an Occluded Pedestrian Detector

Count- and Similarity-Aware R-CNN for Pedestrian Detection

A lightweight YOLOv5-FFM model for occlusion pedestrian detection

Crowded pedestrian detection with optimal bounding box relocation

Reparameterized dilated architecture: A wider field of view for pedestrian detection

Improving Multiple Pedestrian Tracking in Crowded Scenes with Hierarchical Association

Crosswalk Detection from Satellite Imagery for Pedestrian Network Completion