Abstract: High resolution and strong semantic representation are both vital for the feature extraction networks used in pedestrian detection. The existing high-resolution network (HRNet) has shown promising performance for pedestrian detection. However, we observe that it still has significant shortcomings for heavily occluded and small-scale pedestrians. In this paper, we address these shortcomings by extracting semantic and spatial context from HRNet. Specifically, we propose a Context-aware Feature Representation Learning Module (CFRL-Module), which combines a two-path Multi-scale Feature Context Extraction Parallel Block for Convolution and Self-attention (CEPCA-Block) with an Equivalent FFN (EFFN) Block. The core CEPCA-Block adopts a parallel design that integrates convolution and multi-head self-attention (MHSA) at low parameter and computational cost: the convolution path obtains deep semantic context, and the MHSA path obtains precise context. Furthermore, to overcome the inefficiency of global MHSA in high-resolution pedestrian detection, we propose a novel local window MHSA, which significantly reduces memory consumption while barely affecting detection performance. Cascading the proposed CFRL-Module with an anchor-free detection head constitutes our Context-aware Feature Representation Learning Anchor-Free Network (CFRLA-Net). Built on HRNet, the proposed CFRLA-Net captures a high-level understanding of heavily occluded and small-scale pedestrian instances, which effectively addresses HRNet's insufficient feature extraction ability for these hard samples. Experimental results show that CFRLA-Net achieves state-of-the-art performance on the CityPersons, Caltech, and CrowdHuman benchmarks.
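To illustrate why restricting MHSA to local windows can reduce memory on high-resolution feature maps, the following is a minimal sketch that counts attention-score entries per head. It assumes non-overlapping square windows, which the abstract does not specify; the function names, the feature-map size, and the window size are hypothetical and are not taken from the paper.

```python
def attention_entries_global(height: int, width: int, heads: int = 1) -> int:
    """Attention-score entries when every token attends to every other token."""
    tokens = height * width
    return heads * tokens * tokens


def attention_entries_windowed(height: int, width: int, window: int, heads: int = 1) -> int:
    """Attention-score entries when attention stays inside window x window tiles.

    Assumption: tiles are square and non-overlapping (illustrative only).
    """
    if height % window or width % window:
        raise ValueError("feature map must be divisible by the window size")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return heads * num_windows * tokens_per_window * tokens_per_window


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    h, w, win = 256, 512, 8
    global_entries = attention_entries_global(h, w)
    local_entries = attention_entries_windowed(h, w, win)
    print("global:", global_entries)
    print("windowed:", local_entries)
    print("reduction factor:", global_entries // local_entries)
```

Under these assumptions, the global score matrix grows quadratically with the number of tokens H·W, while the windowed version grows linearly, so the reduction factor is H·W divided by the number of tokens per window.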

Exploiting Context Based on CNN and Coding Representations for Pedestrian Co-Detection

See Extensively While Focusing on the Core Area for Pedestrian Detection.

Towards Accurate Dense Pedestrian Detection Via Occlusion-Prediction Aware Label Assignment and Hierarchical-NMS.

Improved Hough Transform by Modeling Context with Conditional Random Fields for Partially Occluded Pedestrian Detection

Pedestrian Detection by Using CNN Features with Skip Connection.

PCN: Part and Context Information for Pedestrian Detection with CNNs

Coupled Network for Robust Pedestrian Detection With Gated Multi-Layer Feature Extraction and Deformable Occlusion Handling

Learning Pixel-Level and Instance-Level Context-Aware Features for Pedestrian Detection in Crowds.

Deep Pedestrian Detection Using Contextual Information and Multi-level Features

Research on Low-Resolution Pedestrian Detection Algorithms Based on R-CNN with Targeted Pooling and Proposal

Hybrid Channel Based Pedestrian Detection

Object Codetection Based on a Higher-Order Conditional Random Field

CFRLA-Net: A Context-aware Feature Representation Learning Anchor-free Network for Pedestrian Detection

Associated Metric Coding Network for Pedestrian Detection.

Pedestrian Detection and Attribute Analysis Program Based on CNN

Count- and Similarity-Aware R-CNN for Pedestrian Detection

A Part-Aware Multi-Scale Fully Convolutional Network for Pedestrian Detection

R-SSD: Refined Single Shot Multibox Detector for Pedestrian Detection

Improving Small-Scale Pedestrian Detection Using Informed Context

Deep Convolutional Neural Networks For Pedestrian Detection With Skip Pooling

Pedestrian Detection Based on Region Proposal Fusion.