Abstract: Pedestrian detection in crowded scenes still suffers from two issues, occlusion and dense boundary prediction, which keep it challenging in complex real-world scenarios. In recent years, Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have each shown strengths in addressing these issues: ViTs capture global feature dependencies to infer occluded parts, while CNNs make accurate dense predictions from local detailed features. Nevertheless, limited by their narrow receptive field, CNNs fail to infer occluded parts, while ViTs tend to ignore local features that are vital for distinguishing different pedestrians in a crowd. It is therefore essential to combine the advantages of CNNs and ViTs for pedestrian detection. However, manually designing a dedicated CNN-ViT hybrid network requires enormous time and resources for trial and error. To address this issue, we propose NAS-PED, the first Neural Architecture Search (NAS) framework specifically designed for pedestrian detection, which automatically designs an appropriate hybrid CNN-ViT backbone for the crowded pedestrian detection task. Specifically, we formulate transformers and convolutions with various kernel sizes in the same format, which provides an unconstrained space for searching diverse hybrid networks. Furthermore, to search for a suitable backbone, we propose an information bottleneck based NAS objective function, which treats NAS as an information extraction process that preserves relevant information and suppresses redundant information from the dense pedestrians in crowded scenes. Extensive experiments on the CrowdHuman, CityPersons and EuroCity Persons datasets demonstrate the effectiveness of the proposed method. NAS-PED obtains absolute improvements of 4.0% MR$^{-2}$ and 1.9% AP over the state-of-the-art (SOTA) pedestrian detection framework on the CrowdHuman dataset. On the CityPersons and EuroCity Persons datasets, the searched backbone achieves stable improvements across all three subsets, outperforming some large language-image pre-trained models. Code will be released after acceptance.
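
For orientation, the classical information bottleneck principle that an objective of this kind builds on can be written as below. This is the generic textbook form, not the NAS-PED objective itself, which the abstract does not spell out. The symbols are assumptions introduced here for illustration: $X$ is the input image, $Y$ the detection target, $Z$ the features produced by a candidate backbone, $I(\cdot\,;\cdot)$ mutual information, and $\beta > 0$ a trade-off weight.

$$
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
$$

Under this reading, the first term penalizes redundant information that $Z$ retains about the input, while the second rewards information in $Z$ that is relevant to the target, which matches the "suppress redundant, preserve relevant" description in the abstract.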

Real-Time Pedestrian Detection Using Convolutional Neural Network on Embedded Platform

Real-time Pedestrian Detection and Tracking on Customized Hardware

Towards Real-Time Object Detection on Embedded Systems

The Implementation of CNN-Based Object Detector on ARM Embedded Platforms

Pedestrian Detection by Using CNN Features with Skip Connection

Pedestrian Detection and Attribute Analysis Program Based on CNN

Real-Time Pedestrian Detection for Driver Assistance Systems Based on Deep Learning

Design of Real-Time Vehicle Detection Based on YOLOv4

R-SSD: Refined Single Shot Multibox Detector for Pedestrian Detection

An Embedded Implementation of CNN-based Hand Detection and Orientation Estimation Algorithm

Real-time Pedestrian Detection Via Hierarchical Convolutional Feature

Speeding Up Dilated Convolution Based Pedestrian Detection With Tensor Decomposition

An efficient pedestrian detection network on mobile GPU with millisecond scale

Pedestrian Detection based on Region of Convolution Neural Network

Deep convolutional neural networks for pedestrian detection

A Pedestrian and Vehicle Rapid Identification Model Based on Convolutional Neural Network

A Lightweight CNN-based Algorithm and Implementation on Embedded System for Real-Time Face Recognition

Robust real-time pedestrian detection on embedded devices

Real-Time Early Indoor Fire Detection and Localization on Embedded Platforms with Fully Convolutional One-Stage Object Detection

Multi-Task Network Pruning and Embedded Optimization for Real-time Deployment in ADAS

NAS-PED: Neural Architecture Search for Pedestrian Detection