Pedestrian detection with high-resolution event camera

Piotr Wzorek, Tomasz Kryjak
DOI: https://doi.org/10.34658/9788366741928.7
2023-05-29
Abstract: Despite the dynamic development of computer vision algorithms, the implementation of perception and control systems for autonomous vehicles such as drones and self-driving cars still poses many challenges. A video stream captured by traditional cameras is often prone to problems such as motion blur or degraded image quality due to challenging lighting conditions. In addition, the frame rate - typically 30 or 60 frames per second - can be a limiting factor in certain scenarios. Event cameras (DVS -- Dynamic Vision Sensor) are a potentially interesting technology to address the above-mentioned problems. In this paper, we compare two methods of processing event data by means of deep learning for the task of pedestrian detection. We used a representation in the form of video frames, convolutional neural networks and asynchronous sparse convolutional neural networks. The results obtained illustrate the potential of event cameras and allow the evaluation of the accuracy and efficiency of the methods used for high-resolution (1280 x 720 pixels) footage.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
The paper primarily explores pedestrian detection using high-resolution event cameras. Traditional cameras may encounter problems such as motion blur or image quality degradation under fast motion or poor lighting conditions. As an emerging technology, event cameras can better address these challenges. The paper compares two deep learning-based methods of processing event data for pedestrian detection:

1. **Method based on video frame representation**: This method accumulates event data within a defined time window to form a data structure similar to traditional video frames and feeds it into Convolutional Neural Networks (CNNs). The study employs the YOLOv7 architecture and combines different event features (such as polarity, temporal resolution, and frequency) to maximize the amount of information. This method achieves good detection accuracy (67.7% mAP@0.5, 38% mAP@0.5:0.95) but has high computational complexity (104.7 GFLOPs).
2. **Method based on Asynchronous Sparse Convolutional Neural Networks (ASCNNs)**: This method leverages the sparse nature of event data, updating only the convolution results that correspond to changed input values in order to reduce computational complexity and energy consumption. Although this method theoretically reduces computational demands (205 MFLOPs), it did not achieve satisfactory accuracy in experiments.

In summary, this study aims to evaluate the accuracy and efficiency of different methods of processing high-resolution event camera data and to provide direction for the future development of more efficient and accurate pedestrian detection systems.
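
The frame-based representation in the first method can be illustrated with a minimal NumPy sketch. It is not the authors' encoding. It only shows the general idea of accumulating events from a fixed time window into an image-like tensor, here with one channel per polarity holding per-pixel event counts. The function name `events_to_frame`, the `(x, y, t, p)` column layout, microsecond timestamps, and the 10 ms window are all assumptions made for illustration.

```python
import numpy as np

def events_to_frame(events, t_start, window_us, height, width):
    """Accumulate events from one time window into a 2-channel count frame.

    Assumed (hypothetical) layout: `events` has shape (N, 4) with integer
    columns (x, y, t, p), where t is in microseconds and p is 0 or 1.
    Channel 0 counts negative-polarity events, channel 1 positive ones.
    """
    frame = np.zeros((2, height, width), dtype=np.float32)

    # Keep only events whose timestamp falls in [t_start, t_start + window_us).
    t = events[:, 2]
    selected = events[(t >= t_start) & (t < t_start + window_us)]

    x = selected[:, 0].astype(np.intp)
    y = selected[:, 1].astype(np.intp)
    p = selected[:, 3].astype(np.intp)

    # np.add.at accumulates correctly when the same pixel occurs repeatedly.
    np.add.at(frame, (p, y, x), 1.0)
    return frame

# Four synthetic events; the last one lies outside the 10 ms window.
events = np.array([
    [10, 5,    100, 1],
    [10, 5,    200, 1],
    [3,  2,    300, 0],
    [7,  7, 20_000, 1],
], dtype=np.int64)

frame = events_to_frame(events, t_start=0, window_us=10_000,
                        height=720, width=1280)
```

A tensor built this way has the same spatial layout as a 1280 x 720 video frame, so it can be passed to a standard convolutional detector. The cost is that the event stream's sparsity and fine timing are discarded inside each window, which is the trade-off the asynchronous sparse approach tries to avoid.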