A Recurrent YOLOv8-based framework for Event-Based Object Detection

Diego A. Silva, Kamilya Smagulova, Ahmed Elsheikh, Mohammed E. Fouda, Ahmed M. Eltawil
2024-08-10
Abstract: Object detection is crucial in various cutting-edge applications, such as autonomous vehicles and advanced robotics systems, which primarily rely on data from conventional frame-based RGB sensors. However, these sensors often struggle with issues like motion blur and poor performance in challenging lighting conditions. In response to these challenges, event-based cameras have emerged as an innovative paradigm. These cameras, mimicking the human eye, demonstrate superior performance in environments with fast motion and extreme lighting conditions while consuming less power. This study introduces ReYOLOv8, an advanced object detection framework that enhances a leading frame-based detection system with spatiotemporal modeling capabilities. We implemented a low-latency, memory-efficient method for encoding event data to boost the system's performance. We also developed a novel data augmentation technique tailored to leverage the unique attributes of event data, thus improving detection accuracy. Our models outperformed all comparable approaches on the GEN1 dataset, which focuses on automotive applications, achieving mean Average Precision (mAP) improvements of 5%, 2.8%, and 2.5% across nano, small, and medium scales, respectively. These enhancements were achieved while reducing the number of trainable parameters by an average of 4.43% and maintaining real-time processing speeds between 9.2ms and 15.5ms. On the PEDRo dataset, which targets robotics applications, our models showed mAP improvements ranging from 9% to 18%, with 14.5x and 3.8x smaller models and an average speed enhancement of 1.67x.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the weaknesses of conventional frame-based visual sensors (such as RGB cameras) under fast motion and extreme lighting conditions, namely motion blur and degraded performance. To tackle them, the authors propose "Recurrent YOLOv8" (ReYOLOv8), an object detection framework for event cameras. Specifically, the study makes the following contributions:
1. **Improving Detection Performance**: YOLOv8 (an efficient real-time object detector) is combined with Recurrent Neural Networks (RNNs) to model spatiotemporal information, which improves the accuracy of object detection.
2. **Optimizing Event Data Encoding**: A novel, low-latency, and memory-efficient event data encoding method called "Volume of Ternary Event Images" (VTEI) is proposed. This method preserves the temporal information in the event stream and offers high sparsity, low bandwidth requirements, and a high compression ratio.
3. **Enhancing Data Augmentation Techniques**: A data augmentation technique called "Random Polarity Suppression" (RPS) is introduced. It randomly suppresses all events of a specific polarity, which helps the model learn object-related features without being influenced by potential polarity distribution biases in the training dataset.
4. **Overall System Performance Improvement**: The above contributions are integrated into a unified framework and validated on two large real-world datasets. Experimental results show that ReYOLOv8 improves mean Average Precision (mAP) over existing methods across different model scales, while reducing the number of trainable parameters and maintaining real-time processing speed.
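
The two event-specific ideas in items 2 and 3 above can be illustrated with a minimal sketch. It is not the authors' implementation. The event layout `(x, y, t, p)` with polarity `p` in `{-1, +1}`, the function names, the `prob` parameter, and the uniform time binning are all assumptions made for illustration; the summary only states that VTEI produces ternary images that retain temporal information and that RPS suppresses all events of one polarity at random.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical event layout for this sketch: (x, y, t, p), polarity p in {-1, +1}.
Event = Tuple[int, int, float, int]


def random_polarity_suppression(
    events: List[Event],
    prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[Event]:
    """With probability `prob`, drop every event of one randomly chosen polarity.

    `prob` is an assumed hyperparameter; the source does not give a value.
    """
    rng = rng or random.Random()
    if rng.random() >= prob:
        return list(events)  # sample left unchanged
    suppressed = rng.choice((-1, 1))
    return [e for e in events if e[3] != suppressed]


def ternary_event_volume(
    events: List[Event],
    width: int,
    height: int,
    num_bins: int,
    t_start: float,
    t_end: float,
) -> List[List[List[int]]]:
    """Encode events as `num_bins` images whose pixels are -1, 0, or +1.

    Assumption: the window [t_start, t_end) is split into equal time bins and
    each pixel keeps the polarity of the latest event that hit it in that bin.
    """
    volume = [[[0] * width for _ in range(height)] for _ in range(num_bins)]
    span = t_end - t_start
    for x, y, t, p in sorted(events, key=lambda e: e[2]):
        if not (t_start <= t < t_end):
            continue  # ignore events outside the encoding window
        b = min(int((t - t_start) / span * num_bins), num_bins - 1)
        volume[b][y][x] = p  # later events overwrite earlier ones in the bin
    return volume


# Usage: augment a small event list, then encode it into two ternary images.
sample = [(0, 0, 0.1, 1), (1, 0, 0.2, -1), (0, 0, 0.3, -1), (1, 1, 0.9, 1)]
augmented = random_polarity_suppression(sample, prob=0.5, rng=random.Random(0))
encoded = ternary_event_volume(augmented, width=2, height=2, num_bins=2,
                               t_start=0.0, t_end=1.0)
```

Because every pixel takes one of three values, such a volume is mostly zeros for sparse event streams, which is consistent with the sparsity and compression properties the summary attributes to VTEI. Nested lists keep the sketch dependency-free; a real pipeline would more likely use an array library.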
In summary, the goal of this study is to develop a more robust and efficient visual processing system by combining the latest deep learning technologies with innovative methods designed for event cameras, particularly in dynamic and challenging environments.