Detecting Every Object from Events

Haitian Zhang, Chang Xu, Xinya Wang, Bingde Liu, Guang Hua, Lei Yu, Wen Yang
2024-04-08
Abstract: Object detection is critical in autonomous driving, and it is more practical yet challenging to localize objects of unknown categories: an endeavour known as Class-Agnostic Object Detection (CAOD). Existing studies on CAOD predominantly rely on ordinary cameras, but these frame-based sensors usually have high latency and limited dynamic range, leading to safety risks in real-world scenarios. In this study, we turn to a new modality enabled by the so-called event camera, featured by its sub-millisecond latency and high dynamic range, for robust CAOD. We propose Detecting Every Object in Events (DEOE), an approach tailored for achieving high-speed, class-agnostic open-world object detection in event-based vision. Built upon the fast event-based backbone: recurrent vision transformer, we jointly consider the spatial and temporal consistencies to identify potential objects. The discovered potential objects are assimilated as soft positive samples to avoid being suppressed as background. Moreover, we introduce a disentangled objectness head to separate the foreground-background classification and novel object discovery tasks, enhancing the model's generalization in localizing novel objects while maintaining a strong ability to filter out the background. Extensive experiments confirm the superiority of our proposed DEOE in comparison with three strong baseline methods that integrate the state-of-the-art event-based object detector with advancements in RGB-based CAOD. Our code is available at
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses class-agnostic object detection (CAOD), the task of localizing objects of unknown categories, in scenarios that demand fast detection, such as autonomous driving. Existing CAOD methods mainly rely on conventional frame-based cameras, whose high latency and limited dynamic range can create safety risks in real-world use. The paper proposes Detecting Every Object in Events (DEOE), which uses event cameras (characterized by sub-millisecond latency and high dynamic range) to achieve high-speed, class-agnostic open-world object detection. DEOE is built on the recurrent vision transformer (RVT), a fast event-based backbone. It identifies potential objects by jointly considering spatial and temporal consistency, and treats the discovered objects as soft positive samples so that they are not suppressed as background. In addition, the paper introduces a disentangled objectness head that separates foreground-background classification from novel object discovery, improving the model's generalization in localizing novel objects while preserving a strong ability to filter out the background. Experiments show that DEOE outperforms three strong baseline methods that combine the state-of-the-art event-based object detector with advances in RGB-based CAOD. The study extends CAOD to event-based vision and offers a more robust approach to object detection under fast motion and lighting variations.
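The idea of soft positive samples can be illustrated with a minimal sketch. It is not the paper's implementation: it covers only a simple form of temporal consistency (overlap with a box predicted at the previous time step), and the function names, the IoU thresholds, and the `soft_weight` value are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_objectness_targets(
    preds: List[Box],
    prev_preds: List[Box],
    gt_boxes: List[Box],
    pos_iou: float = 0.5,      # hypothetical threshold for annotated objects
    consist_iou: float = 0.5,  # hypothetical threshold for temporal consistency
    soft_weight: float = 0.5,  # hypothetical target for soft positives
) -> List[float]:
    """Return one objectness target per predicted box.

    1.0         -> matches an annotated (known-category) object
    soft_weight -> unannotated, but overlaps a box from the previous time
                   step, so it is kept as a potential object
    0.0         -> treated as background
    """
    targets = []
    for p in preds:
        if any(iou(p, g) >= pos_iou for g in gt_boxes):
            targets.append(1.0)
        elif any(iou(p, q) >= consist_iou for q in prev_preds):
            targets.append(soft_weight)
        else:
            targets.append(0.0)
    return targets


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
    prev_preds = [(21, 21, 31, 31)]  # a box that persists across time steps
    gt_boxes = [(0, 0, 10, 10)]      # the only annotated object
    print(assign_objectness_targets(preds, prev_preds, gt_boxes))
```

In this toy case the second box has no annotation but persists across time steps, so it receives a soft target instead of being pushed toward background, while the third box, which appears only once, is still treated as background.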