FlexEvent: Event Camera Object Detection at Arbitrary Frequencies

Dongyue Lu, Lingdong Kong, Gim Hee Lee, Camille Simon Chane, Wei Tsang Ooi
2024-12-10
Abstract: Event cameras offer unparalleled advantages for real-time perception in dynamic environments, thanks to their microsecond-level temporal resolution and asynchronous operation. Existing event-based object detection methods, however, are limited by fixed-frequency paradigms and fail to fully exploit the high temporal resolution and adaptability of event cameras. To address these limitations, we propose FlexEvent, a novel event camera object detection framework that enables detection at arbitrary frequencies. Our approach consists of two key components: FlexFuser, an adaptive event-frame fusion module that integrates high-frequency event data with rich semantic information from RGB frames, and FAL, a frequency-adaptive learning mechanism that generates frequency-adjusted labels to enhance model generalization across varying operational frequencies. This combination allows our method to detect objects with high accuracy in both fast-moving and static scenarios, while adapting to dynamic environments. Extensive experiments on large-scale event camera datasets demonstrate that our approach surpasses state-of-the-art methods, achieving significant improvements in both standard and high-frequency settings. Notably, our method maintains robust performance when scaling from 20 Hz to 90 Hz and delivers accurate detection up to 180 Hz, proving its effectiveness in extreme conditions. Our framework sets a new benchmark for event-based object detection and paves the way for more adaptable, real-time vision systems.
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
The core problem this paper addresses is that existing event-camera object detectors perform poorly when run at operating frequencies other than the one they were trained for. These methods usually rely on a fixed-frequency paradigm and fail to exploit the high temporal resolution and adaptability of event cameras, so detection degrades in rapidly changing scenes.

### Specific manifestations of the problem

1. **Fixed-frequency limitation**: Most existing methods align event data with low-frequency RGB frames, using a fixed event-stream time interval and frame-based annotations. This simplifies data processing but discards the rich temporal detail in high-frequency event streams.
2. **Insufficient utilization of high-frequency data**: Because human annotations are usually synchronized with the slower frame rate, current detection models cannot fully use the information in high-frequency event data, which hurts performance in dynamic environments that require rapid object detection.
3. **Lack of flexibility**: Existing methods adapt poorly to different operating frequencies; performance drops significantly when moving from low-frequency to high-frequency operation.

### Solutions

To address these problems, the paper proposes a new framework named **FlexEvent**, which aims to enable event-camera object detection at arbitrary frequencies. FlexEvent contains two key components:

1. **FlexFuser**: An adaptive event-frame fusion module that combines high-frequency event data with the rich semantic information in RGB frames, maintaining high-accuracy detection in both fast-moving and static scenes.
2. **FAL (Frequency-Adaptive Learning)**: A frequency-adaptive learning mechanism that improves the model's generalization across operating frequencies by generating frequency-adjusted labels.

### Main contributions

- **Cross-frequency detection**: The paper presents FlexEvent as the first work explicitly targeting event-camera object detection at arbitrary frequencies.
- **Efficient fusion**: The FlexFuser module combines the strengths of event and frame data, achieving efficient and accurate detection in dynamic environments.
- **Adaptive learning**: The FAL mechanism keeps model performance consistent across frequencies through self-training and adaptive label generation.

In extensive experiments, FlexEvent significantly outperforms existing methods on multiple large-scale event-camera datasets. According to the abstract, it remains robust when scaling from 20 Hz to 90 Hz and still detects accurately at up to 180 Hz, which supports its robustness and effectiveness under extreme conditions.
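
The fixed-frequency limitation described above comes down to how the event stream is cut into time windows before detection. The sketch below is only an illustration of that idea, not the paper's implementation: the function name `slice_events` and its parameters are hypothetical, and it assumes event timestamps are sorted and given in microseconds. It shows how the same stream can be windowed at any chosen detection frequency.

```python
import numpy as np

def slice_events(timestamps_us, frequency_hz, t_start_us, t_end_us):
    """Split sorted event timestamps into windows of 1 / frequency_hz seconds.

    Hypothetical helper for illustration only. Returns a list of
    (start_index, end_index) pairs; events[start:end] fall in that window.
    """
    ts = np.asarray(timestamps_us, dtype=np.int64)
    window_us = 1e6 / frequency_hz  # window length in microseconds
    # Number of windows needed to cover [t_start_us, t_end_us).
    n_windows = int(np.ceil((t_end_us - t_start_us) * frequency_hz / 1e6))
    # Window boundaries on the time axis.
    edges = t_start_us + np.arange(n_windows + 1) * window_us
    # Binary search maps each boundary to an index in the sorted timestamps.
    idx = np.searchsorted(ts, edges, side="left")
    return [(int(a), int(b)) for a, b in zip(idx[:-1], idx[1:])]

# Synthetic stream: one event per millisecond over 100 ms.
ts = np.arange(0, 100_000, 1_000)
windows_20hz = slice_events(ts, 20, 0, 100_000)    # 50 ms windows
windows_100hz = slice_events(ts, 100, 0, 100_000)  # 10 ms windows
print(len(windows_20hz), len(windows_100hz))
```

At 20 Hz each window spans 50 ms, while at 180 Hz it spans roughly 5.6 ms, so each high-frequency window holds far fewer events and typically has no frame-synchronized annotation. That gap is what the summary says FlexFuser and FAL are designed to address.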