Spatio-temporal Focus and Lightweight Memory Network for Continuous Object Detection with Event Camera
Xinyu Zhu, Mingfeng Yin, Qi Gao, Yuanzhi Ni, Li, Yuming Bo
DOI: https://doi.org/10.1109/jsen.2024.3459468
IF: 4.3
2024-01-01
IEEE Sensors Journal
Abstract: The event camera, an innovative perception sensor, generates a high-temporal-resolution event stream, filtering out redundant visual information and offering a wide dynamic range for object detection. Nevertheless, it is non-trivial to exploit these advantages to improve object detection accuracy. First, current event representation methods require hyperparameter tuning to achieve optimal performance for event-based object detection, which is very time-consuming. Moreover, most existing methods rely on large-scale models to enhance detection accuracy. In addition, they fail to leverage the temporal cues provided by the event stream efficiently. In this work, we propose a detection pipeline that achieves a better balance between model speed and accuracy. First, we design an event representation that requires no hyperparameter tuning, named Dynamic Temporal Representation (DTR), which efficiently utilizes the asynchronous spatio-temporal event stream. Then, we propose a module named Spatial Temporal Focus (STF), capable of extracting the rich temporal information from the DTR tensor. In addition, we implement a memory module named Light Recurrent Convolution (LRC), which leverages rich temporal cues. Finally, experiments demonstrate that our method outperforms state-of-the-art approaches on two classic event camera datasets. Specifically, on the Gen1 Automotive Dataset and the 1Mpx Detection Dataset, mean Average Precision (mAP) increases by 0.5% and 0.4%, respectively, while inference time decreases by 34.3% and 32.5%. Additionally, the number of model parameters is reduced by 50.1%. Therefore, our method demonstrates competitive advantages in terms of accuracy, efficiency, and parameter count.