Tracking-Assisted Object Detection with Event Cameras

Ting-Kang Yen, Igor Morawski, Shusil Dangi, Kai He, Chung-Yi Lin, Jia-Fong Yeh, Hung-Ting Su, Winston Hsu
2024-09-18
Abstract: Event-based object detection has recently garnered attention in the computer vision community due to the exceptional properties of event cameras, such as high dynamic range and no motion blur. However, asynchronous and sparse features cause objects with no relative motion to the camera to become invisible, which poses a significant challenge for the task. Prior works have studied various implicitly learned memories to retain as many temporal cues as possible. However, implicit memories still struggle to preserve long-term features effectively. In this paper, we consider these invisible objects as pseudo-occluded objects and aim to detect them by tracking through occlusions. First, we introduce the visibility attribute of objects and contribute an auto-labeling algorithm that not only cleans the existing event camera dataset but also appends visibility labels to it. Second, we exploit tracking strategies for pseudo-occluded objects to maintain their permanence and retain their bounding boxes, even when features have not been available for a very long time. These strategies can be treated as an explicitly learned memory, guided by the tracking objective, that records the displacements of objects across frames. Lastly, we propose a spatio-temporal feature aggregation module to enrich the latent features and a consistency loss to increase the robustness of the overall pipeline. We conduct comprehensive experiments to verify our method's effectiveness in a setting where still objects are retained but truly occluded objects are discarded. The results demonstrate that (1) the additional visibility labels can assist in supervised training, and (2) our method outperforms state-of-the-art approaches by a significant 7.9% absolute mAP.
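To make the idea of a tracking-guided, explicit memory more concrete, here is a minimal sketch under stated assumptions. It is not the authors' method, which learns object displacements with a network. Instead, it uses a simple hand-written tracker: each track stores its last box, and when no detection matches a track (for example, because the object stopped moving and produced no events), the track is kept alive with its last box instead of being dropped. The greedy IoU matching, the `RetainingTracker` and `Track` names, and the `iou_threshold` and `max_age` parameters are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    box: Box
    frames_unseen: int = 0  # consecutive frames without a matching detection


class RetainingTracker:
    """Hypothetical tracker that keeps boxes of objects that stop being detected."""

    def __init__(self, iou_threshold: float = 0.3, max_age: int = 1000):
        self.iou_threshold = iou_threshold
        self.max_age = max_age  # deliberately large: still objects may stay silent for long
        self.tracks: List[Track] = []
        self._next_id = 0

    def update(self, detections: List[Box]) -> List[Track]:
        unmatched = list(range(len(detections)))
        for track in self.tracks:
            # Greedily pick the unused detection with the highest IoU above the threshold.
            best: Optional[int] = None
            best_iou = self.iou_threshold
            for j in unmatched:
                score = iou(track.box, detections[j])
                if score >= best_iou:
                    best, best_iou = j, score
            if best is not None:
                track.box = detections[best]
                track.frames_unseen = 0
                unmatched.remove(best)
            else:
                # No events, so no detection: treat as pseudo-occluded and keep the last box.
                track.frames_unseen += 1
        # Drop a track only after a very long silence.
        self.tracks = [t for t in self.tracks if t.frames_unseen <= self.max_age]
        # Start new tracks for detections that matched no existing track.
        for j in unmatched:
            self.tracks.append(Track(self._next_id, detections[j]))
            self._next_id += 1
        return self.tracks


if __name__ == "__main__":
    tracker = RetainingTracker()
    tracker.update([(0, 0, 10, 10)])  # object moving: detected
    tracker.update([(1, 0, 11, 10)])  # still detected, box updated
    out = tracker.update([])          # object stops: no events, no detection
    print(out[0].box, out[0].frames_unseen)  # (1, 0, 11, 10) 1
```

Keeping the box fixed, rather than extrapolating a velocity, fits the pseudo-occlusion case described above, where an object is invisible precisely because it is not moving relative to the camera. This sketch cannot tell such objects apart from truly occluded ones; the paper relies on its visibility labels for that distinction.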
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses a key difficulty in event-camera object detection: long-range temporal dependency. Event cameras only report brightness changes, so an object with no motion relative to the camera produces no events and becomes invisible to the detector. Specifically, the paper proposes the following:

1. **Detection of pseudo-occluded objects**
   - The paper calls objects that cannot be detected because they lack relative motion "pseudo-occluded objects" and detects them by tracking them through these occlusions.
2. **Visibility attribute and auto-labeling algorithm**
   - An auto-labeling algorithm cleans the existing event camera dataset and appends visibility labels to it. These labels help distinguish moving objects from stationary ones and provide additional supervision during training.
3. **Tracking strategies**
   - Tracking strategies maintain the permanence of pseudo-occluded objects, retaining their bounding boxes even when features have been unavailable for a long time. These strategies act as an explicitly learned memory, guided by the tracking objective, that records object displacements across frames.
4. **Spatio-temporal feature aggregation module**
   - A spatio-temporal feature aggregation module enriches the latent features, and a consistency loss increases the robustness of the overall pipeline.

Through these methods, the paper addresses the difficulty that implicit memory approaches have in preserving long-term features, and reports a significant improvement over state-of-the-art approaches (7.9% absolute mAP).
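The summary does not say how the auto-labeling algorithm decides whether an object is visible. One plausible heuristic, offered purely as an assumption and not as the paper's algorithm, counts the events that fall inside each ground-truth box during a recent time window. Boxes with too few events are flagged as not visible (pseudo-occluded). The `Event` layout (pixel `x`, `y`, timestamp `t` in microseconds), the `window_us` and `min_events` parameters, and the `label_visibility` name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Event:
    x: int
    y: int
    t: int         # timestamp in microseconds
    polarity: int  # +1 or -1 (unused by this heuristic)


def label_visibility(
    events: Sequence[Event],
    boxes: Sequence[Box],
    t_label: int,
    window_us: int = 50_000,
    min_events: int = 20,
) -> List[bool]:
    """Hypothetical visibility labels: True if a box received enough recent events.

    A box counts as visible when at least `min_events` events fall inside it
    during (t_label - window_us, t_label]. Boxes below the threshold are
    treated as pseudo-occluded, e.g. a parked car that produces no events.
    """
    t_start = t_label - window_us
    recent = [e for e in events if t_start < e.t <= t_label]
    labels = []
    for x1, y1, x2, y2 in boxes:
        count = sum(1 for e in recent if x1 <= e.x < x2 and y1 <= e.y < y2)
        labels.append(count >= min_events)
    return labels


if __name__ == "__main__":
    # 30 events from a moving object inside the first box, none inside the second.
    events = [Event(x=5 + i % 3, y=5, t=990_000 + i, polarity=1) for i in range(30)]
    boxes = [(0, 0, 10, 10), (50, 50, 60, 60)]
    print(label_visibility(events, boxes, t_label=1_000_000))  # [True, False]
```

A fixed event-count threshold is the simplest possible rule; it says nothing about dataset cleaning or about separating pseudo-occluded objects from truly occluded ones, both of which the paper's algorithm is described as handling.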