Poster: Fusing Event and Depth Sensing for Dynamic Objects Localization and Tracking.
Haoyang Wang, Xinyu Luo, Ciyu Ruan, Xuecheng Chen, Wenhua Ding, Yuxuan Liu, Xinlei Chen
DOI: https://doi.org/10.1145/3638550.3643620
2024-01-01
Abstract: Background and motivation. The localization and tracking of high-speed dynamic objects have applications across many domains. For example, tracking unmanned aerial vehicles (UAVs) is crucial for regulatory authorities, because it enables the swift identification of illicitly operated drones.

Related work. Cameras and Radar are currently the most widely used sensors for locating and tracking moving objects [1]. For example, [2] integrates an RGB camera with Radar for object tracking and achieves follow-up effects in AR applications. Despite their successes, these methods incur notable latency when tracking high-speed dynamic objects. This latency often reaches hundreds of milliseconds for several reasons: ① cameras need tens of milliseconds for exposure and may suffer from motion blur; ② algorithms may require a sequence of frames to optimize localization results. Such latency becomes a critical limitation for objects moving faster than 10 m/s (at that speed, every 100 ms of delay corresponds to at least 1 m of travel), because the system is left with too little time to respond. Consequently, current methods fall short of our requirement for precise real-time tracking of dynamic objects.

Our Insight. Event cameras, which are asynchronous and motion-activated and offer microsecond-level temporal resolution, are increasingly used for high-frequency monitoring. This trend motivates us to integrate these asynchronous measurements with advanced vision techniques to improve the tracking of high-speed objects.

Challenges. Although promising, translating this idea into a practical system is non-trivial and faces significant challenges:
• C1: Event bursts and depth acquisition delay impair object localization. Event cameras are highly sensitive to changes in the environment.
Crucial events triggered by objects are often overshadowed by the abundance of events triggered by changes in the environment.
• C2: Heterogeneous data hinders consistently high-precision tracking. The event camera detects at a high frequency, but its results lack scale information. Meanwhile, the depth camera provides a rough 3D location of the object, but only at a low frequency.

Our work. To surmount these challenges, we design DO-Tracker, a framework adept at merging event and depth measurements for the precise localization and tracking of high-speed dynamic objects, as illustrated in Fig. 1. To tackle C1, we devise an algorithm based on clustering and a Bayes filter, specifically geared towards object detection and tracking of the object's 2D location. Concurrently, the object is segmented on the depth map using the event camera's detection results, which addresses the lack of scale information. To address C2, we introduce the Graph-Instructed Collaborative Localization algorithm, which harmonizes the heterogeneous observations from both cameras through joint optimization. We implement DO-Tracker with commercial event and depth cameras and conduct preliminary ball-tracking experiments that confirm the viability of the approach (Fig. 2).
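To make the C1 step (clustering plus a Bayes filter for 2D tracking) more concrete, here is a minimal Python sketch. It is not the authors' algorithm, because the abstract does not specify the clustering method or the form of the Bayes filter. It assumes a simple grid-density clustering, in which the densest 8-pixel cell and its neighbours are taken to be the object, and a constant-velocity Kalman filter (one kind of Bayes filter) on each image axis. The names `densest_cluster_centroid`, `AxisKalman`, and `Tracker2D` and the noise parameters `q` and `r` are hypothetical.

```python
"""Hypothetical sketch, not DO-Tracker's implementation: isolate object events
by grid-density clustering, then track the 2D centroid with a constant-velocity
Kalman filter per image axis."""
from collections import Counter


def densest_cluster_centroid(events, cell=8):
    """Return the mean (x, y) of the densest event cluster.

    events: (x, y) pixel coordinates from one short time window.
    Assumption: the moving object produces the densest patch of events, while
    environment-triggered events are spread thinly across the sensor."""
    counts = Counter((int(x) // cell, int(y) // cell) for x, y in events)
    (gx, gy), _ = counts.most_common(1)[0]
    # Keep events in the densest grid cell and its 8 neighbours
    members = [(x, y) for x, y in events
               if abs(int(x) // cell - gx) <= 1 and abs(int(y) // cell - gy) <= 1]
    n = len(members)
    return sum(x for x, _ in members) / n, sum(y for _, y in members) / n


class AxisKalman:
    """Constant-velocity Kalman filter for one image axis; state = [pos, vel]."""

    def __init__(self, pos, q=50.0, r=4.0):
        self.x = [pos, 0.0]
        self.P = [[r, 0.0], [0.0, 1e6]]  # velocity is essentially unknown at start
        self.q, self.r = q, r  # assumed accel. noise density and measurement variance (px^2)

    def predict(self, dt):
        p, v = self.x
        self.x = [p + v * dt, v]
        (a, b), (c, d) = self.P
        q = self.q
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and white-noise acceleration Q
        self.P = [[a + dt * (b + c) + dt * dt * d + q * dt ** 3 / 3, b + dt * d + q * dt ** 2 / 2],
                  [c + dt * d + q * dt ** 2 / 2, d + q * dt]]

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r          # innovation variance (H = [1, 0])
        k0, k1 = a / s, c / s   # Kalman gain
        y = z - self.x[0]       # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


class Tracker2D:
    """Two independent axis filters; a simplification of a full 4-state filter."""

    def __init__(self, x, y):
        self.kx, self.ky = AxisKalman(x), AxisKalman(y)

    def step(self, dt, z):
        """Predict forward by dt seconds, correct with centroid z, return filtered (x, y)."""
        for kf, zi in zip((self.kx, self.ky), z):
            kf.predict(dt)
            kf.update(zi)
        return self.kx.x[0], self.ky.x[0]
```

In use, one would call `Tracker2D.step(dt, densest_cluster_centroid(window))` once per short event window, so the 2D estimate updates at the event-window rate rather than at a camera frame rate.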
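The depth-based scale recovery and the fusion of heterogeneous observations can also be sketched in Python. This sketch swaps in a different technique: it is not the Graph-Instructed Collaborative Localization algorithm, whose details the abstract does not give. It assumes a pinhole model with intrinsics `K = (fx, fy, cx, cy)` shared by both sensors, which means the depth map is already registered to the event camera. It lifts an event-derived bounding box to 3D using the median valid depth inside the box. It then fits a constant-velocity trajectory p(t) = p0 + v·t by weighted linear least squares to high-rate 2D event centroids and low-rate 3D depth points. The event rows use an algebraic bearing constraint rather than reprojection error, which is a simplification. The names `lift_detection`, `solve`, `fuse`, `w_event`, and `w_depth` are hypothetical.

```python
"""Hypothetical sketch: lift an event detection to 3D with a registered depth
frame, then jointly fit high-rate 2D event observations and low-rate 3D depth
observations by weighted linear least squares (constant-velocity model)."""


def lift_detection(depth, box, K):
    """Back-project the box centre using the median valid depth inside the box.

    depth: 2D list (rows) of depths in metres, 0 = invalid.
    box: (u0, v0, u1, v1) end-exclusive pixel box from the event detector.
    K: assumed pinhole intrinsics (fx, fy, cx, cy).
    Assumes the object fills most of the box, so the median ignores background."""
    fx, fy, cx, cy = K
    u0, v0, u1, v1 = box
    vals = sorted(depth[v][u] for v in range(v0, v1)
                  for u in range(u0, u1) if depth[v][u] > 0)
    z = vals[len(vals) // 2]
    uc, vc = (u0 + u1 - 1) / 2, (v0 + v1 - 1) / 2
    return ((uc - cx) * z / fx, (vc - cy) * z / fy, z)


def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


def fuse(event_obs, depth_obs, K, w_event=1.0, w_depth=1.0):
    """Estimate theta = [p0x, p0y, p0z, vx, vy, vz] with p(t) = p0 + v * t.

    event_obs: [(t, u, v)] pixel centroids (direction only, no scale).
    depth_obs: [(t, X, Y, Z)] metric 3D points in the camera frame."""
    fx, fy, cx, cy = K
    rows = []  # (weight, coefficient row, target)
    for t, u, v in event_obs:
        a, b = (u - cx) / fx, (v - cy) / fy
        # X(t) - a * Z(t) = 0 and Y(t) - b * Z(t) = 0: linear in theta, scale-free
        rows.append((w_event, [1, 0, -a, t, 0, -a * t], 0.0))
        rows.append((w_event, [0, 1, -b, 0, t, -b * t], 0.0))
    for t, X, Y, Z in depth_obs:
        for axis, target in enumerate((X, Y, Z)):
            coeff = [0.0] * 6
            coeff[axis], coeff[axis + 3] = 1.0, t
            rows.append((w_depth, coeff, target))
    # Normal equations: (A^T W A) theta = A^T W b
    N = [[sum(w * c[i] * c[j] for w, c, _ in rows) for j in range(6)] for i in range(6)]
    g = [sum(w * c[i] * y for w, c, y in rows) for i in range(6)]
    return solve(N, g)
```

Without any depth observations, every event row has a zero target, so θ = 0 satisfies them all and the normal matrix is singular. The 2D observations fix the direction to the object but not its distance. The depth rows supply that scale, while the dense event rows constrain the motion between depth frames.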