Abstract: Background and motivation. Localizing and tracking high-speed dynamic objects has applications across many domains. For example, tracking unmanned aerial vehicles (UAVs) plays a crucial role for regulatory authorities, enabling them to swiftly identify illicitly operated drones.

Related work. Cameras and radar are currently in widespread use for locating and tracking moving objects [1]. For instance, [2] integrates an RGB camera with radar for object tracking to produce follow-up effects in AR applications. Despite their success, these methods suffer from notable latency when tracking high-speed dynamic objects. The latency often reaches hundreds of milliseconds, for several reasons: ① cameras need tens of milliseconds for exposure and are susceptible to motion blur; ② algorithms may require a sequence of frames to optimize the localization result. This latency becomes a critical limitation for objects moving faster than 10 m/s: at that speed, a 100 ms delay means the object has already moved 1 m by the time its position is reported, leaving the system too little time to respond. Consequently, current methods fall short of our requirement for precise real-time tracking of dynamic objects.

Our Insight. Event cameras, which are asynchronous and motion-activated and offer microsecond-level temporal resolution, are increasingly used for high-frequency monitoring. This trend motivates us to integrate these asynchronous measurements with advanced vision techniques to improve the tracking of high-speed objects.

Challenges. Promising as this is, turning it into a practical system is non-trivial and faces two significant challenges:
• C1: Event bursts and depth acquisition delay impair object localization. Event cameras are highly sensitive to changes in the environment, so the crucial events triggered by the object are often overshadowed by the far more numerous events triggered by environmental changes.
• C2: Heterogeneous data hinders consistently high-precision tracking. The event camera detects at a high frequency, but its results lack scale information. Conversely, the depth camera provides a coarse 3D location of the object, but only at a low frequency.

Our work. To overcome these challenges, we design DO-Tracker, a framework that fuses event and depth measurements to precisely localize and track high-speed dynamic objects, as illustrated in Fig. 1. To tackle C1, we devise an algorithm based on clustering and a Bayes filter, specifically geared toward detecting the object and tracking its 2D location. Concurrently, the event-camera detection results are used to segment the object in the depth map, which supplies the missing scale information. To address C2, we introduce the Graph-Instructed Collaborative Localization algorithm, which reconciles the heterogeneous observations from both cameras through joint optimization. We implement DO-Tracker with commercial event and depth cameras and conduct preliminary ball-tracking experiments, which confirm the viability of the approach (Fig. 2).
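The C1 idea of combining clustering with a Bayes filter for 2D tracking can be illustrated with a minimal sketch. It assumes events arrive as `(t_us, x, y, polarity)` tuples grouped into short time windows, uses a simple tile-based density clustering as a stand-in for whatever clustering DO-Tracker actually uses, and applies a constant-velocity Kalman filter (one common instance of a Bayes filter) to each pixel axis. The names `densest_cluster`, `CVKalman1D`, `EventTracker2D` and all parameter values (`cell`, `min_events`, `q`, `r`, `vel_var`) are hypothetical.

```python
# Hypothetical sketch: dominant-cluster detection + constant-velocity Kalman
# filter (one kind of Bayes filter) for 2D tracking from event windows.
# Events are (t_us, x, y, polarity) tuples; all names/parameters are assumptions.

def densest_cluster(events, cell=8, min_events=5):
    """Bin events into cell x cell pixel tiles, merge the densest tile with its
    8 neighbouring tiles, and return that cluster's centroid (or None)."""
    bins = {}
    for _, x, y, _ in events:
        bins.setdefault((x // cell, y // cell), []).append((x, y))
    if not bins:
        return None
    sx, sy = max(bins, key=lambda k: len(bins[k]))
    pts = [p for dx in (-1, 0, 1) for dy in (-1, 0, 1)
           for p in bins.get((sx + dx, sy + dy), [])]
    if len(pts) < min_events:  # too sparse: treat as background noise
        return None
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


class CVKalman1D:
    """Kalman filter for one pixel axis, state (position px, velocity px/s),
    constant-velocity motion with white-noise acceleration of variance q."""

    def __init__(self, pos, q=1e4, r=4.0, vel_var=1e8):
        self.x = [float(pos), 0.0]
        self.P = [[r, 0.0], [0.0, vel_var]]
        self.q, self.r = q, r

    def predict(self, dt):
        p, v = self.x
        self.x = [p + v * dt, v]
        (a, b), (c, d) = self.P
        q = self.q
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]]
        self.P = [[a + dt * (b + c) + dt * dt * d + q * dt**4 / 4,
                   b + dt * d + q * dt**3 / 2],
                  [c + dt * d + q * dt**3 / 2,
                   d + q * dt**2]]

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r                 # innovation variance, H = [1, 0]
        k0, k1 = a / s, c / s          # Kalman gain
        innov = z - self.x[0]
        self.x = [self.x[0] + k0 * innov, self.x[1] + k1 * innov]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


class EventTracker2D:
    """Detect-then-filter loop: one densest_cluster call per event window,
    followed by a Kalman predict/update on each pixel axis."""

    def __init__(self, q=1e4, r=4.0):
        self.q, self.r = q, r
        self.kx = self.ky = None
        self.t = None

    def step(self, events, t):
        """Consume the events of one window ending at time t (seconds)."""
        z = densest_cluster(events)
        if self.kx is None:            # not initialised yet
            if z is None:
                return None
            self.kx = CVKalman1D(z[0], self.q, self.r)
            self.ky = CVKalman1D(z[1], self.q, self.r)
            self.t = t
            return z
        dt, self.t = t - self.t, t
        self.kx.predict(dt)
        self.ky.predict(dt)
        if z is not None:              # no detection: keep the prediction
            self.kx.update(z[0])
            self.ky.update(z[1])
        return (self.kx.x[0], self.ky.x[0])
```

Keeping only the densest cluster is how this sketch suppresses the environment-triggered events described in C1; windows without a sufficiently dense cluster fall back to a pure prediction step.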

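The scale-recovery step, segmenting the object in the depth map with the event detection, can likewise be sketched under strong assumptions: the event and depth cameras are taken to be registered to the same pixel grid with known pinhole intrinsics `fx, fy, cx0, cy0`, the object is assumed to fill a small box around the event detection, and the most recent object depth is simply reused for every high-rate 2D detection until the next depth frame arrives. This is not the Graph-Instructed Collaborative Localization algorithm, which jointly optimizes both observation streams; it only shows how a low-rate depth value supplies the metric scale that event detections lack. `object_depth`, `back_project` and `ScaleFuser` are hypothetical names.

```python
# Hypothetical sketch: attach metric scale from a low-rate depth map to
# high-rate 2D event detections. Assumes both cameras share one pixel grid.
from statistics import median


def object_depth(depth, cx, cy, half=3, z_min=0.1):
    """Median of valid depth pixels (> z_min metres) in a (2*half+1)^2 box
    centred on integer pixel (cx, cy): a crude stand-in for segmenting the
    object in the depth map around the event detection."""
    h, w = len(depth), len(depth[0])
    vals = [depth[v][u]
            for v in range(max(0, cy - half), min(h, cy + half + 1))
            for u in range(max(0, cx - half), min(w, cx + half + 1))
            if depth[v][u] > z_min]
    return median(vals) if vals else None


def back_project(u, v, z, fx, fy, cx0, cy0):
    """Pinhole back-projection of pixel (u, v) at depth z to camera coordinates."""
    return ((u - cx0) * z / fx, (v - cy0) * z / fy, z)


class ScaleFuser:
    """Reuse the most recent object depth for every high-rate 2D detection."""

    def __init__(self, fx, fy, cx0, cy0):
        self.K = (fx, fy, cx0, cy0)
        self.z = None

    def on_depth(self, depth, u, v):
        """Low-rate path: new depth frame plus the current 2D detection (u, v)."""
        z = object_depth(depth, int(round(u)), int(round(v)))
        if z is not None:
            self.z = z

    def on_event_detection(self, u, v):
        """High-rate path: return a 3D point, or None before any depth arrives."""
        if self.z is None:
            return None
        return back_project(u, v, self.z, *self.K)
```

Reusing the last depth value ignores motion along the optical axis between depth frames, which is the kind of inconsistency a joint optimization over both streams is meant to resolve.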
FE-DeTr: Keypoint Detection and Tracking in Low-quality Image Frames with Events

Event-Based Fusion for Motion Deblurring with Cross-modal Attention

Intensity/Inertial Integration-Aided Feature Tracking on Event Cameras

MEFNet: Multi-scale Event Fusion Network for Motion Deblurring

Towards Robust Keypoint Detection and Tracking: A Fusion Approach with Event-Aligned Image Features

Tracking Any Point with Frame-Event Fusion Network at High Frame Rate

Event-based Motion Deblurring via Multi-Temporal Granularity Fusion

A Universal Event-Based Plug-In Module for Visual Object Tracking in Degraded Conditions

Enhancing Robustness in Asynchronous Feature Tracking for Event Cameras Through Fusing Frame Steams

Learning for Motion Deblurring with Hybrid Frames and Events

An Event-Driven Asynchronous Feature Tracking Method

Standard and Event Cameras Fusion for Feature Tracking

Learning to Deblur and Generate High Frame Rate Video with an Event Camera

Motion Aware Event Representation-Driven Image Deblurring

A Residual Learning Approach to Deblur and Generate High Frame Rate Video with an Event Camera

Object Tracking by Jointly Exploiting Frame and Event Domain

Poster: Fusing Event and Depth Sensing for Dynamic Objects Localization and Tracking.

Frame-Event Alignment and Fusion Network for High Frame Rate Tracking

Data-driven Feature Tracking for Event Cameras

Bringing Events into Video Deblurring with Non-consecutively Blurry Frames

CrossEI: Boosting Motion-Oriented Object Tracking With an Event Camera