Novel Pipeline Integrating Cross-Modality and Motion Model for Nearshore Multi-Object Tracking in Optical Video Surveillance

Jiangang Ding, Wei Li, Lili Pei, Ming Yang, Aojia Tian, Bo Yuan
DOI: https://doi.org/10.1109/tits.2024.3373370
IF: 8.5
2024-01-01
IEEE Transactions on Intelligent Transportation Systems
Abstract: Nearshore multi-object tracking (NMOT) aims to locate and identify nearshore objects. Most approaches accomplish this task using radar and remote-sensing technologies. In contrast, video data can describe the visual appearance of nearshore objects without prior information, such as identity, location, or movement. In this study, we introduce a cross-modality pipeline to address the four major challenges of NMOT. First, we introduce a cross-modality bi-attention transformer (CBT) to manage the information interaction between RGB and thermal infrared videos effectively. This decoupling and guidance mechanism lays the foundation for our subsequent processes. Next, we integrate the outputs of the backbone with historical frames to extract crucial temporal features. Subsequently, we refine small object detection performance by employing multi-scale feature alignment (MFA). Observations are generated by the transformer decoder. To tackle challenges arising from extensive occlusion and interactions induced by waves in NMOT, we propose guiding modulation (GM), supplemented by low-confidence boxes and multi-point corner momentum (MCM) to facilitate association. Our approach is simple, online, and real-time, showcasing outstanding performance in benchmark evaluations. The open-source implementation of our work is available at https://github.com/Ding-JianGang/Cross-Modality-MOT-in-Nearshore-Environments.
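The abstract mentions that low-confidence boxes supplement association under occlusion. As a rough illustration of that general idea only, the sketch below shows a generic two-stage IoU association: tracks are first matched to high-confidence detections, and tracks left unmatched are then given a second chance against low-confidence detections. This is not the authors' GM or MCM method; the function names, the greedy matching strategy, and the thresholds (`high`, `low`, `min_iou`) are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, dets, min_iou):
    """Greedily pair track ids with detection indices by descending IoU.

    tracks: {track_id: box}, dets: {detection_index: box}.
    """
    pairs = sorted(
        ((iou(tb, db), tid, di)
         for tid, tb in tracks.items()
         for di, db in dets.items()),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, tid, di in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if tid in used_tracks or di in used_dets:
            continue
        matches[tid] = di
        used_tracks.add(tid)
        used_dets.add(di)
    return matches


def associate(tracks, detections, high=0.6, low=0.1, min_iou=0.3):
    """Two-stage association (hypothetical thresholds).

    tracks: {track_id: box}; detections: list of (box, score).
    Returns {track_id: detection_index}.
    """
    high_dets = {i: b for i, (b, s) in enumerate(detections) if s >= high}
    low_dets = {i: b for i, (b, s) in enumerate(detections) if low <= s < high}

    # Stage 1: match all tracks against confident detections.
    first = greedy_match(tracks, high_dets, min_iou)

    # Stage 2: tracks still unmatched may be partially occluded objects whose
    # detections scored low, so try the low-confidence boxes for them.
    remaining = {t: b for t, b in tracks.items() if t not in first}
    second = greedy_match(remaining, low_dets, min_iou)

    return {**first, **second}


tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
detections = [
    ((1, 1, 11, 11), 0.9),        # confident detection near track 1
    ((51, 50, 61, 60), 0.2),      # weak detection near track 2
    ((200, 200, 210, 210), 0.05), # below the low threshold, ignored
]
print(associate(tracks, detections))
```

Greedy matching keeps the sketch dependency-free; an optimal assignment solver could be substituted for `greedy_match` without changing the two-stage structure.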