Abstract: The rapid development of remote sensing technology has provided new data sources for maritime rescue and made it possible to find and track survivors. Because multiple survivors must be tracked at the same time, multi-object tracking (MOT) has become a key subtask of maritime rescue. However, a significant gap exists between the fine-grained objects in realistic maritime rescue remote sensing data and the capability of existing MOT methods, which mainly focus on coarse-grained object scenarios and fail to track fine-grained instances. This gap limits the practical application of MOT to realistic maritime rescue remote sensing data, especially when rescue forces are limited. Motivated by the promising fine-grained classification performance of recent text-guided methods, we leverage labels and attributes to narrow the gap between MOT and fine-grained maritime rescue. We propose a text-guided multi-class multi-object tracking (TG-MCMOT) method. To handle fine-grained classes, we design a multi-modal encoder that aligns external textual information with visual inputs. We use decoding information at different levels to simultaneously predict the category, location, and identity embedding features of objects. To improve small-object detection, we also develop a data augmentation pipeline that generates pseudo-near-infrared images from RGB images. Extensive experiments demonstrate that TG-MCMOT not only performs well on typical metrics in the maritime rescue task (SeaDronesSee dataset) but also effectively tracks open-set categories on the BURST dataset. Specifically, on the SeaDronesSee dataset, the Higher Order Tracking Accuracy (HOTA) reaches 58.8, and on the BURST test set, the HOTA score for the unknown class improves by 16.07 points.
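
The idea of aligning textual information with visual inputs can be illustrated with a minimal sketch. This is not the TG-MCMOT encoder, whose architecture the abstract does not specify; it only assumes that each detected region and each class label has already been mapped to a vector in a shared embedding space, and it assigns the label whose text embedding has the highest cosine similarity to the region embedding. The function names, the three-dimensional toy vectors, and the label set are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_by_text(region_embedding, text_embeddings):
    """Pick the label whose text embedding best matches the region.

    Hypothetical helper: a real system would obtain both kinds of
    embeddings from learned encoders rather than hand-written vectors.
    """
    scores = {
        label: cosine(region_embedding, emb)
        for label, emb in text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy, hand-written embeddings (assumptions for illustration only).
text_embeddings = {
    "swimmer": [0.9, 0.1, 0.0],
    "life jacket": [0.1, 0.9, 0.1],
    "boat": [0.0, 0.2, 0.9],
}
region = [0.8, 0.3, 0.1]

label, scores = classify_by_text(region, text_embeddings)
print(label, {k: round(v, 3) for k, v in scores.items()})
```

Because the label set is just a dictionary of text embeddings, adding a previously unseen category in this sketch requires only a new text embedding and no change to the matching step, which is one reason text guidance is attractive for open-set settings.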

Text-Guided Multi-Modal Fusion for Underwater Visual Tracking

A Robust Underwater Multiclass Fish-School Tracking Algorithm

Closed-Loop Tracking-by-Detection for ROV-Based Multiple Fish Tracking

Embedded Online Fish Detection and Tracking System Via YOLOv3 and Parallel Correlation Filter

Improving Underwater Visual Tracking With a Large Scale Dataset and Image Enhancement

Underwater Long-Term Object Tracker for Marine Organism Capture

Semi-supervised Visual Tracking of Marine Animals Using Autonomous Underwater Vehicles

Real-time Visual Object Tracking with Natural Language Description

A Novel AMSS-FFN for Underwater Multisource Localization Using Artificial Lateral Line

More Perspectives Mean Better: Underwater Target Recognition and Localization with Multimodal Data via Symbiotic Transformer and Multiview Regression

Text-Guided Multi-Class Multi-Object Tracking for Fine-Grained Maritime Rescue

FishTrack23: An Ensemble Underwater Dataset for Multi-Object Tracking

Joint Visual Grounding and Tracking with Natural Language Specification

Multi-AUV Assisted Seamless Underwater Target Tracking Relying on Deep Learning and Reinforcement Learning

Context-Aware Integration of Language and Visual References for Natural Language Tracking

Multi-Object Tracking by Iteratively Associating Detections with Uniform Appearance for Trawl-Based Fishing Bycatch Monitoring

A Fusion Algorithm of Object Detection and Tracking for Unmanned Surface Vehicles

Underwater Object Tracker: UOSTrack for Marine Organism Grasping of Underwater Vehicles

Benchmarking Vision-Based Object Tracking for USVs in Complex Maritime Environments

Multi-Granularity Language-Guided Multi-Object Tracking

Unifying Visual and Vision-Language Tracking via Contrastive Learning