Abstract: While supervised learning approaches are effective for video object segmentation, most of them require large amounts of annotation, which is expensive and time-consuming to produce. Recently, self-supervised learning has attracted great attention because it can learn from unlabeled video sequences. However, current patch-based self-supervised video object segmentation methods only discriminate the patch from the entire image, without distinguishing the object of interest from meaningless background or occlusions. These disturbances degrade the extracted features and reduce tracking robustness on real-world video sequences. In this paper, we propose a novel model named Tracker With Integration-Augmented Attention (TWIAA) that is label-free yet achieves strong performance. Specifically, we integrate the spatial and channel dimensions by introducing a feature spatial enhancement module and a two-stream channel module. Combining the two modules lets the network focus on the discriminative object and suppress irrelevant regions, which improves tracking robustness. Moreover, unlike other methods that compute features separately on the search branch and the template branch, the two modules, coupled with the Siamese network, compute the features of both branches jointly to strengthen the interdependence between them. This interdependence is injected into both the spatial and channel dimensions, so that our approach establishes richer and more discriminative associations and identifies the object more accurately. In addition, our method takes full advantage of cycle-consistency information in consecutive frames, using coherence as the learning signal to acquire object-oriented relationships. Extensive experiments and ablation studies are conducted on large VOS benchmarks, including DAVIS-2017, YouTube-VOS-2018, and YouTube-VOS-2019. The results verify that our proposed framework has both strong feature representation and competitive performance compared with supervised and self-supervised models.
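To make the idea of using cycle consistency as a learning signal more concrete, the following is a minimal NumPy sketch. It is an illustration of the general principle only, not the TWIAA implementation: the function names, the cosine-similarity affinity, and the `temperature` value are all assumptions introduced here. Patch features from one frame are matched forward to the next frame and back again, and the loss is low when each patch returns to its starting position.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each feature vector (row) to unit length.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def transition(src, dst, temperature=0.07):
    # Row-stochastic matrix: entry (i, j) is the soft probability that
    # patch i in `src` corresponds to patch j in `dst`.
    # Cosine similarity and the temperature are illustrative choices.
    sim = l2_normalize(src) @ l2_normalize(dst).T
    return softmax(sim / temperature, axis=-1)

def cycle_consistency_loss(feat_t, feat_t1, temperature=0.07):
    # Track forward (t -> t+1) and backward (t+1 -> t); a patch that
    # is tracked consistently should land back where it started, so
    # the round-trip matrix should be close to the identity.
    forward = transition(feat_t, feat_t1, temperature)
    backward = transition(feat_t1, feat_t, temperature)
    round_trip = forward @ backward
    return float(-np.log(np.diag(round_trip) + 1e-12).mean())

# Hypothetical data: 16 patches with 64-dimensional features.
rng = np.random.default_rng(0)
frame_t = rng.standard_normal((16, 64))
frame_t1_coherent = frame_t + 0.01 * rng.standard_normal((16, 64))
frame_t1_unrelated = rng.standard_normal((16, 64))

loss_coherent = cycle_consistency_loss(frame_t, frame_t1_coherent)
loss_unrelated = cycle_consistency_loss(frame_t, frame_t1_unrelated)
```

In this toy setting the loss for the nearly identical next frame should be much smaller than for unrelated features, which is the property that lets temporal coherence stand in for labels during training.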

SENSE: Hyperspectral Video Object Tracker via Fusing Material and Motion Cues

SiamHYPER: Learning a Hyperspectral Object Tracker from an RGB-Based Tracker

SSTtrack: A Unified Hyperspectral Video Tracking Framework via Modeling Spectral-Spatial-Temporal Conditions

An Anchor-Free Siamese Target Tracking Network for Hyperspectral Video

Exploit Spatiotemporal Contextual Information for 3D Single Object Tracking Via Memory Networks

Robust Visual Tracking Via CAMShift and Structural Local Sparse Appearance Model

Learning a Deep Ensemble Network with Band Importance for Hyperspectral Object Tracking

SSF-Net: Spatial-Spectral Fusion Network with Spectral Angle Awareness for Hyperspectral Object Tracking

Object Tracking in Hyperspectral-Oriented Video with Fast Spatial-Spectral Features

Hyperspectral Video Tracker Based on Spectral Deviation Reduction and a Double Siamese Network

Material-Guided Multiview Fusion Network for Hyperspectral Object Tracking

Spectral-Spatial-Temporal Attention Network for Hyperspectral Tracking

Material Based Object Tracking in Hyperspectral Videos: Benchmark and Algorithms

Hyperspectral Attention Network for Object Tracking

Bae-Net: A Band Attention Aware Ensemble Network For Hyperspectral Object Tracking

Self-supervised Video Object Segmentation Using Integration-Augmented Attention

HHTrack: Hyperspectral Object Tracking Using Hybrid Attention

XTrack: Multimodal Training Boosts RGB-X Video Object Trackers

Hyperspectral Video Target Tracking based on Pixel-wise Spectral Matching Reduction and Deep Spectral Cascading Texture Features

Hy-Tracker: A Novel Framework for Enhancing Efficiency and Accuracy of Object Tracking in Hyperspectral Videos

Fusing Multimodal Video Data for Detecting Moving Objects/Targets in Challenging Indoor and Outdoor Scenes