Abstract: In object tracking, complex backgrounds pose significant challenges and often cause existing trackers to underperform, since most rely predominantly on deep features from the final layer of the feature extraction network. These features, while semantically rich, are not always sufficient to distinguish targets from cluttered environments. To address this limitation, we introduce the Background-aware Siamese Network (BASNet), which enhances the salience of features in intricate scenes. Central to our method is the Dual-Feature Fusion (DFF) model, designed to stabilize the target representation amid distracting backgrounds. BASNet leverages the complementary strengths of deep and shallow features, harnessing the former’s semantic depth and the latter’s precision in spatial localization. This fusion not only improves feature utilization but also mitigates the shortcomings associated with the superficial output of shallow networks. To further refine the target’s spatial definition, our attention-focusing (AF) module accentuates pertinent features and attenuates noise arising from the multi-layer feature fusion. Its cyclic attention operations enable each pixel to establish extensive dependencies with all others, thereby highlighting salient features. For precise target localization, BASNet employs the SIoU loss in its regression branch. Extensive experiments demonstrate the competitive performance of our method on six datasets: OTB50, OTB100, UAV123, VOT2016, VOT2018, and VOT2019.
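The abstract names the SIoU loss for the regression branch but gives no formula. As a rough illustration only, the sketch below computes the plain IoU term that IoU-family box regression losses build on; it is not the SIoU loss itself, which, to our understanding, adds further angle, distance, and shape penalties. The function names `iou` and `iou_loss` and the `(x1, y1, x2, y2)` box layout are assumptions made for this example, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2
    (an assumed layout, chosen for this sketch).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def iou_loss(pred_box, target_box):
    """Plain IoU loss: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - iou(pred_box, target_box)
```

A predicted box that coincides with the ground truth gives a loss of 0, and a box with no overlap gives a loss of 1. Extensions such as SIoU add penalty terms so that non-overlapping predictions still receive a useful training signal.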

STPNet: A Spatial-Temporal Propagation Network for Background Subtraction

Background-aware Siamese Network Tracking Based on Salient Feature Fusion

Foreground Gating and Background Refining Network for Surveillance Object Detection

Background Subtraction Via 3D Convolutional Neural Networks

Background subtraction for video sequence using deep neural network

Spatial-temporal Nonparametric Background Subtraction in Dynamic Scenes

Background Subtraction Based on Deep Convolutional Neural Networks Features

Background Subtraction Based on Modified Pulse Coupled Neural Network in Compressive Domain

Background Subtraction Based on Deep Pixel Distribution Learning

Multiscale Cascaded Scene-Specific Convolutional Neural Networks for Background Subtraction

Deep Neural Network Concepts for Background Subtraction: A Systematic Review and Comparative Evaluation

Background Subtraction Using Spatio-Temporal Group Sparsity Recovery

Complex Background Subtraction by Pursuing Dynamic Spatio-Temporal Models

A Universal Foreground Segmentation Technique using Deep-Neural Network

Refinement of Background-Subtraction Methods Based on Convolutional Neural Network Features for Dynamic Background

Deep Convolutional Neural Networks Features for Robust Foreground Segmentation

Learning to Segment Instances in Videos with Spatial Propagation Network

AGSPN: Efficient Attention-Gated Spatial Propagation Network for Depth Completion

Background Subtraction Based on GAN and Domain Adaptation for VHR Optical Remote Sensing Videos

Background subtraction in dynamic scenes with adaptive spatial fusing

ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization