Abstract:<p>Recent advanced trackers, consisting of a discriminative classification component and a dedicated bounding box estimation component, have achieved improved performance in visual tracking. A key factor in this progress is the use of different Convolutional Neural Networks (CNNs), which significantly improve model capacity via offline-trained deep feature representations. Although powerful deep structures emphasize semantic appearance through high-dimensional latent variables, how to achieve effective feature adaptation in the online tracking stage has not yet been sufficiently considered. To this end, we argue that it is necessary to explore hierarchical and complementary appearance descriptors from different convolutional layers to achieve online tracking adaptation. In this paper, we therefore propose an adaptive feature fusion mechanism that balances the detection granularities from shallow to deep convolutional layers. Specifically, the correlation between template and instance is employed to generate adaptive weights that improve saliency and discrimination. In addition, to account for temporal appearance variation, the projection matrix for the multi-channel inputs is jointly updated with the correlation classifier to further enhance robustness. Experimental results on four recent benchmarks, <em>i.e.</em>, OTB-2015, VOT2018, LaSOT and TrackingNet, demonstrate the effectiveness and robustness of the proposed method, with superior performance compared with state-of-the-art approaches.</p>

Exploiting multi-scale hierarchical feature representation for visual tracking

Robust Visual Tracking with Deep Feature Fusion

Hierarchical Convolutional Features for Visual Tracking

Multi-features Guided Robust Visual Tracking

A Scale Adaptive Kernel Correlation Filter Tracker With Feature Integration

Deep Scale Feature For Visual Tracking

Hierarchical Convolutional Features Fusion for Visual Tracking

HCDC-SRCF tracker: Learning an adaptively multi-feature fuse tracker in spatial regularized correlation filters framework

Multi-hierarchical Independent Correlation Filters for Visual Tracking

Adaptive feature fusion for visual object tracking

CFNN: Correlation Filter Neural Network for Visual Object Tracking

Exploiting spatial relationships for visual tracking

Kernalised Multi-resolution Convnet for Visual Tracking

A Robust Tracking with Low-Dimensional Target-Specific Feature Extraction

Efficient object tracking using hierarchical convolutional features model and correlation filters

Multi Feature Representation and Aggregation Network for Accurate and Robust Visual Tracking

A Collaborative Visual Tracking Architecture for Correlation Filter and Convolutional Neural Network Learning

Distractor-aware visual tracking using hierarchical correlation filters adaptive selection

CVTrack: Combined Convolutional Neural Network and Vision Transformer Fusion Model for Visual Tracking

Robust and real-time deep tracking via multi-scale domain adaptation

Multi-scale receptive field neural networks for object tracking