Abstract: Deep trackers have proven successful in visual tracking. Typically, these trackers employ optimally pre-trained deep networks to represent all diverse objects with multi-channel features from some fixed layers. The deep networks employed are usually trained to extract rich knowledge from massive data used in object classification, so they are capable of representing generic objects very well. However, these networks are too complex to represent a specific moving object, leading to poor generalization as well as high computational and memory costs. This paper presents a novel and general framework, termed channel distillation, to facilitate deep trackers. To validate the effectiveness of channel distillation, we take the discriminative correlation filter (DCF) and ECO as examples. We demonstrate that an integrated formulation can turn feature compression, response map generation, and model update into a unified energy minimization problem that adaptively selects informative feature channels, improving the efficacy of tracking moving objects on the fly. Channel distillation can accurately extract good channels, alleviating the influence of noisy channels and generally reducing the number of channels, and it adapts to different channels and networks. The resulting deep tracker is accurate, fast, and has low memory requirements. Extensive experimental evaluations on popular benchmarks clearly demonstrate the effectiveness and generalizability of our framework.

What problem does this paper attempt to address?

The core problem this paper addresses is how to improve the efficiency and performance of deep trackers in visual tracking tasks. Specifically, the authors focus on selectively extracting useful feature channels, thereby reducing computational and memory costs while maintaining or even improving tracking accuracy.

### Problem Background

Deep-learning trackers usually rely on pre-trained deep networks to represent all kinds of objects. These networks were originally trained for large-scale image classification. Although they represent generic objects well, they are too complex for representing a specific moving object, which leads to poor generalization and high computational and memory overheads.

### Paper Solution

To address these problems, the paper proposes a framework called "channel distillation". Its main objectives are:

1. **Adaptive selection of useful feature channels**: Tracking is improved by selecting the feature channels that are most useful for the specific tracked object and removing noisy channels.
2. **Compression of feature dimensions**: Computational and memory requirements are reduced by discarding unnecessary channels.
3. **Optimization of model update**: Feature compression, response map generation, and model update are integrated into a unified energy minimization problem to achieve efficient online tracking.

### Implementation Method

To verify the effectiveness of channel distillation, the authors take two typical deep trackers as examples: the discriminative correlation filter (DCF) and ECO. They integrate channel distillation into these trackers to form a joint optimization problem, as shown in formula (2):

\[ E(h, \alpha)=\sum_{i=1}^{n}\left\|\sum_{l=1}^{d}\alpha_{l}\left(f^{(l)}(x_{i})\otimes h^{(l)}\right)-y_{i}\right\|^{2}+\lambda\left\|\alpha\right\|\sum_{l=1}^{d}\alpha_{l}\left\|h^{(l)}\right\|^{2},\quad\text{s.t.}\ \alpha_{l}\in\{0,1\} \]

where:

- \(f^{(l)}(x_{i})\) is the \(l\)-th feature channel of sample \(x_{i}\);
- \(h^{(l)}\) is the \(l\)-th channel of the multi-channel correlation filter;
- \(y_{i}\) is the ideal response map;
- \(\alpha_{l}\) is a binary variable indicating whether the \(l\)-th channel is selected;
- \(\lambda\) is a regularization parameter.

The outer sum runs over the \(n\) samples and the inner sums over the \(d\) feature channels.

### Experimental Results

Through extensive experimental evaluations on multiple video benchmark datasets, the authors demonstrate the effectiveness and generalization ability of the channel distillation framework. The results show that, by selectively using good feature channels, the tracker improves accuracy while also reducing computational and memory overheads.

### Main Contributions

1. **Revealed the existence of object-specific optimal channels in multi-channel features**: Different tracked objects have different optimal channel combinations.
2. **Proposed the channel distillation framework**: The optimal channels are selected adaptively by solving an energy minimization problem, improving tracking accuracy, speed, and memory efficiency.
3. **Comprehensive evaluation and analysis**: The experiments demonstrate the effectiveness and generalization ability of channel distillation in practical applications, offering new ideas for developing efficient deep trackers.

In conclusion, the paper addresses the redundancy and high cost of deep-learning trackers in visual tracking tasks by introducing the channel distillation framework, which improves tracking accuracy while lowering computational and memory costs.
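
To make the energy in formula (2) more concrete, the following is a minimal NumPy sketch that evaluates it for a fixed binary channel selection. It is not the authors' implementation. The function names `circular_correlate` and `energy`, the array shapes, the reading of \(\otimes\) as circular cross-correlation computed with the FFT, and the use of the Euclidean norm for the selection vector in the regularizer are all assumptions made for illustration.

```python
import numpy as np


def circular_correlate(f, h):
    """Circular cross-correlation of two 2-D arrays via the FFT.

    Assumption: this is what the operator in formula (2) denotes.
    """
    return np.real(np.fft.ifft2(np.fft.fft2(f) * np.conj(np.fft.fft2(h))))


def energy(features, filters, targets, alpha, lam):
    """Evaluate the energy of formula (2) for a fixed channel selection.

    features: array (n, d, H, W), d feature channels for each of n samples
    filters:  array (d, H, W), one correlation filter per channel
    targets:  array (n, H, W), ideal response maps
    alpha:    length-d sequence of 0/1 channel selection flags
    lam:      regularization weight
    """
    alpha = np.asarray(alpha, dtype=float)

    # Data term: squared error between the summed response of the
    # selected channels and the ideal response map, over all samples.
    data_term = 0.0
    for feats_i, y_i in zip(features, targets):
        response = sum(
            a * circular_correlate(f, h)
            for a, f, h in zip(alpha, feats_i, filters)
        )
        data_term += np.sum((response - y_i) ** 2)

    # Regularizer: filter energy of the selected channels, scaled by the
    # norm of the selection vector (Euclidean norm is an assumption).
    filter_energy = np.array([np.sum(h ** 2) for h in filters])
    reg_term = lam * np.linalg.norm(alpha) * np.sum(alpha * filter_energy)

    return data_term + reg_term


# Toy usage with hypothetical data: channel 0 explains the target exactly,
# channel 1 has an all-zero filter and contributes nothing.
rng = np.random.default_rng(0)
n, d, H, W = 3, 2, 8, 8
features = rng.standard_normal((n, d, H, W))
filters = np.zeros((d, H, W))
filters[0, 0, 0] = 1.0  # delta filter: correlating with it returns the input
targets = features[:, 0].copy()

e_selected = energy(features, filters, targets, alpha=[1, 0], lam=0.1)
e_none = energy(features, filters, targets, alpha=[0, 0], lam=0.1)
print(e_selected, e_none)
```

In this toy setup, selecting only the informative channel leaves just the regularization cost, while selecting no channel leaves the full energy of the target maps as error. A channel-selection procedure would search over the binary flags (jointly with the filters) for the lowest energy; how the paper performs that search is not described in the summary above, so it is not sketched here.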

Distilling Channels for Efficient Deep Tracking
