Distilling Channels for Efficient Deep Tracking

Shiming Ge, Zhao Luo, Chunhui Zhang, Yingying Hua, Dacheng Tao
2024-09-18
Abstract: Deep trackers have proven successful in visual tracking. Typically, these trackers employ optimally pre-trained deep networks to represent all diverse objects with multi-channel features from some fixed layers. The deep networks employed are usually trained to extract rich knowledge from massive data used in object classification, so they are capable of representing generic objects very well. However, these networks are too complex to represent a specific moving object, leading to poor generalization as well as high computational and memory costs. This paper presents a novel and general framework termed channel distillation to facilitate deep trackers. To validate the effectiveness of channel distillation, we take the discriminative correlation filter (DCF) and ECO as examples. We demonstrate that an integrated formulation can turn feature compression, response map generation, and model update into a unified energy minimization problem that adaptively selects informative feature channels, improving the efficacy of tracking moving objects on the fly. Channel distillation can accurately extract good channels, alleviating the influence of noisy channels and generally reducing the number of channels, as well as adaptively generalizing to different channels and networks. The resulting deep tracker is accurate, fast, and has low memory requirements. Extensive experimental evaluations on popular benchmarks clearly demonstrate the effectiveness and generalizability of our framework.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The core problem this paper addresses is how to improve the efficiency and accuracy of deep trackers in visual tracking. Specifically, the authors study how to select useful feature channels so that computational and memory costs fall while tracking accuracy is maintained or even improved.

### Problem Background

Traditional deep-learning trackers usually use pre-trained deep networks to represent all kinds of objects. These networks were originally trained for large-scale image classification. Although they represent generic objects well, they are too complex for a specific moving object, which results in poor tracking performance on that object and high computational and memory overheads.

### Paper Solution

To solve these problems, the paper proposes a new framework called "channel distillation". Its main objectives are:

1. **Adaptive selection of useful feature channels**: Selecting the feature channels that are most useful for the specific tracked object and removing noisy channels improves tracking accuracy.
2. **Compression of feature dimensions**: Reducing the number of unnecessary channels lowers the computational and memory requirements.
3. **Optimization of model update**: Feature compression, response map generation, and model update are integrated into a unified energy minimization problem to achieve efficient online tracking.

### Implementation Method

To verify the effectiveness of channel distillation, the authors selected two typical deep-tracking algorithms, the discriminative correlation filter (DCF) and ECO, as examples.
They integrated channel distillation into these algorithms to form a joint optimization problem, as shown in formula (2):

\[ E(h, a)=\sum_{i = 1}^{n}\left\|\sum_{l = 1}^{d}\alpha_{l}\left(f^{(l)}(x_{i})\otimes h^{(l)}\right)-y_{i}\right\|^{2}+\lambda\left\|a\right\|\sum_{l = 1}^{d}\alpha_{l}\left\|h^{(l)}\right\|^{2},\quad\text{s.t.}\ \alpha_{l}\in\{0, 1\} \]

where:

- \(f^{(l)}(x_{i})\) is the feature of the \(l\)-th channel for sample \(x_{i}\);
- \(h^{(l)}\) is the \(l\)-th channel of the multi-channel correlation filter;
- \(y_{i}\) is the ideal response map;
- \(\alpha_{l}\) is a binary variable indicating whether the \(l\)-th channel is selected;
- \(\lambda\) is a regularization parameter.

### Experimental Results

Through extensive experimental evaluations on multiple video benchmark datasets, the authors demonstrate the effectiveness and generalization ability of the channel distillation framework. The results show that by selectively using good feature channels, the tracker not only improves accuracy but also significantly reduces computational and memory overheads.

### Main Contributions

1. **Revealed the existence of object-specific optimal channels in multi-channel features**: Different tracked objects have different optimal channel combinations.
2. **Proposed the channel distillation framework**: The optimal channels are selected adaptively by solving an energy minimization problem, which improves tracking accuracy, speed, and memory efficiency.
3. **Comprehensive evaluation and analysis**: The experiments demonstrate the effectiveness and generalization ability of channel distillation in practical applications and offer new ideas for developing efficient deep trackers.

In conclusion, by introducing the channel distillation framework, this paper addresses the redundancy and high cost of deep-learning trackers in visual tracking and improves tracking performance.
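
To make formula (2) more concrete, the energy can be evaluated numerically for a fixed channel mask. The following is a minimal sketch, not the authors' implementation. It assumes that \(\otimes\) denotes circular cross-correlation (computed here with the FFT) and that \(a\) is the vector of indicators \(\alpha_{l}\), with \(\|a\|\) its Euclidean norm; neither is stated in the summary above. The function name `channel_energy` and the array layouts are hypothetical.

```python
import numpy as np


def channel_energy(features, filters, targets, alpha, lam):
    """Evaluate the energy of formula (2) for a fixed binary channel mask.

    features: array (n, d, H, W), one d-channel feature map per sample
    filters:  array (d, H, W), one correlation filter per channel
    targets:  array (n, H, W), ideal response map per sample
    alpha:    length-d sequence of 0/1 channel indicators
    lam:      regularization weight (lambda in the formula)
    """
    alpha = np.asarray(alpha, dtype=float)

    # Assumption: the channel-wise operator is circular cross-correlation,
    # which in the Fourier domain is F * conj(H).
    feat_hat = np.fft.fft2(features)   # FFT over the last two axes
    filt_hat = np.fft.fft2(filters)
    responses = np.real(np.fft.ifft2(feat_hat * np.conj(filt_hat)))  # (n, d, H, W)

    # Sum the responses of the selected channels only.
    combined = (alpha[None, :, None, None] * responses).sum(axis=1)  # (n, H, W)
    data_term = np.sum((combined - targets) ** 2)

    # Regularizer as written in the formula: lambda * ||a|| * sum_l alpha_l ||h_l||^2.
    # Assumption: ||a|| is the Euclidean norm of the indicator vector.
    filter_norms = np.sum(filters ** 2, axis=(1, 2))                 # (d,)
    reg_term = lam * np.linalg.norm(alpha) * np.sum(alpha * filter_norms)

    return data_term + reg_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((4, 3, 16, 16))
    filts = rng.standard_normal((3, 16, 16)) * 0.1
    ideal = rng.standard_normal((4, 16, 16))
    # Compare the energy of two candidate channel masks.
    for mask in ([1, 1, 1], [1, 0, 0]):
        print(mask, channel_energy(feats, filts, ideal, mask, lam=0.01))
```

Comparing the energy across candidate masks, as in the usage lines, shows how dropping a channel trades off the data-fit term against the regularizer. How the paper actually optimizes over \(h\) and \(\alpha_{l}\) is not described in this summary, so the sketch only evaluates the objective.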