Visual Object Tracking Based on Mutual Learning Between Cohort Multiscale Feature-Fusion Networks with Weighted Loss

Jiaojiao Fang, Guizhong Liu
DOI: https://doi.org/10.1109/tcsvt.2020.2994744
IF: 5.859
2020-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: The deep convolutional neural network (CNN) based tracking-by-detection framework has recently become one of the most popular approaches to visual tracking. However, existing methods of this kind are either time-consuming or suffer from greatly reduced performance. This article aims to achieve nearly the same tracking accuracy as the state-of-the-art CNN tracking-by-detection algorithm while running faster. We study the existing strong trackers under the CNN tracking-by-detection framework and introduce three components: a multiscale feature pyramid fusion neural network based on dilated convolutions, constructed to learn a scale-invariant discriminative representation for tracking small objects; a hard-threshold weighted cross-entropy loss function, proposed to narrow the gap between object classification and tracking; and a mutual learning-based training policy, used to fuse the information from networks trained on image patches with different contextual regions to further improve tracking performance. Comprehensive experiments on visual object tracking benchmarks show that the proposed tracker achieves competitive performance at a relatively higher speed on both qualitative and quantitative criteria. Additionally, the experimental results reveal that the proposed mutual learning-based training policy can accelerate convergence and achieve better generalization performance.
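The abstract does not spell out the mutual learning objective, so the following is only a minimal sketch of the generic deep mutual learning idea: two peer networks each minimize their own cross-entropy loss plus a KL-divergence term that pulls their prediction toward the peer's. The function names, the unweighted sum of the two terms, and the two-class (target vs. background) setup are assumptions for illustration; the paper's hard-threshold weighting and its cohort networks are not reproduced here.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Standard (unweighted) cross-entropy for one sample.
    return -math.log(probs[label])

def kl_divergence(p, q):
    # KL(p || q): how far q is from the reference distribution p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_learning_losses(logits_a, logits_b, label):
    # Hypothetical helper: each peer is supervised by the label and
    # additionally mimics the other peer's predicted distribution.
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    loss_a = cross_entropy(p_a, label) + kl_divergence(p_b, p_a)
    loss_b = cross_entropy(p_b, label) + kl_divergence(p_a, p_b)
    return loss_a, loss_b

# Two peers scoring the same candidate patch (class 0 = target, 1 = background).
loss_a, loss_b = mutual_learning_losses([2.0, 0.0], [0.5, 0.2], label=0)
```

When both peers produce the same distribution, the KL terms vanish and each loss reduces to plain cross-entropy; the mimicry term only contributes where the peers disagree.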