Mutual Learning and Feature Fusion Siamese Networks for Visual Object Tracking

Min Jiang, Yuyao Zhao, Jun Kong
DOI: https://doi.org/10.1109/tcsvt.2020.3037947
IF: 5.859
2021-08-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Recently, Siamese-based trackers have shown outstanding performance in the visual object tracking community. However, they seldom pay attention to the interaction between the two branches or to the fusion, within each branch, of features from different convolutional layers. In this paper, we build a comprehensive Siamese network consisting of a mutual learning subnetwork (M-net) and a feature fusion subnetwork (F-net) for object tracking. Each of them is a Siamese network with a specific function. M-net is designed to help each branch mine dependencies from the other, so that the object template is adaptively updated to a certain extent. F-net fuses convolutional features from different levels to make full use of spatial and semantic information. We also design a global-local channel attention (GLCA) module in F-net to capture channel dependencies for proper feature fusion. Our method uses ResNet as the feature extractor and is trained offline in an end-to-end manner. We evaluate our method on several well-known benchmarks, including OTB2013, OTB2015, VOT2015, VOT2016, NFS and TC128. Extensive experimental results demonstrate that our method achieves competitive results while maintaining real-time speed.
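The abstract gives no implementation details, so the following is only a minimal NumPy sketch, under stated assumptions, of two generic building blocks it relies on: SiamFC-style cross-correlation between template and search features, and channel-attention-weighted fusion of two feature levels. The `channel_attention` function is a hypothetical stand-in that combines an SE-style global descriptor with an ECA-style local 1-D operation over channels (using a fixed averaging kernel instead of learned weights); it is not the paper's GLCA module, and the M-net mutual learning is not modeled at all. All function names are illustrative.

```python
import numpy as np


def xcorr(template, search):
    """Valid-mode cross-correlation as used in SiamFC-style trackers.

    template: (C, h, w) exemplar features; search: (C, H, W) search features.
    Returns an (H - h + 1, W - w + 1) response map summed over channels.
    """
    c, h, w = template.shape
    c2, H, W = search.shape
    assert c == c2, "template and search must have the same channel count"
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            # Slide the template over the search features and take the dot product.
            out[i, j] = np.sum(template * search[:, i:i + h, j:j + w])
    return out


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, k=3):
    """Hypothetical global-local channel attention (NOT the paper's GLCA).

    Global cue: per-channel global average pooling (SE-style descriptor).
    Local cue: 1-D averaging over neighbouring channel descriptors (ECA-style),
    here with a fixed kernel instead of learned weights.
    """
    assert k % 2 == 1, "kernel size must be odd to preserve the channel count"
    desc = feat.mean(axis=(1, 2))                    # (C,) global descriptor
    padded = np.pad(desc, k // 2, mode="edge")
    local = np.convolve(padded, np.ones(k) / k, mode="valid")  # (C,) local context
    weights = sigmoid(desc + local)                  # per-channel gates in (0, 1)
    return feat * weights[:, None, None]


def fuse(shallow, deep):
    """Re-weight each feature level's channels, then concatenate them.

    Assumes both levels share the same spatial size (e.g. after resizing).
    """
    return np.concatenate([channel_attention(shallow), channel_attention(deep)], axis=0)
```

For example, embedding a strictly positive 3×3 template at offset (2, 3) of an otherwise all-zero 8×8 search map and calling `xcorr(fuse(z_s, z_d), fuse(x_s, x_d))` yields a 6×6 response whose maximum lies at (2, 3): the positive channel gates only rescale per-channel responses that each peak at that offset. In a real tracker these operations would run on learned ResNet features inside a deep-learning framework rather than in a Python loop.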
Subject category: Engineering, Electrical & Electronic
What problem does this paper attempt to address?