Inverted Residual Siamese Visual Tracking with Feature Crossing Network

Feng Zhang, Xiaoyan Qian, Lei Han, Yi Shen
DOI: https://doi.org/10.1109/access.2021.3056194
IF: 3.9
2021-01-01
IEEE Access
Abstract: Siamese-network-based visual tracking has recently drawn great attention due to its superior representation and tracking accuracy. However, the backbone and prediction networks still cannot take full advantage of features from modern deep networks. In this paper, we propose an inverted residual Siamese feature-crossing network (IRSiamese-FCN), which is trained end-to-end offline on a large number of image pairs. Specifically, the Siamese backbone networks for feature extraction consist of an inverted residual (IR) network and a feature-crossing network (FCN). The designed IR architecture is lightweight, combining depthwise and pointwise convolutions. Moreover, non-linearities and linearities are processed separately in deep and narrow layers. The feature-crossing network performs feature-level aggregation, which makes deep and shallow layers complement each other more closely and further improves tracking accuracy. We conduct ablation studies and comparison experiments on five large benchmarks. The results demonstrate that the proposed tracker achieves competitive performance.
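The abstract attributes the lightweight design to combining depthwise and pointwise convolutions. As a rough illustration of why that combination saves weights, the sketch below counts the parameters of a standard convolution and of a depthwise-plus-pointwise pair. The function names, the 3 x 3 kernel, and the 64-channel widths are assumptions chosen for illustration; they are not taken from the paper and do not describe the authors' actual layer configuration.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only to make the comparison concrete.
full = standard_conv_params(64, 64)
separable = depthwise_separable_params(64, 64)
print(full, separable, round(full / separable, 2))
```

With these assumed sizes the separable pair needs roughly an eighth of the weights of the standard convolution; the saving grows with the number of output channels.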