Siamese Refine Polar Mask Prediction Network for Visual Tracking

Bin Pu, Ke Xiang, Ze’an Liu, Xuanyin Wang
DOI: https://doi.org/10.1007/s11760-023-02782-x
IF: 1.583
2024-01-01
Signal Image and Video Processing
Abstract: Visual tracking is a classical research problem, and tracking with mask prediction has recently become a popular task in tracking research. Many trackers append a pixel-wise segmentation subnetwork to an existing bounding-box tracker to obtain the target’s mask. These two-stage methods must crop the target region after locating it and then redundantly extract deep features a second time for segmentation. This paper proposes an anchor-free Siamese Refine Polar Mask (SiamRPM) prediction network for visual tracking, which obtains the target’s mask directly. Analogous to bounding-box regression, we use polar mask regression to obtain the target’s convex hull mask. To further adjust the contour points, we employ a cascaded refinement module: the mask contours are iteratively shifted using the offsets output by the refinement module. Comprehensive experiments on visual tracking benchmark datasets show that SiamRPM achieves competitive results at real-time running speed. Our method provides an effective contour-based pipeline for the tracking and segmentation task.
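The contour representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `polar_to_contour` and `refine_contour`, the uniform angular spacing of the rays, and the hand-written offsets are assumptions made for illustration. In the paper the ray lengths and offsets are predicted by the network.

```python
import math

def polar_to_contour(center, ray_lengths):
    """Convert a center and per-angle ray lengths into (x, y) contour points.

    Assumption: rays are spaced at uniform angular intervals around the center.
    """
    cx, cy = center
    n = len(ray_lengths)
    points = []
    for i, r in enumerate(ray_lengths):
        theta = 2.0 * math.pi * i / n
        points.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return points

def refine_contour(points, offset_stages):
    """Shift contour points by a cascade of per-point (dx, dy) offsets.

    Each stage is applied to the output of the previous one, mimicking
    an iterative refinement. Offsets here are hypothetical placeholders.
    """
    for offsets in offset_stages:
        points = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(points, offsets)]
    return points

# Four rays of length 5 around the center (10, 20).
contour = polar_to_contour((10.0, 20.0), [5.0, 5.0, 5.0, 5.0])

# Two hypothetical refinement stages, each with one offset per contour point.
stages = [[(1.0, 0.0)] * 4, [(0.0, -0.5)] * 4]
refined = refine_contour(contour, stages)
```

A real tracker would use many more rays (the count here is arbitrary) and would rasterize the refined polygon to obtain the final mask.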