Learning to Segment Video Object with Accurate Boundaries.

Jingchun Cheng, Yuhui Yuan, Yali Li, Jingdong Wang, Shengjin Wang
DOI: https://doi.org/10.1109/tmm.2020.3020698
IF: 7.3
2021-01-01
IEEE Transactions on Multimedia
Abstract: Video object segmentation has attracted considerable research interest in recent years. Top-performing video object segmentation methods mainly rely on fully convolutional neural networks that are trained specifically to predict high-quality masks, which leaves boundary details imprecise. This paper tackles the problem of predicting segmentation masks in videos that are both mask-accurate and boundary-precise. To solve this problem, we propose a simple and efficient network structure: the Mask-boundAry-Consistent Network (MAC-Net). The MAC-Net is an end-to-end fully convolutional network in which masks and boundaries are jointly optimized during training, enabling it to predict masks along with accurate boundaries. An inner-net boundary-computing module is incorporated in the MAC-Net to produce boundaries that are automatically consistent with the predicted mask. We analyze the influence of the parameter settings and network constructions of the MAC-Net, and compare it with state-of-the-art algorithms on three widely adopted datasets. Experimental results show that the MAC-Net achieves state-of-the-art performance, demonstrating the effectiveness of its mask-boundary-consistent network structure. We also suggest that the boundary module in MAC-Net is highly compatible and can be easily adapted to other segmentation-related techniques.
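The abstract does not specify how the boundary-computing module works internally. As a rough illustration of the general idea only, the sketch below derives a boundary map directly from a mask (so the two cannot disagree) and sums a mask loss and a boundary loss. Everything here is an assumption made for illustration and is not taken from the paper: the morphological gradient (3x3 dilation minus 3x3 erosion) as the boundary operator, binary cross-entropy as the loss, and the hypothetical names `boundary_from_mask`, `joint_loss`, and `boundary_weight`.

```python
import numpy as np


def _pool3x3(x, reduce_fn, pad_value):
    """3x3 sliding-window reduction, stride 1, output the same size as x."""
    padded = np.pad(x, 1, mode="constant", constant_values=pad_value)
    h, w = x.shape
    windows = [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return reduce_fn(np.stack(windows, axis=0), axis=0)


def boundary_from_mask(mask):
    """Assumed boundary operator: dilation minus erosion of a mask in [0, 1]."""
    dilated = _pool3x3(mask, np.max, pad_value=0.0)  # padding adds no foreground
    eroded = _pool3x3(mask, np.min, pad_value=1.0)   # image border is not an edge
    return dilated - eroded


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two maps with values in [0, 1]."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred) + (1 - target) * np.log(1 - pred))))


def joint_loss(pred_mask, gt_mask, boundary_weight=1.0):
    """Mask loss plus weighted boundary loss; boundaries come from the masks."""
    pred_boundary = boundary_from_mask(pred_mask)
    gt_boundary = boundary_from_mask(gt_mask)
    return bce(pred_mask, gt_mask) + boundary_weight * bce(pred_boundary, gt_boundary)


# Toy example: a 4x4 square object inside a 6x6 frame.
gt = np.zeros((6, 6))
gt[1:5, 1:5] = 1.0
pred = np.full((6, 6), 0.5)  # an uninformative prediction
print(boundary_from_mask(gt))
print(joint_loss(gt, gt), joint_loss(pred, gt))
```

Because the boundary is computed from the mask rather than predicted by a separate branch, any gradient from the boundary term would flow back into the mask prediction in a differentiable implementation; this NumPy version only evaluates the loss and does not train anything.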