Abstract: The fully convolutional network in the Mask R-CNN model produces segmentation results that are not fine enough, and its loss function has so many hyperparameters that parameter tuning consumes considerable time and resources. To address both problems, we propose a parameter-link, efficient instance segmentation model. Because the Mask R-CNN model does not pay attention to sample features, we fuse a visual attention network into the ResNet50 backbone to obtain the adaptivity and long-range correlation of self-attention, so that the model can precisely locate the target and effectively detect and segment it. A U-Net is introduced for segmentation; it processes the image through stepwise downsampling and upsampling, which makes the predicted pixel masks more accurate. To ease parameter tuning for the instance segmentation task, we propose a parameter link loss that simplifies the tuning of training parameters and further improves the detection and segmentation performance of the model. We conduct extensive experiments on three benchmarks, MiniCOCO, Cityscapes and PASCAL VOC2012, to assess the validity of our model. The experimental results show that (1) on the MiniCOCO dataset, the model obtains a box AP of 35.1 and a mask AP of 32.0, which are 1.7 and 2.2 higher, respectively, than those of the state-of-the-art Mask2Former algorithm; (2) on Cityscapes, the AP is 38.1, and the AP of each category improves markedly over alternative instance segmentation models; (3) in the generalization experiment on the PASCAL VOC2012 dataset, the box mAP and mask mAP are 75.5 and 63.6, respectively, which are 3.9 and 1.9 higher than those of the Mask R-CNN model. Our model has clear advantages in both detection and segmentation. The code will be available at https://gitee.com/zhiweilu111/simple-mask/tree/master.
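
The abstract reports each score together with its margin over a baseline, so the baseline scores themselves can be recovered by subtraction. The short Python sketch below does that arithmetic. It assumes the reported gains are absolute AP points rather than relative percentages, which the abstract does not state explicitly; the dictionary layout and the helper name `implied_baseline` are illustrative only and are not taken from the released code.

```python
# Scores and gains copied from the abstract: (reported score, gain over baseline).
# Assumption: gains are absolute AP points, not relative percentages.
reported = {
    ("MiniCOCO", "Mask2Former", "box AP"): (35.1, 1.7),
    ("MiniCOCO", "Mask2Former", "mask AP"): (32.0, 2.2),
    ("PASCAL VOC2012", "Mask R-CNN", "box mAP"): (75.5, 3.9),
    ("PASCAL VOC2012", "Mask R-CNN", "mask mAP"): (63.6, 1.9),
}


def implied_baseline(score, gain):
    """Return the baseline score implied by an absolute gain, to one decimal."""
    return round(score - gain, 1)


# Map each (dataset, baseline model, metric) to the baseline's implied score.
implied = {key: implied_baseline(score, gain)
           for key, (score, gain) in reported.items()}

for (dataset, baseline, metric), value in implied.items():
    print(f"{dataset:15s} {baseline:12s} {metric:8s} {value}")
```

Under that assumption, the implied baselines are a box AP of 33.4 and a mask AP of 29.8 for Mask2Former on MiniCOCO, and a box mAP of 71.6 and a mask mAP of 61.7 for Mask R-CNN on PASCAL VOC2012.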

Mask Encoding: A General Instance Mask Representation for Object Segmentation

Delving Deeper into Mask Utilization in Video Object Segmentation

A Simultaneous Object Detection and Component Segmentation Approach Based on Mask R-CNN

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features

EmbedMask: Embedding Coupling for Instance Segmentation

EmbedMask: Embedding Coupling for One-stage Instance Segmentation

SimpleMask: parameter link and efficient instance segmentation

Maskformer with Improved Encoder-Decoder Module for Semantic Segmentation of Fine-Resolution Remote Sensing Images

Mask R-CNN with Feature Pyramid Attention for Instance Segmentation

Mask Transfiner for High-Quality Instance Segmentation

MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features

DynaMask: Dynamic Mask Selection for Instance Segmentation

SATMask: Spatial Attention Transform Mask for Dense Instance Segmentation

Mask-Pyramid Network: A Novel Panoptic Segmentation Method

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation

PolarMask: Single Shot Instance Segmentation With Polar Representation

Mask SSD: an Effective Single-Stage Approach to Object Instance Segmentation

Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution

Where are the Masks: Instance Segmentation with Image-level Supervision