Abstract: The fully convolutional network used for mask prediction in the Mask R-CNN model produces segmentation results that are not fine enough, and its loss function has so many hyperparameters that tuning them consumes considerable time and resources. To address both problems, we propose an efficient, parameter-link instance segmentation model in this paper. Because the Mask R-CNN model does not pay attention to sample features, we fuse a visual attention network into the ResNet50 backbone to obtain the adaptivity and long-range correlation of self-attention, so that the model can precisely locate the target and effectively detect and segment it. A U-Net network is introduced into the segmentation branch, and the image is processed by stepwise downsampling and upsampling, which makes the predicted pixel masks more accurate. To address the parameter tuning problem of the instance segmentation task, we introduce a parameter link loss that simplifies hyperparameter tuning during training and further improves the detection and segmentation performance of the model. We conduct extensive experiments on three widely used benchmarks, i.e., MiniCOCO, Cityscapes and PASCAL VOC2012, to assess the validity of our model. The experimental findings demonstrate that (1) on the MiniCOCO dataset, our model obtains a box AP of 35.1 and a mask AP of 32.0, which are 1.7 and 2.2 higher, respectively, than those of the state-of-the-art Mask2Former algorithm. (2) The AP on Cityscapes is 38.1, and compared with alternative instance segmentation models, the mAP of each category is greatly improved. (3) In the generalization experiment on the PASCAL VOC2012 dataset, the box mAP and mask mAP are 75.5 and 63.6, respectively, improvements of 3.9 and 1.9 over the Mask R-CNN model.
Our model has significant advantages in both detection and segmentation. The code will be available at https://gitee.com/zhiweilu111/simple-mask/tree/master.
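
The stepwise downsampling and upsampling mentioned in the abstract can be illustrated with a toy encoder-decoder pass. The sketch below is not the authors' implementation: the function names (`downsample`, `upsample`, `unet_like`) are hypothetical, 2x2 average pooling and nearest-neighbour repetition stand in for learned convolutions, and averaging the upsampled feature with the skip feature stands in for the concatenate-then-convolve step of a real U-Net. It only shows how skip connections let fine spatial detail survive the coarse bottleneck.

```python
from typing import List

Grid = List[List[float]]


def downsample(x: Grid) -> Grid:
    """Halve height and width with 2x2 average pooling (sides must be even)."""
    return [
        [
            (x[r][c] + x[r][c + 1] + x[r + 1][c] + x[r + 1][c + 1]) / 4.0
            for c in range(0, len(x[0]), 2)
        ]
        for r in range(0, len(x), 2)
    ]


def upsample(x: Grid) -> Grid:
    """Double height and width by repeating each value (nearest neighbour)."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def unet_like(x: Grid, depth: int) -> Grid:
    """Toy encoder-decoder pass with skip connections (assumed, simplified)."""
    skips = []
    for _ in range(depth):        # encoder: stepwise downsampling
        skips.append(x)           # keep the feature map for the skip path
        x = downsample(x)
    for skip in reversed(skips):  # decoder: stepwise upsampling
        up = upsample(x)
        # Stand-in for "concatenate + convolve": average with the skip feature.
        x = [[(u + s) / 2.0 for u, s in zip(ur, sr)] for ur, sr in zip(up, skip)]
    return x
```

With a 4x4 input that has a single foreground pixel, `unet_like(x, 1)` returns a 4x4 map in which that pixel keeps the largest response while its pooled neighbourhood receives a smaller one, which is the effect the skip path is meant to provide.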

SATMask: Spatial Attention Transform Mask for Dense Instance Segmentation

Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation

When Masked Image Modeling Meets Source-free Unsupervised Domain Adaptation: Dual-Level Masked Network for Semantic Segmentation

ESAMask: Real-Time Instance Segmentation Fused with Efficient Sparse Attention

CenterMask: Real-Time Anchor-Free Instance Segmentation

Masked-attention Mask Transformer for Universal Image Segmentation

Mask Transfiner for High-Quality Instance Segmentation

Supervised Edge Attention Network for Accurate Image Instance Segmentation

RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features

SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation

DB-BlendMask: Decomposed Attention and Balanced BlendMask for Instance Segmentation of High-Resolution Remote Sensing Images

Maskformer with Improved Encoder-Decoder Module for Semantic Segmentation of Fine-Resolution Remote Sensing Images

SimpleMask: parameter link and efficient instance segmentation

MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features

Mask-Pyramid Network: A Novel Panoptic Segmentation Method

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

CASNet: Common Attribute Support Network for image instance and panoptic segmentation

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

Mask-R-FCN: A Deep Fusion Network for Semantic Segmentation

PolarMask: Single Shot Instance Segmentation With Polar Representation