SimpleMask: parameter link and efficient instance segmentation
Qunpo Liu,Zhiwei Lu,Ruxin Gao,Xuhui Bu,Naohiko Hanajima
DOI: https://doi.org/10.1007/s00371-024-03451-x
IF: 2.835
2024-05-26
The Visual Computer
Abstract:To resolve the problem that the segmentation result of the full convolutional neural network in the Mask R-CNN model is not fine enough, and that the number of loss function hyperparameters is too large, leadings to the time and resource consumption of parameter adjustment, we propose a parameter link and efficient instance segmentation model in this paper. Aiming at the problem that the Mask R-CNN model does not pay attention to sample features, the method of fusing the visual attention network in the ResNet50 backbone network is adopted to achieve self-adaptation and long-range correlation in self-attention, so that the model can precisely recognize the target location and effectively detect and segment the target. The U-Net network is introduced into the segmentation, and the image is processed by stepwise upsampling and downsampling, so that the network segmentation accuracy for the pixel mask is more accurate. Considering the parameter tuning problem of the instance segmentation task, a parameter link loss is recommended to simplify the complexity of model training parameter tuning and further enhance the detection and segmentation performance of the model. We conduct extensive experiments on three extensive baselines, i.e., MiniCOCO, Cityscapes and PASCAL VOC2012, to assess the validity of our model. The experimental findings demonstrate that (1) in the MiniCOCO dataset, a box AP of 35.1 and a mask AP of 32.0 are obtained. Compared with the most advanced mask2former algorithm, the box AP and mask AP are 1.7 and 2.2 higher, respectively. (2) The AP value on Cityscapes is 38.1. In comparison with alternative instance segmentation models, the mAP of each category has been greatly improved. (3) The generalization experiment of our model on the PASCAL VOC2012 dataset shows that the box mAP and mask mAP are 75.5 and 63.6, respectively, which are improved by 3.9 and 1.9, respectively, when contrasting with the Mask R-CNN model. Our model has significant advantages in both detection and segmentation. The code will be available at https://gitee.com/zhiweilu111/simple-mask/tree/master.
computer science, software engineering