MaskUno: Switch-Split Block For Enhancing Instance Segmentation

Jawad Haidar, Marc Mouawad, Imad Elhajj, Daniel Asmar
2024-07-31
Abstract: Instance segmentation is an advanced form of image segmentation which, beyond traditional segmentation, requires identifying individual instances of repeating objects in a scene. Mask R-CNN is the most common architecture for instance segmentation, and improvements to this architecture include steps such as benefiting from bounding box refinements, adding semantics, or backbone enhancements. In all the proposed variations to date, the problem of competing kernels (each class aims to maximize its own accuracy) persists when models try to synchronously learn numerous classes. In this paper, we propose mitigating this problem by replacing mask prediction with a Switch-Split block that processes refined ROIs, classifies them, and assigns them to specialized mask predictors. We name the method MaskUno and test it on various models from the literature, which are then trained on multiple classes using the benchmark COCO dataset. An increase in the mean Average Precision (mAP) of 2.03% was observed for the high-performing DetectoRS when trained on 80 classes. MaskUno proved to enhance the mAP of instance segmentation models regardless of the number and typ
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the competing-kernels problem in instance segmentation. When a model learns objects of many classes at the same time, each class aims to maximize its own accuracy, so the kernels of different classes compete and overall performance suffers. To mitigate this, the authors propose MaskUno, which introduces a Switch-Split block that processes the refined ROIs (Regions of Interest), classifies them, and assigns each one to a specialized mask predictor. This alleviates the competition among classes and improves the accuracy of the model.

### Main Contributions

1. **A modular Switch-Split block**: It can replace the multi-class prediction heads in most instance segmentation methods.
2. **No competing kernels among classes**: This leads to a richer representation because training no longer needs to be balanced across classes.
3. **Improved accuracy of instance segmentation models on the standard COCO benchmark**.

### Method Overview

The core idea of MaskUno is to introduce a Switch-Split block at the output stage of the instance segmentation model: bounding boxes are refined first, and each class is then learned by its own specialized predictor. The steps are as follows:

1. **Bounding box refinement**: An ROI Align layer processes the output of the bounding box head.
2. **Classification and switching**: The classifier output is passed to the switch, the ROI is refined according to the predicted class, and the switch is then set to the corresponding class in the split block.
3. **Specialized mask prediction**: Each ROI is assigned to the mask head dedicated to its class.
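
The routing described in the steps above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `RefinedROI`, `SwitchSplit`, and `make_threshold_head` are hypothetical, ROI features are reduced to flat lists of floats, and each "mask head" is a toy thresholding function standing in for a learned per-class mask predictor. The sketch only shows the control flow, in which the switch reads the classifier output and sends each refined ROI to exactly one class-specific head.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-ins for illustration only (not from the paper):
Feature = List[float]  # a flattened ROI-aligned feature
Mask = List[int]       # a flattened binary mask


@dataclass
class RefinedROI:
    feature: Feature               # output of bounding box refinement + ROI Align
    class_scores: Sequence[float]  # classifier output for this ROI


def make_threshold_head(threshold: float) -> Callable[[Feature], Mask]:
    """Toy class-specific mask head: marks values above a per-class threshold."""
    def head(feature: Feature) -> Mask:
        return [1 if value > threshold else 0 for value in feature]
    return head


class SwitchSplit:
    """Routes each ROI to the mask head of its predicted class."""

    def __init__(self, heads: Dict[int, Callable[[Feature], Mask]]):
        self.heads = heads  # the "split": one independent head per class

    def switch(self, roi: RefinedROI) -> int:
        # The "switch": pick the class with the highest classifier score.
        scores = roi.class_scores
        return max(range(len(scores)), key=scores.__getitem__)

    def forward(self, rois: List[RefinedROI]) -> List[Tuple[int, Mask]]:
        results = []
        for roi in rois:
            class_id = self.switch(roi)
            # Only the selected head sees this ROI; the other heads are untouched.
            results.append((class_id, self.heads[class_id](roi.feature)))
        return results


# Two classes with different toy heads; the same feature yields different
# masks depending on which head the switch selects.
block = SwitchSplit({0: make_threshold_head(0.5), 1: make_threshold_head(0.2)})
rois = [
    RefinedROI(feature=[0.1, 0.3, 0.9], class_scores=[0.8, 0.2]),
    RefinedROI(feature=[0.1, 0.3, 0.9], class_scores=[0.1, 0.9]),
]
results = block.forward(rois)
print(results)
```

Because each head only ever receives ROIs of its own class, its parameters are updated by that class alone during training, which is the property the summary credits with removing competition among classes.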
### Experimental Results

The authors ran experiments on multiple models and multiple class sets to verify the effectiveness of MaskUno. For the high-performing DetectoRS model trained on 80 classes, the mAP (mean Average Precision) increased by 2.03%. For the baseline Mask R-CNN model, the mAP increased by 4.8%. These results indicate that the gains from MaskUno are not tied to a specific selection of classes, and that it is complementary to other improvements such as cascaded architectures or backbone enhancements.

### Future Work

1. **Further experiments**: Study the impact of splitting the bounding box regression block on mAP and compare it with classical bounding box refinement methods.
2. **Transformer-based models**: Apply MaskUno to Transformer-based instance segmentation models, splitting the bounding box regression and mask prediction branches at the same time, in the hope of reaching a new state of the art.

### Conclusion

MaskUno mitigates the competing-kernels problem among classes in instance segmentation by introducing the Switch-Split block, and improves the accuracy of the model. The method is applicable to multi-class segmentation methods based on Mask R-CNN and is complementary to approaches such as cascading or hybrid-task architectures. Future work will explore its potential in Transformer-based models.