Abstract: Object detection and semantic segmentation are basic tasks in computer vision, and their combination has recently made great progress. With box-level weakly supervised semantic segmentation (WSSS) methods, we predict segmentation from feature maps extracted by an object detector. Existing methods require both box-level and pixel-level annotations to jointly train a shared backbone network that outputs bounding boxes and segmentation. However, without pixel-level annotations and without changing the parameters of the network, object detectors cannot predict semantic segmentation. We design a compact, plug-and-play object-detection-to-semantic-segmentation (O2S) module that enables object detectors to predict semantic masks, making full use of the object detection training set and the detector's intermediate feature maps. We also propose a box-level weakly supervised probabilistic gap adaptive (PGA) method, which enables O2S to learn semantic masks from the object detection training set. We evaluate the proposed approach on Pascal VOC 2007 and Pascal VOC 2012 and show its feasibility. With only 3.5 million parameters, O2S trained with PGA achieves results very close to those of whole networks trained with WSSS methods. Our work has important implications for exploring the commonality of multiple visual tasks.

Adaptive Generation of Weakly Supervised Semantic Segmentation for Object Detection