Towards Open-World Co-Salient Object Detection with Generative Uncertainty-aware Group Selective Exchange-Masking

Yang Wu, Shenglong Hu, Huihui Song, Kaihua Zhang, Bo Liu, Dong Liu
2023-10-16
Abstract: The traditional definition of the co-salient object detection (CoSOD) task is to segment the common salient objects in a group of relevant images. This definition rests on an assumption of group consensus consistency that does not always hold in the open-world setting, which leads to robustness issues when the model encounters irrelevant images in the input image group under open-world scenarios. To tackle this problem, we introduce a group selective exchange-masking (GSEM) approach for enhancing the robustness of the CoSOD model. GSEM takes two groups of images as input, each containing a different type of salient object. Based on a mixed metric we designed, GSEM selects a subset of images from each group using a novel learning-based strategy, and the selected images are then exchanged. To simultaneously consider the uncertainty introduced by irrelevant images and the consensus features of the remaining relevant images in the group, we designed a latent variable generator branch and a CoSOD transformer branch. The former is composed of a vector-quantised variational autoencoder that generates stochastic global variables to model uncertainty. The latter is designed to capture correlation-based local features that include group consensus. Finally, the outputs of the two branches are merged and passed to a transformer-based decoder to generate robust predictions. Because there are currently no benchmark datasets specifically designed for open-world scenarios, we constructed three open-world benchmark datasets, namely OWCoSal, OWCoSOD, and OWCoCA, based on existing datasets. By breaking the group-consistency assumption, these datasets provide effective simulations of real-world scenarios and can better evaluate the robustness and practicality of models.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### The Problem Addressed by the Paper

The paper addresses the robustness of co-salient object detection (CoSOD) in open-world environments. Traditionally, CoSOD is defined as segmenting the common salient objects from a group of related images, under the assumption that every image in the group contains the same salient object. In open-world environments this assumption does not always hold: the input group may contain unrelated images. When a test group includes images without the co-salient object, traditional CoSOD models often produce false positive predictions on them, which limits their usefulness in practical applications.

To overcome this issue, the authors propose a generative uncertainty-aware group selective exchange-masking (GSEM) method to enhance the robustness of CoSOD models. GSEM simulates open-world conditions by introducing "noise images" into each group: using a designed metric, it selects the most challenging images from two groups and exchanges them. The authors also design a parallel feature extraction mechanism, consisting of a latent variable generator branch (LVGB) that generates uncertainty features and a CoSOD transformer branch (CoSOD-TB) that captures intra-group consistency, so that both uncertainty and consensus information are handled in open-world environments.

In summary, the main contributions of the paper are:

1. Designing a CoSOD learning framework suited to open-world scenarios using the GSEM strategy.
2. Proposing a parallel feature extraction mechanism that combines LVGB and CoSOD-TB to capture uncertainty and intra-group consistency information, respectively.
3. Constructing three open-world benchmark datasets (OWCoSal, OWCoSOD, OWCoCA) from existing datasets to better evaluate the robustness and practicality of models in open-world scenarios.
4. Validating the effectiveness of the proposed method through extensive experiments, in particular demonstrating superior performance on the open-world datasets compared with existing methods.
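
The exchange step of GSEM can be illustrated with a minimal Python sketch. It is not the authors' implementation. The paper selects images with a learned mixed metric that the text above does not specify, so this sketch takes precomputed per-image scores as input and simply picks the top `k`. Emptying the ground-truth mask of each exchanged image (so that it is supervised as containing no co-salient object) is likewise an assumption made for illustration. The names `Sample`, `select_top_k`, `as_noise`, and `exchange_and_mask` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Sample:
    image_id: str
    mask: List[List[int]]   # binary ground-truth co-saliency mask
    is_noise: bool = False  # True once the image sits in a foreign group


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring images, in ascending index order.

    Stand-in for the paper's learned mixed metric (assumption).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def as_noise(sample: Sample) -> Sample:
    """Copy a sample with an all-zero mask, marking it as a noise image (assumption)."""
    empty = [[0] * len(row) for row in sample.mask]
    return Sample(sample.image_id, empty, True)


def exchange_and_mask(
    group_a: Sequence[Sample],
    group_b: Sequence[Sample],
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    k: int,
) -> Tuple[List[Sample], List[Sample]]:
    """Swap the k selected images between two groups and empty their masks."""
    idx_a = select_top_k(scores_a, k)
    idx_b = select_top_k(scores_b, k)
    new_a, new_b = list(group_a), list(group_b)
    for i, j in zip(idx_a, idx_b):
        new_a[i] = as_noise(group_b[j])  # image from B becomes noise in A
        new_b[j] = as_noise(group_a[i])  # image from A becomes noise in B
    return new_a, new_b
```

After the call, each group still has its original size but contains `k` images from the other category, which mimics a group that violates the consensus assumption. The input groups themselves are left unmodified.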