Enhancing Query Formulation for Universal Image Segmentation

Yipeng Qu, Joohee Kim
DOI: https://doi.org/10.3390/s24061879
IF: 3.9
2024-03-15
Sensors
Abstract: Recent advancements in image segmentation have been notably driven by Vision Transformers. These transformer-based models offer one versatile network structure capable of handling a variety of segmentation tasks. Despite their effectiveness, the pursuit of enhanced capabilities often leads to more intricate architectures and greater computational demands. OneFormer has responded to these challenges by introducing a query-text contrastive learning strategy active during training only. However, this approach has not completely addressed the inefficiency issues in text generation and the contrastive loss computation. To solve these problems, we introduce Efficient Query Optimizer (EQO), an approach that efficiently utilizes multi-modal data to refine query optimization in image segmentation. Our strategy significantly reduces the complexity of parameters and computations by distilling inter-class and inter-task information from an image into a single template sentence. Furthermore, we propose a novel attention-based contrastive loss. It is designed to facilitate a one-to-many matching mechanism in the loss computation, which helps object queries learn more robust representations. Beyond merely reducing complexity, our model demonstrates superior performance compared to OneFormer across all three segmentation tasks using the Swin-T backbone. Our evaluations on the ADE20K dataset reveal that our model outperforms OneFormer in multiple metrics: by 0.2% in mean Intersection over Union (mIoU), 0.6% in Average Precision (AP), and 0.8% in Panoptic Quality (PQ). These results highlight the efficacy of our model in advancing the field of image segmentation.
engineering, electrical & electronic; chemistry, analytical; instruments & instrumentation
What problem does this paper attempt to address?
This paper aims to improve the efficiency and effectiveness of query optimization in universal image segmentation. It targets two problems that arise during training in existing methods, notably OneFormer:

1. **Redundant Text Generation**: The text lists that existing methods generate contain a large amount of redundant information. This information does little to help object queries distinguish objects of different classes, yet it adds parameters and computational cost.
2. **One-to-One Matching in the Contrastive Loss**: The conventional contrastive loss matches queries and text entries one to one. Because each object query can be associated with only one specific class or object, this limits how robust a representation the query can learn.

To solve these problems, the paper proposes the **Efficient Query Optimizer (EQO)**, which improves query optimization in two ways:

- **Efficient Text Generation**: EQO replaces the text list with a single template sentence that integrates all semantic cues while retaining the necessary inter-class and inter-task information, which significantly reduces the number of parameters and the computational complexity.
- **Attention-Based Contrastive Loss**: A new attention-based contrastive loss supports one-to-many matching, so each object query can learn from multiple classes, which improves the robustness and performance of the model.

In experiments on the ADE20K dataset with the Swin-T backbone, EQO outperforms OneFormer by 0.2% in mean Intersection over Union (mIoU), 0.6% in Average Precision (AP), and 0.8% in Panoptic Quality (PQ).
These results indicate that EQO improves both the efficiency and the accuracy of universal image segmentation.
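
The two ideas summarized above can be illustrated with a small, self-contained sketch. It is not the paper's implementation: the function names, the sentence templates, and the toy similarity scores are all assumptions made for illustration. In particular, the one-to-many case below spreads the target uniformly over several positives as a generic stand-in; the attention-based weighting that EQO actually uses is not reproduced here.

```python
import math


def per_object_text_list(classes, task, length):
    """Baseline-style text list: one entry per object, padded to a fixed length.

    The entry and padding templates are assumed, not taken from the source.
    """
    entries = [f"a photo with a {name}" for name in classes]
    padding = [f"a {task} photo"] * max(0, length - len(entries))
    return (entries + padding)[:length]


def single_template_sentence(classes, task):
    """Single-sentence variant: every distinct class is named exactly once.

    The template itself is hypothetical.
    """
    distinct = sorted(set(classes))
    return f"a {task} photo with " + ", ".join(distinct)


def contrastive_loss(similarity, positives):
    """Mean cross-entropy over queries for a query-by-text similarity matrix.

    positives[i] lists the text columns that query i should match.
    One index per query gives one-to-one matching; several indices give
    one-to-many matching with uniform target weights (an assumption).
    """
    total = 0.0
    for row, targets in zip(similarity, positives):
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in row))
        total -= sum(row[j] - log_norm for j in targets) / len(targets)
    return total / len(similarity)


# Text side: four annotated objects, three distinct classes.
classes = ["person", "person", "car", "tree"]
text_list = per_object_text_list(classes, "panoptic", 6)
sentence = single_template_sentence(classes, "panoptic")

# Loss side: toy scores for 2 queries against 3 text embeddings.
similarity = [[4.0, 1.0, 0.5], [0.2, 3.0, 2.5]]
one_to_one = contrastive_loss(similarity, [[0], [1]])
one_to_many = contrastive_loss(similarity, [[0], [1, 2]])

print(text_list)
print(sentence)
print(one_to_one, one_to_many)
```

In this sketch the text encoder would process six entries in the list-based case, two of which repeat an earlier entry and two of which are padding, but only one sentence in the template case. This mirrors the source of the parameter and computation savings described above. On the loss side, the second query is allowed two positives instead of one, which is the sense in which the matching becomes one-to-many.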