Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations

Daan de Geus, Gijs Dubbelman
2024-06-14
Abstract: Part-aware panoptic segmentation (PPS) requires (a) that each foreground object and background region in an image is segmented and classified, and (b) that all parts within foreground objects are segmented, classified and linked to their parent object. Existing methods approach PPS by separately conducting object-level and part-level segmentation. However, their part-level predictions are not linked to individual parent objects. Therefore, their learning objective is not aligned with the PPS task objective, which harms the PPS performance. To solve this, and make more accurate PPS predictions, we propose Task-Aligned Part-aware Panoptic Segmentation (TAPPS). This method uses a set of shared queries to jointly predict (a) object-level segments, and (b) the part-level segments within those same objects. As a result, TAPPS learns to predict part-level segments that are linked to individual parent objects, aligning the learning objective with the task objective, and allowing TAPPS to leverage joint object-part representations. With experiments, we show that TAPPS considerably outperforms methods that predict objects and parts separately, and achieves new state-of-the-art PPS results.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the key challenges of part-aware panoptic segmentation (PPS). PPS requires segmenting and classifying each foreground object and background region in an image, and additionally segmenting and classifying all parts within the foreground objects and linking them to their parent objects. Existing methods usually handle PPS by performing object-level and part-level segmentation separately, which leads to three main problems:

1. **Misaligned learning objective**: The part-level predictions of existing methods are not linked to individual parent objects, so the learning objective is not aligned with the PPS task objective, which harms PPS performance.
2. **Conflicting feature representations**: Because parts and objects are predicted separately, the network may learn conflicting feature representations, which reduces its ability to separate instances.
3. **Incompatible predictions**: Predicting objects and parts separately can produce mutually incompatible results that require additional post-processing to correct.

To solve these problems, the authors propose Task-Aligned Part-aware Panoptic Segmentation (TAPPS). TAPPS uses a set of shared queries to jointly predict object-level and part-level segments, so that each part-level segment is associated with a specific object instance. The learning objective of TAPPS is therefore aligned with the PPS task objective, which improves the accuracy of PPS predictions.

Specifically, the contributions of TAPPS include:

- Proposing a simple PPS method that aligns the learning objective with the task objective, promotes object instance separation, and enables a joint object-part representation to improve prediction accuracy.
- Using shared object-part queries, which constrain TAPPS to predict only part segments that are compatible with the predicted object segments. This enforces object-part compatibility and simplifies the part segmentation task.
- Experimentally verifying the effectiveness of TAPPS on multiple datasets and network configurations, showing that it considerably outperforms existing methods.

In summary, this paper aims to improve PPS performance by aligning the learning objective with the task objective and by using joint object-part representations.
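
The idea of shared object-part queries can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes every query already comes with per-pixel object scores and per-pixel scores for each part class, and it ignores classification, background regions, and training entirely. The function name `joint_object_part_labels` and the nested-list score layout are hypothetical choices made for this illustration. Each pixel is first assigned to the query with the highest object score, and its part label is then read only from that same query's part scores, so every part is linked to a parent object by construction.

```python
from typing import List, Tuple

Grid = List[List[float]]  # per-pixel scores, indexed as [y][x]


def argmax(values: List[float]) -> int:
    """Return the index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def joint_object_part_labels(
    object_scores: List[Grid],      # indexed as [query][y][x]
    part_scores: List[List[Grid]],  # indexed as [query][part][y][x]
) -> List[List[Tuple[int, int]]]:
    """Assign a (query_id, part_id) pair to every pixel.

    Illustrative assumption: the part label of a pixel is taken only from
    the query that owns the pixel at the object level.
    """
    height = len(object_scores[0])
    width = len(object_scores[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Object level: the pixel goes to the best-scoring query.
            q = argmax([obj[y][x] for obj in object_scores])
            # Part level: only the owning query's part scores are consulted.
            p = argmax([part[y][x] for part in part_scores[q]])
            row.append((q, p))
        labels.append(row)
    return labels


# Toy input: 2 queries, 2 part classes, a 1 x 4 image.
object_scores = [
    [[0.9, 0.8, 0.1, 0.2]],  # query 0 owns the two left pixels
    [[0.1, 0.2, 0.9, 0.7]],  # query 1 owns the two right pixels
]
part_scores = [
    [[[0.7, 0.2, 0.9, 0.9]], [[0.3, 0.8, 0.0, 0.0]]],  # query 0: part 0, part 1
    [[[0.9, 0.9, 0.4, 0.6]], [[0.0, 0.0, 0.6, 0.4]]],  # query 1: part 0, part 1
]

labels = joint_object_part_labels(object_scores, part_scores)
print(labels)  # [[(0, 0), (0, 1), (1, 1), (1, 0)]]
```

Query 0 has high part scores on the two right pixels, but they are never used because those pixels belong to query 1 at the object level. In this sketch, that is what makes part predictions compatible with object predictions without a separate post-processing step.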