Abstract: Part-aware panoptic segmentation (PPS) requires (a) that each foreground object and background region in an image is segmented and classified, and (b) that all parts within foreground objects are segmented, classified and linked to their parent object. Existing methods approach PPS by separately conducting object-level and part-level segmentation. However, their part-level predictions are not linked to individual parent objects. Therefore, their learning objective is not aligned with the PPS task objective, which harms the PPS performance. To solve this, and make more accurate PPS predictions, we propose Task-Aligned Part-aware Panoptic Segmentation (TAPPS). This method uses a set of shared queries to jointly predict (a) object-level segments, and (b) the part-level segments within those same objects. As a result, TAPPS learns to predict part-level segments that are linked to individual parent objects, aligning the learning objective with the task objective, and allowing TAPPS to leverage joint object-part representations. With experiments, we show that TAPPS considerably outperforms methods that predict objects and parts separately, and achieves new state-of-the-art PPS results.

What problem does this paper attempt to address?

This paper addresses the key challenges of the part-aware panoptic segmentation (PPS) task. PPS requires segmenting and classifying each foreground object and background region in an image, and also segmenting and classifying all parts within the foreground objects and linking them to their parent objects. Existing methods usually handle PPS by performing object-level and part-level segmentation separately, which leads to three main problems:

1. **Misaligned learning objective**: The part-level predictions of existing methods are not linked to individual parent objects, so the learning objective is not aligned with the PPS task objective, which harms PPS performance.
2. **Conflicting feature representations**: Because parts and objects are predicted separately, the network may learn conflicting feature representations, which reduces its ability to separate instances.
3. **Incompatible predictions**: Separately predicted objects and parts may be incompatible with each other, so additional post-processing steps are needed to correct them.

To solve these problems, the authors propose Task-Aligned Part-aware Panoptic Segmentation (TAPPS). TAPPS uses a set of shared queries to jointly predict object-level and part-level segments, so that each part-level segment is associated with a specific object instance. The learning objective of TAPPS is therefore aligned with the PPS task objective, which improves the accuracy of PPS predictions.

Specifically, the contributions of TAPPS include:

- Proposing a simple PPS method that aligns the learning objective with the task objective, promotes object instance separation, and enables a joint object-part representation to improve prediction accuracy.
- Using shared object-part queries, which constrain TAPPS to predict only part segments that are compatible with the predicted object segments, enforcing object-part compatibility and simplifying the part segmentation task.
- Experimentally verifying the effectiveness of TAPPS on multiple datasets and network configurations, showing that it considerably outperforms existing methods that predict objects and parts separately.

In summary, the paper aims to improve PPS performance by making both the learning objective and the feature representations consistent with the PPS task.
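The shared-query idea described above can be illustrated with a small NumPy sketch. This is not the TAPPS architecture: the function name `joint_object_part_prediction`, the projection matrices `w_obj` and `w_part`, the sigmoid threshold, and the per-pixel argmax over part classes are all assumptions made for illustration. The sketch only shows how deriving the object mask and the part masks from the same query links every part segment to one parent object by construction.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def joint_object_part_prediction(queries, pixel_feats, w_obj, w_part, threshold=0.5):
    """Toy joint object/part prediction from shared queries (illustrative only).

    queries:     (Q, D)    one shared embedding per candidate object
    pixel_feats: (D, H, W) per-pixel features
    w_obj:       (D, D)    hypothetical projection to an object-mask embedding
    w_part:      (P, D, D) hypothetical projections, one per part class
    """
    num_queries, dim = queries.shape
    _, height, width = pixel_feats.shape
    num_parts = w_part.shape[0]
    flat = pixel_feats.reshape(dim, height * width)

    # Object-level masks: one binary mask per query.
    obj_embed = queries @ w_obj                                    # (Q, D)
    obj_probs = sigmoid(obj_embed @ flat)                          # (Q, H*W)
    obj_masks = obj_probs.reshape(num_queries, height, width) > threshold

    # Part-level logits: computed from the SAME queries, one map per part class.
    part_embed = np.einsum("qd,pde->qpe", queries, w_part)         # (Q, P, D)
    part_logits = (part_embed @ flat).reshape(
        num_queries, num_parts, height, width
    )

    # Parts exist only inside their parent object's mask (-1 marks "no part"),
    # so each part segment is tied to exactly one query, i.e. one object.
    part_maps = np.where(obj_masks, part_logits.argmax(axis=1), -1)
    return obj_masks, part_maps


rng = np.random.default_rng(0)
Q, P, D, H, W = 3, 4, 8, 5, 6
obj_masks, part_maps = joint_object_part_prediction(
    queries=rng.normal(size=(Q, D)),
    pixel_feats=rng.normal(size=(D, H, W)),
    w_obj=rng.normal(size=(D, D)),
    w_part=rng.normal(size=(P, D, D)),
)
print(obj_masks.shape, part_maps.shape)  # (3, 5, 6) (3, 5, 6)
```

In this toy setup, `part_maps[q]` is defined only where `obj_masks[q]` is true, so no separate step is needed to assign parts to objects. A real system would also have to classify each query and resolve overlaps between object masks, which the sketch leaves out.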

Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations

Panoptic-PartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

PASS: Panoramic Annular Semantic Segmentation

Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation

Can We PASS Beyond the Field of View? Panoramic Annular Semantic Segmentation for Real-World Surrounding Perception

Fully Data-Driven Pseudo Label Estimation for Pointly-Supervised Panoptic Segmentation

Merging Tasks for Video Panoptic Segmentation

EfficientPPS: Part-aware Panoptic Segmentation of Transparent Objects for Robotic Manipulation

EfficientLPS: Efficient LiDAR Panoptic Segmentation

PATS: Patch Area Transportation with Subdivision for Local Feature Matching

An End-to-End Network for Panoptic Segmentation

PPSAN: Perceptual-aware 3D Point Cloud Segmentation Via Adversarial Learning

EDAPS: Enhanced Domain-Adaptive Panoptic Segmentation

PUPS: Point Cloud Unified Panoptic Segmentation

SpatialFlow: Bridging All Tasks for Panoptic Segmentation

Panoramic Panoptic Segmentation: Insights Into Surrounding Parsing for Mobile Agents via Unsupervised Contrastive Learning

Attention-Guided Unified Network for Panoptic Segmentation

Towards Document Panoptic Segmentation with Pinpoint Accuracy: Method and Evaluation

Towards Imbalanced Motion: Part-Decoupling Network for Video Portrait Segmentation

Lidar Panoptic Segmentation in an Open World