From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation

Hanrong Shi, Lin Li, Jun Xiao, Yueting Zhuang, Long Chen
2024-07-12
Abstract: Panoptic Scene Graph Generation (PSG) aims to generate a comprehensive graph-structure representation based on panoptic segmentation masks. Despite remarkable progress in PSG, almost all existing methods neglect the importance of shape-aware features, which inherently focus on the contours and boundaries of objects. To bridge this gap, we propose a model-agnostic Curricular shApe-aware FEature (CAFE) learning strategy for PSG. Specifically, we incorporate shape-aware features (i.e., mask features and boundary features) into PSG, moving beyond reliance solely on bbox features. Furthermore, drawing inspiration from human cognition, we propose to integrate shape-aware features in an easy-to-hard manner. To achieve this, we categorize the predicates into three groups based on cognition learning difficulty and correspondingly divide the training process into three stages. Each stage utilizes a specialized relation classifier to distinguish specific groups of predicates. As the learning difficulty of predicates increases, these classifiers are equipped with features of ascending complexity. We also incorporate knowledge distillation to retain knowledge acquired in earlier stages. Due to its model-agnostic nature, CAFE can be seamlessly incorporated into any PSG model. Extensive experiments and ablations on two PSG tasks under both robust and zero-shot PSG have attested to the superiority and robustness of our proposed CAFE, which outperforms existing state-of-the-art methods by a large margin.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The problem this paper addresses: in the Panoptic Scene Graph Generation (PSG) task, almost all existing methods overlook shape-aware features, which are crucial for capturing the contours and boundaries of objects. To close this gap, the authors propose a model-agnostic curriculum learning strategy, Curricular shApe-aware FEature (CAFE) learning.

### Specific background of the problem

1. **Limitations of traditional Scene Graph Generation (SGG)**:
   - SGG relies on a bounding-box-based paradigm, which may lead to inaccurate object localization and limited background annotation.
   - The emerging PSG task addresses these problems by using a more fine-grained panoptic segmentation representation (i.e., scene masks) and by also defining relationships for background regions.
2. **Deficiencies of existing PSG methods**:
   - Most existing PSG methods inherit the strategies of SGG and still rely mainly on spatial features extracted from the minimum bounding box (bbox).
   - This approach ignores shape-aware features (such as mask features and boundary features), which can cause semantic confusion in fine-grained visual relationship prediction.

### Solutions proposed in the paper

To overcome these problems, the authors propose the following:

1. **Introducing shape-aware features**:
   - Shape-aware features come in two types: mask features and boundary features.
   - Mask features exploit the details of the fine-grained mask representation, including the shape and contour of the object. Boundary features are extracted from the intersection of the subject and object masks and help capture the interaction between subject-object pairs.
2. **Curriculum learning strategy**:
   - Inspired by the human cognitive process, the authors propose a staged learning strategy that divides predicates into three difficulty groups and correspondingly divides the training process into three stages.
   - Each stage uses a specialized relation classifier to handle the predicates of a specific group, and the complexity of the features increases from stage to stage, from simple bbox features to complex boundary features.
   - Knowledge distillation is applied between stages to retain the knowledge acquired in earlier stages.
3. **Model-agnosticism**:
   - CAFE is a model-agnostic strategy that can be integrated into any existing PSG model to improve its performance.

### Experimental verification

Through extensive experiments on challenging PSG datasets, the authors demonstrate the effectiveness and robustness of CAFE. Specifically:

- In the robust PSG task, CAFE achieves new state-of-the-art performance across different metrics.
- In the zero-shot PSG task, CAFE can infer unseen visual relationship triplets by using the robust visual relationship features learned during training.

### Summary

The main contributions of this paper are:

1. An in-depth exploration of a key problem in the PSG task: over-reliance on bounding-box-based spatial features while ignoring shape-aware features.
2. A new model-agnostic curriculum learning strategy (CAFE) that enables the model to learn shape-aware features in an easy-to-hard manner.
3. Extensive experimental results demonstrating the robustness and effectiveness of CAFE, which significantly outperforms existing state-of-the-art methods.
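
The staged use of bbox, mask, and boundary features, together with distillation between stages, can be illustrated with a small NumPy sketch. It is not the authors' implementation. The names `STAGE_FEATURES`, `bbox_of`, `dilate`, `pair_inputs`, and `distillation_loss` are hypothetical, and the code only gathers raw geometric inputs rather than learned features. Two choices are assumptions made here: masks are dilated by one pixel before intersecting (panoptic masks do not overlap, so a raw intersection would be empty), and the distillation term uses the common temperature-softened KL divergence, which the summary does not specify.

```python
import numpy as np

# Hypothetical schedule: inputs available to each stage's relation classifier.
STAGE_FEATURES = {
    1: ("bbox",),
    2: ("bbox", "mask"),
    3: ("bbox", "mask", "boundary"),
}

def bbox_of(mask):
    """Tight box (x0, y0, x1, y1) around a non-empty binary mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def dilate(mask, steps=1):
    """Grow a binary mask by `steps` pixels (4-neighbourhood), pure NumPy."""
    out = mask.astype(bool)
    for _ in range(steps):
        p = np.pad(out, 1)  # zero-pad so shifted views stay in bounds
        out = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
               | p[1:-1, :-2] | p[1:-1, 2:])
    return out

def pair_inputs(subj_mask, obj_mask, stage):
    """Collect the raw inputs for a subject-object pair at a given stage."""
    kinds = STAGE_FEATURES[stage]
    inputs = {}
    if "bbox" in kinds:
        inputs["bbox"] = (bbox_of(subj_mask), bbox_of(obj_mask))
    if "mask" in kinds:
        inputs["mask"] = (subj_mask, obj_mask)
    if "boundary" in kinds:
        # Assumption: dilate first, then intersect, to get a contact region.
        inputs["boundary"] = dilate(subj_mask) & dilate(obj_mask)
    return inputs

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))
```

In a training loop built on this sketch, the classifier for stage 2 or 3 would consume `pair_inputs(..., stage)` and add `distillation_loss` against the previous stage's classifier (the teacher) on the predicate groups it has already learned. How the two loss terms are weighted is left open here.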