Ensemble Predicate Decoding for Unbiased Scene Graph Generation

Jiasong Feng, Lichun Wang, Hongbo Xu, Kai Xu, Baocai Yin
2024-08-26
Abstract: Scene Graph Generation (SGG) aims to generate a comprehensive graphical representation that accurately captures the semantic information of a given scenario. However, the performance of SGG models in predicting fine-grained predicates is hindered by a significant predicate bias. Existing works attribute the biased scene graphs to the long-tail distribution of predicates in the training data. In addition, the semantic overlap between predicate categories makes predicate prediction difficult, and the large difference in sample size between semantically similar predicates makes it harder still. These factors place higher requirements on the discriminative ability of the model. To address this problem, this paper proposes Ensemble Predicate Decoding (EPD), which employs multiple decoders to attain unbiased scene graph generation. Two auxiliary decoders trained on lower-frequency predicates are used to improve the discriminative ability of the model. Extensive experiments are conducted on the Visual Genome (VG) dataset, and the results show that EPD enhances the model's representation capability for predicates. In addition, we find that our approach retains relatively better predictive capability for more frequent predicates than previous unbiased SGG methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses bias in Scene Graph Generation (SGG), specifically bias in predicate prediction. Because predicates in the training data follow a long-tailed distribution and overlap semantically, existing SGG models tend to predict coarse-grained predicates and struggle to predict fine-grained ones accurately. This bias leaves the generated scene graphs incomplete and reduces their usefulness in downstream tasks.

To solve this problem, the paper proposes a new method, Ensemble Predicate Decoding (EPD). EPD introduces multiple decoders to improve the model's ability to distinguish predicates of different frequencies, thereby achieving unbiased scene graph generation. Specifically, EPD includes a main decoder and two auxiliary decoders, each trained on a different predicate subset, to reduce the impact of the long-tailed distribution and improve the model's generalization ability.

The main contributions of the paper are:

1. **Proposing a new model-independent SGG method, EPD**: The method includes a main decoder and two auxiliary decoders. The main decoder is responsible for decoding all predicates, while the auxiliary decoders focus on enhancing the model's decoding ability for low-frequency predicates.
2. **Conducting extensive experiments on the widely used SGG benchmark dataset Visual Genome (VG)**: The results show that EPD performs well when combined with various scene graph baseline models, with a significant improvement in the mR@K metric and only a very small decrease in the R@K metric.

Through these improvements, EPD significantly improves prediction of low-frequency predicates while maintaining prediction accuracy on high-frequency predicates, thereby generating more accurate and complete scene graphs.
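
The relationship between R@K and mR@K can be illustrated with a small calculation. The sketch below is a simplification and not the paper's evaluation code: it assumes recall is pooled over all ground-truth triplets (standard SGG evaluation computes it per image and averages), and the function name `recall_and_mean_recall`, the predicate labels, and all counts are invented for illustration only. R@K is the fraction of ground-truth triplets recovered among the top-K predictions, while mR@K first computes that fraction separately for each predicate class and then averages over classes, so rare predicates count as much as frequent ones.

```python
from collections import defaultdict


def recall_and_mean_recall(gt_predicates, hits):
    """Compute pooled recall and class-averaged (mean) recall.

    gt_predicates[i] is the predicate label of ground-truth triplet i.
    hits[i] is True if that triplet was recovered among the top-K predictions.
    """
    if len(gt_predicates) != len(hits):
        raise ValueError("gt_predicates and hits must have the same length")
    if not gt_predicates:
        raise ValueError("at least one ground-truth triplet is required")

    total = defaultdict(int)  # ground-truth count per predicate
    found = defaultdict(int)  # recovered count per predicate
    for predicate, hit in zip(gt_predicates, hits):
        total[predicate] += 1
        found[predicate] += int(hit)

    # R@K: every triplet has equal weight, so frequent predicates dominate.
    recall = sum(found.values()) / sum(total.values())
    # mR@K: every predicate class has equal weight.
    per_class = {p: found[p] / total[p] for p in total}
    mean_recall = sum(per_class.values()) / len(per_class)
    return recall, mean_recall, per_class


# Hypothetical toy data: a frequent coarse predicate and a rare fine-grained one.
gt = ["on"] * 16 + ["standing on"] * 4

# A biased model recovers every "on" triplet and no "standing on" triplet.
biased = recall_and_mean_recall(gt, [True] * 16 + [False] * 4)

# A debiased model loses a few "on" triplets but recovers most "standing on".
debiased_hits = [True] * 12 + [False] * 4 + [True] * 3 + [False]
debiased = recall_and_mean_recall(gt, debiased_hits)

print("biased:   R = %.2f, mR = %.2f" % biased[:2])
print("debiased: R = %.2f, mR = %.2f" % debiased[:2])
```

In this made-up example the biased model scores higher on pooled recall but much lower on mean recall, which is why unbiased SGG methods are typically judged on mR@K while R@K is monitored to check that frequent predicates are not sacrificed.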