Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction

Yansheng Li, Tingzhu Wang, Kang Wu, Linlin Wang, Xin Guo, Wenbin Wang
2024-07-27
Abstract: Scene Graph Generation (SGG) aims to explore the relationships between objects in images and obtain scene summary graphs, thereby better serving downstream tasks. However, the long-tailed problem has adversely affected the scene graph's quality. The predictions are dominated by coarse-grained relationships, lacking more informative fine-grained ones. The union region of one object pair (i.e., one sample) contains rich and dedicated contextual information, enabling the prediction of the sample-specific bias for refining the original relationship prediction. Therefore, we propose a novel Sample-Level Bias Prediction (SBP) method for fine-grained SGG (SBG). Firstly, we train a classic SGG model and construct a correction bias set by calculating the margin between the ground truth label and the predicted label with one classic SGG model. Then, we devise a Bias-Oriented Generative Adversarial Network (BGAN) that learns to predict the constructed correction biases, which can be utilized to correct the original predictions from coarse-grained relationships to fine-grained ones. The extensive experimental results on VG, GQA, and VG-1800 datasets demonstrate that our SBG outperforms the state-of-the-art methods in terms of Average@K across three mainstream SGG models: Motif, VCtree, and Transformer. Compared to dataset-level correction methods on VG, SBG shows a significant average improvement of 5.6%, 3.9%, and 3.2% on Average@K for tasks PredCls, SGCls, and SGDet, respectively. The code will be available at https://github.com/Zhuzi24/SBG.
Computer Vision and Pattern Recognition, Artificial Intelligence
### What problem does this paper attempt to solve?

This paper addresses the long-tailed distribution problem in **Scene Graph Generation (SGG)**. SGG aims to generate a structured semantic graph from an image to better serve downstream tasks. Because the relationship categories in the datasets follow a long-tailed distribution, model predictions are biased toward coarse-grained relationships and lack fine-grained ones: the model overfits to a few common relationships and neglects rare but more informative ones.

To solve this problem, the authors propose a novel **Sample-Level Bias Prediction (SBP)** method. It predicts a bias specific to each sample (one object pair) and uses that bias to correct the original coarse-grained relationship prediction into a fine-grained one. The specific steps are as follows:

1. **Train a classic SGG model**: Train a classic SGG model, then construct a correction bias set from the margin between the ground-truth label and the label the model predicts.
2. **Design a Bias-Oriented Generative Adversarial Network (BGAN)**: Through adversarial training, BGAN learns to predict the constructed correction biases, which are then used to modify the original predictions.
3. **Sample-level bias correction**: Use the sample-specific bias predicted by BGAN to refine the classic model's coarse-grained relationship prediction into a more accurate fine-grained one.

With this method, the authors aim to significantly improve the prediction of rare relationships without sacrificing accuracy on common ones, thereby improving the overall quality of the scene graph.
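The steps above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes that the "margin" is the one-hot ground-truth vector minus the base model's softmax output, and it reuses the training target itself as a stand-in for the bias that a perfectly trained BGAN would predict. The predicate names, logits, and function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def correction_bias(probs, gt_index):
    """Assumed margin: one-hot ground truth minus the predicted distribution."""
    return [(1.0 if i == gt_index else 0.0) - p for i, p in enumerate(probs)]

def corrected_argmax(probs, bias):
    """Add a per-sample bias to the base prediction and return the new top class."""
    corrected = [p + b for p, b in zip(probs, bias)]
    return max(range(len(corrected)), key=corrected.__getitem__)

# Hypothetical predicate vocabulary: one coarse head class, two fine-grained tail classes.
PREDICATES = ["on", "standing on", "sitting on"]

# Hypothetical logits from a base SGG model that favors the coarse class.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# Step 1: build the correction-bias target for this sample.
target_bias = correction_bias(probs, PREDICATES.index("standing on"))

# Steps 2-3: in the paper a BGAN predicts this bias; here the target stands in for it.
before = PREDICATES[max(range(len(probs)), key=probs.__getitem__)]
after = PREDICATES[corrected_argmax(probs, target_bias)]

print(before, "->", after)  # on -> standing on
```

Because the bias is computed per sample rather than once per dataset, two object pairs with the same coarse prediction can be corrected toward different fine-grained predicates.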
### Main contributions of the paper

- **First exploration of sample-level bias correction**: To address the long-tailed distribution problem, the paper proposes a new method that refines coarse-grained relationships into fine-grained ones.
- **A new Bias-Oriented Generative Adversarial Network (BGAN)**: BGAN uses contextual information to predict sample-specific correction biases.
- **Experimental results**: Across multiple mainstream SGG models (Motif, VCtree, Transformer), SBG outperforms existing state-of-the-art methods on the VG, GQA, and VG-1800 datasets, with notable gains on the Average@K (A@K) metric.

Together, these results indicate that SBG is effective, generalizes across models and datasets, and maintains balanced performance under the long-tailed distribution.
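The summary does not define A@K. The sketch below assumes it is the arithmetic mean of Recall@K (R@K) and mean Recall@K (mR@K), which would explain why it rewards balanced performance on head and tail predicates. The per-predicate counts are invented for illustration and are not results from the paper.

```python
def recall_at_k(hits, totals):
    """R@K: fraction of all ground-truth relationships recovered in the top K."""
    return sum(hits.values()) / sum(totals.values())

def mean_recall_at_k(hits, totals):
    """mR@K: recall computed per predicate class, then averaged over classes."""
    per_class = [hits[c] / totals[c] for c in totals]
    return sum(per_class) / len(per_class)

def average_at_k(r, mr):
    """Assumed definition of A@K: arithmetic mean of R@K and mR@K."""
    return (r + mr) / 2

# Invented counts: a frequent coarse predicate and a rare fine-grained one.
totals = {"on": 100, "standing on": 10}
hits = {"on": 90, "standing on": 2}

r = recall_at_k(hits, totals)        # dominated by the head class
mr = mean_recall_at_k(hits, totals)  # weights each class equally
a = average_at_k(r, mr)

print(f"R@K={r:.3f} mR@K={mr:.3f} A@K={a:.3f}")
```

In this toy case R@K stays high even though the rare predicate is mostly missed, while mR@K exposes the gap. Averaging the two penalizes a model that serves only the head classes.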