Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation

Chaofan Zheng, Lianli Gao, Xinyu Lyu, Pengpeng Zeng, Abdulmotaleb El Saddik, Heng Tao Shen
DOI: https://doi.org/10.48550/arXiv.2207.07913
2022-07-16
Abstract: The current studies of Scene Graph Generation (SGG) focus on solving the long-tailed problem for generating unbiased scene graphs. However, most de-biasing methods overemphasize the tail predicates and underestimate head ones throughout training, thereby wrecking the representation ability of head predicate features. Furthermore, these impaired features from head predicates harm the learning of tail predicates. In fact, the inference of tail predicates heavily depends on the general patterns learned from head ones, e.g., "standing on" depends on "on". Thus, these de-biasing SGG methods can neither achieve excellent performance on tail predicates nor satisfying behaviors on head ones. To address this issue, we propose a Dual-branch Hybrid Learning network (DHL) to take care of both head predicates and tail ones for SGG, including a Coarse-grained Learning Branch (CLB) and a Fine-grained Learning Branch (FLB). Specifically, the CLB is responsible for learning expertise and robust features of head predicates, while the FLB is expected to predict informative tail predicates. Furthermore, DHL is equipped with a Branch Curriculum Schedule (BCS) to make the two branches work well together. Experiments show that our approach achieves a new state-of-the-art performance on VG and GQA datasets and makes a trade-off between the performance of tail predicates and head ones. Moreover, extensive experiments on two downstream tasks (i.e., Image Captioning and Sentence-to-Graph Retrieval) further verify the generalization and practicability of our method.
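To make the head/tail tension concrete, here is a minimal sketch in plain Python. It assumes a three-predicate vocabulary with hypothetical training counts and a hypothetical two-branch objective in which the CLB keeps a plain cross-entropy loss while the FLB uses inverse-frequency re-weighting. The function names (`inverse_frequency_weights`, `dual_branch_loss`) and the choice of re-weighting scheme are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target, weights=None):
    # (Optionally weighted) softmax cross-entropy for a single sample.
    p = softmax(logits)
    w = 1.0 if weights is None else weights[target]
    return -w * math.log(p[target])


def inverse_frequency_weights(counts):
    # Weight each predicate class by 1/count, normalized so the mean weight is 1.
    inv = [1.0 / c for c in counts]
    mean = sum(inv) / len(inv)
    return [x / mean for x in inv]


def dual_branch_loss(coarse_logits, fine_logits, target, counts):
    # Hypothetical two-branch objective (an assumption, not the paper's loss):
    # the coarse branch keeps the unweighted loss so head-predicate features
    # are still learned, while the fine branch up-weights tail predicates.
    l_clb = cross_entropy(coarse_logits, target)
    l_flb = cross_entropy(fine_logits, target, inverse_frequency_weights(counts))
    return l_clb + l_flb


# Hypothetical counts for "on" (head), "near", and "standing on" (tail).
counts = [1000, 100, 10]
weights = inverse_frequency_weights(counts)
print([round(w, 3) for w in weights])  # prints [0.027, 0.27, 2.703]
```

With these counts the tail predicate is weighted 100 times more than the head one, which illustrates how a loss that is only re-weighted can drown out head predicates; keeping an unweighted branch alongside is one plausible way to preserve head-predicate features.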
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses the long-tailed distribution problem in Scene Graph Generation (SGG). Current SGG research focuses on generating unbiased scene graphs, but most de-biasing methods over-emphasize tail predicates during training while underestimating head predicates. This weakens the representational ability of head predicate features, which in turn harms the learning of tail predicates, because the inference of tail predicates depends heavily on general patterns learned from head ones (e.g., "standing on" depends on "on"). As a result, these de-biasing methods achieve neither excellent performance on tail predicates nor satisfactory performance on head ones. To solve this problem, the authors propose a Dual-branch Hybrid Learning network (DHL) that handles head and tail predicates simultaneously, improving the overall performance of SGG models.

### Main contributions

1. **Dual-branch Hybrid Learning network (DHL)**: It consists of a Coarse-grained Learning Branch (CLB) and a Fine-grained Learning Branch (FLB). The CLB learns expertise and robust features of head predicates, while the FLB predicts more informative tail predicates. A Branch Curriculum Schedule (BCS) makes the two branches work well together.
2. **Curriculum Re-weighting Mechanism (CRM)**: It optimizes the FLB so that it first learns head predicates and then gradually focuses on tail predicates. In addition, a Semantic Context Module (SCM) is designed to correct inconsistent predictions in the FLB, making the model more stable.
3. **Extensive experimental verification**: Experiments show that DHL significantly improves the performance of the baseline model on the VG and GQA datasets, and results on two downstream tasks, Image Captioning and Sentence-to-Graph Retrieval, demonstrate its generalization ability and practicality.
Through these improvements, DHL effectively balances the performance of head and tail predicates and overcomes the limitations of existing de-biasing methods.
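The curriculum idea behind the CRM (learn head predicates first, then gradually shift toward tail predicates) can be sketched as a per-class loss weight that drifts from uniform toward inverse frequency as training progresses. This is a minimal sketch assuming a linear schedule and weights of the form (n_max / n_c)^alpha; `curriculum_weights` and this formula are hypothetical illustrations, not the CRM as defined in the paper.

```python
def curriculum_weights(counts, epoch, total_epochs):
    # Hypothetical schedule (an assumption): alpha grows linearly from 0 to 1.
    alpha = min(1.0, epoch / total_epochs)
    n_max = max(counts)
    # alpha = 0 -> every weight is 1, so frequent head predicates dominate;
    # alpha = 1 -> full inverse-frequency weights relative to the head class.
    return [(n_max / c) ** alpha for c in counts]


counts = [1000, 100, 10]  # hypothetical: "on", "near", "standing on"
for epoch in (0, 5, 10):
    print(epoch, [round(w, 2) for w in curriculum_weights(counts, epoch, 10)])
# 0 [1.0, 1.0, 1.0]
# 5 [1.0, 3.16, 10.0]
# 10 [1.0, 10.0, 100.0]
```

Multiplying each sample's FLB loss by the weight of its ground-truth predicate would make early training behave like ordinary cross-entropy and late training like class-balanced re-weighting; the linear ramp is just the simplest choice, and a slower or stepwise ramp would delay the shift toward tail predicates.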