Abstract: Current studies of Scene Graph Generation (SGG) focus on solving the long-tailed problem in order to generate unbiased scene graphs. However, most de-biasing methods overemphasize the tail predicates and underestimate the head ones throughout training, thereby damaging the representation ability of head predicate features. Furthermore, these impaired head predicate features harm the learning of tail predicates. In fact, the inference of tail predicates heavily depends on the general patterns learned from head ones, e.g., "standing on" depends on "on". Thus, these de-biasing SGG methods achieve neither excellent performance on tail predicates nor satisfactory performance on head ones. To address this issue, we propose a Dual-branch Hybrid Learning network (DHL) that takes care of both head and tail predicates for SGG. It comprises a Coarse-grained Learning Branch (CLB) and a Fine-grained Learning Branch (FLB). Specifically, the CLB is responsible for learning the expertise and robust features of head predicates, while the FLB is expected to predict informative tail predicates. Furthermore, DHL is equipped with a Branch Curriculum Schedule (BCS) to make the two branches work well together. Experiments show that our approach achieves new state-of-the-art performance on the VG and GQA datasets and strikes a trade-off between the performance on tail predicates and head ones. Moreover, extensive experiments on two downstream tasks (i.e., Image Captioning and Sentence-to-Graph Retrieval) further verify the generalization and practicability of our method.

### What problem does this paper attempt to solve?

This paper aims to solve the long-tail distribution problem in the Scene Graph Generation (SGG) task. Specifically, current SGG research mainly focuses on generating unbiased scene graphs, but most de-biasing methods over-emphasize tail predicates during the training process while underestimating head predicates. This undermines the representational ability of head predicate features and further affects the learning of tail predicates. Moreover, the reasoning of tail predicates highly depends on the general patterns learned from head predicates, for example, "standing on" depends on "on". Therefore, these de-biasing methods can neither achieve excellent performance on tail predicates nor reach a satisfactory effect on head predicates. To solve this problem, the authors propose a Dual-branch Hybrid Learning network (DHL) to handle head and tail predicates simultaneously, thereby improving the overall performance of the SGG model.

### Main contributions

1. **Dual-branch Hybrid Learning network (DHL)**: It consists of a Coarse-grained Learning Branch (CLB) and a Fine-grained Learning Branch (FLB). The CLB is responsible for learning the expertise and robust features of head predicates, while the FLB is used to predict more informative tail predicates.
2. **Curriculum Re-weighting Mechanism (CRM)**: It optimizes the FLB, making it first learn head predicates and then gradually focus on tail predicates. In addition, a Semantic Context Module (SCM) is designed to correct inconsistent predictions in the FLB, making the model more stable.
3. **Extensive experimental verification**: The experimental results show that DHL significantly improves the performance of the baseline model on the VG and GQA datasets and demonstrates its generalization ability and practicality in two downstream tasks, namely image caption generation and sentence-to-graph retrieval.

Through these improvements, DHL effectively balances the performance of head and tail predicates and overcomes the limitations of existing de-biasing methods.
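The idea behind the Curriculum Re-weighting Mechanism, learning head predicates first and then gradually shifting attention to tail predicates, can be illustrated with a small sketch. This is not the paper's formula: the function name `curriculum_class_weights`, the linear interpolation, the `progress` parameter, and the predicate counts are all assumptions made for illustration. The sketch moves per-predicate loss weights from uniform (which lets frequent head predicates dominate training) toward balanced inverse-frequency weights (which favor rare tail predicates).

```python
def curriculum_class_weights(counts, progress):
    """Return a loss weight per predicate for a given training progress.

    counts:   dict mapping predicate name -> number of training samples
              (hypothetical numbers, not taken from VG or GQA).
    progress: float in [0, 1]; 0 = start of training, 1 = end of training.

    Assumed schedule (illustrative only): linear interpolation between
    uniform weights and balanced inverse-frequency weights.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")

    total = sum(counts.values())
    num_classes = len(counts)

    weights = {}
    for predicate, count in counts.items():
        # Balanced inverse-frequency weight: rare predicates get weights > 1.
        inverse_freq = total / (num_classes * count)
        # Early on the weight is 1.0 for every predicate; later it approaches
        # the inverse-frequency weight, which emphasizes tail predicates.
        weights[predicate] = (1.0 - progress) * 1.0 + progress * inverse_freq
    return weights


# Hypothetical counts: "on" is a head predicate, "standing on" a tail one.
example_counts = {"on": 900, "standing on": 100}
start = curriculum_class_weights(example_counts, 0.0)
middle = curriculum_class_weights(example_counts, 0.5)
end = curriculum_class_weights(example_counts, 1.0)
```

In a training loop, such weights would typically be passed to a weighted cross-entropy loss and recomputed as `progress` grows, so the tail predicate's weight rises over time while the head predicate's weight falls.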

Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation

PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation

Head-Tail Cooperative Learning Network for Unbiased Scene Graph Generation

Adaptive Feature Learning for Unbiased Scene Graph Generation

DBiased-P: Dual-Biased Predicate Predictor for Unbiased Scene Graph Generation

Informative Scene Graph Generation via Debiasing

Fast Contextual Scene Graph Generation with Unbiased Context Augmentation

Dark Knowledge Balance Learning for Unbiased Scene Graph Generation

Heterogeneous Learning for Scene Graph Generation

Semantically Similarity-Wise Dual-Branch Network for Scene Graph Generation

Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation

Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction

Hyper-relationship Learning Network for Scene Graph Generation

Unbiased Scene Graph Generation by Type-Aware Message Passing on Heterogeneous and Dual Graphs

Peer Learning for Unbiased Scene Graph Generation

Ensemble Predicate Decoding for Unbiased Scene Graph Generation

Unbiased Heterogeneous Scene Graph Generation with Relation-aware Message Passing Neural Network

State-Aware Compositional Learning Toward Unbiased Training for Scene Graph Generation

Addressing Predicate Overlap in Scene Graph Generation with Semantic Granularity Controller

Leveraging Predicate and Triplet Learning for Scene Graph Generation

Towards Lifelong Scene Graph Generation with Knowledge-ware In-context Prompt Learning