Abstract: Scene graph generation is the task of identifying the objects in an image and, in particular, the relationships between them. Existing scene graph generation methods generally use the bounding‐box region features of objects to identify these relationships. However, we argue that the features of the overlap region of two objects can play an important role in fine‐grained relationship identification; indeed, some fine‐grained relationships can only be inferred from these features. We therefore propose the Multi‐Branch Feature Combination (MFC) module and the Overlap Region Transformer (ORT) module to comprehensively capture the visual features contained in the overlap region of two objects. Concretely, the MFC module uses deconvolution and multi‐branch dilated convolution to obtain high‐resolution, multi‐receptive‐field features of the overlap region. The ORT module uses a vision transformer to apply self‐attention over the overlap region. Used jointly, the two modules complement each other: convolution contributes local connectivity and attention contributes global connectivity. We also design a Geometrical Center Augmented (GCA) module that encodes the relative position of the geometric centers of two objects, because the scale of the overlap region alone cannot accurately capture the relationship between them. Experiments show that our model ORGC (Overlap Region and Geometrical Center), which combines the MFC, ORT, and GCA modules, improves fine‐grained relationship identification. On the Visual Genome dataset, our model outperforms the current state‐of‐the‐art model by 4.4% on the R@50 evaluation metric, reaching a state‐of‐the‐art result of 33.88.
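
The two geometric quantities the abstract relies on, the overlap region of two object boxes and the relative position of their geometric centers, can be illustrated with a minimal sketch. The abstract does not say how either is computed, so everything here is an assumption for illustration: the `(x1, y1, x2, y2)` box format, the helper names `overlap_region`, `center`, and `center_offset`, and the choice of a plain coordinate difference as the "relative position". It is not the ORGC implementation.

```python
from typing import Optional, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2, in pixels.
Box = Tuple[float, float, float, float]


def overlap_region(a: Box, b: Box) -> Optional[Box]:
    """Return the intersection rectangle of two boxes, or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    if x2 <= x1 or y2 <= y1:
        return None  # boxes are disjoint or only touch at an edge
    return (x1, y1, x2, y2)


def center(box: Box) -> Tuple[float, float]:
    """Geometric center of a box."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def center_offset(subject: Box, obj: Box) -> Tuple[float, float]:
    """Offset (dx, dy) from the subject's center to the object's center (hypothetical encoding)."""
    sx, sy = center(subject)
    ox, oy = center(obj)
    return (ox - sx, oy - sy)


# Hypothetical boxes for a subject and an object.
person: Box = (10, 20, 110, 220)
horse: Box = (60, 120, 260, 260)

region = overlap_region(person, horse)   # rectangle shared by both boxes
offset = center_offset(person, horse)    # where the object sits relative to the subject
```

In a pipeline of the kind the abstract describes, a rectangle like `region` would be the area whose features feed the overlap-region modules, and `offset` would add the positional cue that the size of the overlap alone does not carry.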

Fine‐Grained Scene Graph Generation with Overlap Region and Geometrical Center
