Abstract: Scene graph parsing has emerged in recent years as a new challenge in image understanding and pattern recognition. It captures objects and their relationships and provides a structured representation of the visual scene. Of the three types of high-level relationships in scene graphs, semantic relationships carry the global understanding of the scene and are therefore the core and the most valuable, whereas geometric and possessive relationships carry only local, limited information. However, semantic relationships span many types with few instances each, so existing detectors recognize most of them poorly. To address this issue, this paper proposes a new architecture, the graphical focal network, which uses a decision-level global detector to capture the dependencies between the local object and relationship detectors. We construct a graphical focal loss that compensates for the scarcity of semantic relationship instances by adjusting the proportion of relationship loss according to relationship rarity and learning difficulty, and that improves the stability of key object recognition by adjusting the proportion of object loss according to node connectivity and the value of neighborhood relationships. The proposed relative depth encoding module and regional layout encoding module introduce, respectively, relative depth information and more effective geometric layout information between objects, further improving performance. Experiments on the Visual Genome benchmark show that our method outperforms state-of-the-art competitors on two types of performance metrics. For semantic relationship types, the recognition rate of our method is 2.0 times that of the baseline. (C) 2020 Elsevier Ltd. All rights reserved.
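
The abstract describes reweighting the relationship loss by rarity and learning difficulty but gives no formula. As a rough illustration of that general idea only, the sketch below combines the standard focal loss term, (1 - p_t)^gamma, with an inverse-frequency class weight. It is not the paper's graphical focal loss: the function names, the `power` and `gamma` parameters, and the weighting scheme are all assumptions made for illustration, and the node-connectivity term for object loss is not modeled.

```python
import math


def rarity_weights(class_counts, power=0.5):
    """Hypothetical rarity weights: inverse class frequency raised to
    `power`, rescaled so the weights average to 1.
    Assumes every count is positive."""
    total = sum(class_counts)
    raw = [(total / c) ** power for c in class_counts]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_focal_loss(logits, label, class_weights, gamma=2.0):
    """Standard focal loss for one sample, scaled by a per-class weight.

    logits        : list of raw scores, one per relationship class
    label         : index of the ground-truth class
    class_weights : per-class weights (e.g. from rarity_weights)
    gamma         : focusing parameter; 0 recovers weighted cross-entropy
    """
    # Numerically stable log-softmax for the true class.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_pt = logits[label] - log_z
    pt = math.exp(log_pt)
    # (1 - pt)^gamma down-weights samples the model already gets right,
    # while class_weights[label] up-weights rare classes.
    return -class_weights[label] * (1.0 - pt) ** gamma * log_pt


# Toy usage: three relationship classes with a long-tailed distribution.
weights = rarity_weights([900, 90, 10])
loss_rare = weighted_focal_loss([0.0, 0.0, 0.0], 2, weights)
loss_common = weighted_focal_loss([0.0, 0.0, 0.0], 0, weights)
```

With identical predictions, the sample from the rare class contributes more loss than the sample from the common class, which is the qualitative behavior the abstract attributes to rarity-based reweighting.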

Subgraph and Object Context‐masked Network for Scene Graph Generation

Scene Graph Generation With Hierarchical Context

Attention Redirection Transformer with Semantic Oriented Learning for Unbiased Scene Graph Generation

Bridging Visual and Textual Semantics: Towards Consistency for Unbiased Scene Graph Generation

Fast Contextual Scene Graph Generation with Unbiased Context Augmentation

HOSE-Net: Higher Order Structure Embedded Network for Scene Graph Generation

Divide and Conquer: Subset Matching for Scene Graph Generation in Complex Scenes

Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation

Scene Graph Generation using Depth-based Multimodal Network

Scene Graph Generation Based On Node-Relation Context Module

PANet: A Context Based Predicate Association Network for Scene Graph Generation

Attentive Gated Graph Neural Network for Image Scene Graph Generation

Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge

Semantic-Context Graph Network for Point-based 3D Object Detection

Semantic Scene Graph Generation Based on an Edge Dual Scene Graph and Message Passing Neural Network

Semantically Similarity-Wise Dual-Branch Network for Scene Graph Generation

Relation Regularized Scene Graph Generation

Learning to transfer focus of graph neural network for scene graph parsing

Scene Graph Generation with Geometric Context

Exploiting Scene Graphs for Human-Object Interaction Detection

OCNet: Object Context Network for Scene Parsing