Towards Confidence-Aware Commonsense Knowledge Integration for Scene Graph Generation

Hongshuo Tian, Ning Xu, Yanhui Wang, Chenggang Yan, Bolun Zheng, Xuanya Li, An-An Liu
DOI: https://doi.org/10.1109/ICME55011.2023.00385
2023-01-01
Abstract: Commonsense knowledge has been widely explored to improve Scene Graph Generation (SGG). Existing methods simply incorporate the relations described in knowledge bases into every part of the scene to obtain a concrete understanding. However, they do not consider whether a given visual scene actually needs commonsense knowledge for inference. Specifically, the difficulty of relation recognition varies with the relation type. Some frequent spatial relations (e.g., on) usually produce few perception errors even without any prior information, while others that involve many rules and patterns (e.g., throwing) have few samples and need commonsense knowledge as a supplement. In this paper, we propose a novel confidence-aware commonsense knowledge integration method for SGG. First, we design a hybrid-attention module based on mutual information maximization, which reduces the uncertainty in representation learning given external knowledge. Second, we add an extra branch to the SGG network that performs confidence estimation independent of any ground-truth labels; its output scalar explicitly reflects the difficulty of visual recognition and is used to balance the demand for commonsense knowledge in a given scene. Experiments are conducted with the MOTIFS backbone on Visual Genome (VG). Our method effectively improves mRecall with little loss in Recall, especially when predicting unseen relations.
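The abstract does not give the formula by which the confidence scalar balances visual evidence against commonsense knowledge. As a rough illustration only, the sketch below assumes a simple convex combination of two sets of relation logits, gated by a sigmoid of the confidence branch's raw output. The function name confidence_gated_fusion, its parameters, and the gating form are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_gated_fusion(
    visual_logits: list[float],
    knowledge_logits: list[float],
    confidence_score: float,
) -> tuple[float, list[float]]:
    """Blend visual and knowledge-based relation logits (hypothetical form).

    confidence_score stands in for the raw output of an extra confidence
    branch. After the sigmoid, a value near 1 means the visual prediction
    is trusted; a value near 0 shifts weight to the commonsense knowledge.
    """
    c = sigmoid(confidence_score)
    fused = [
        c * v + (1.0 - c) * k
        for v, k in zip(visual_logits, knowledge_logits)
    ]
    return c, softmax(fused)


# Toy example with two relation classes (values are made up).
relations = ["on", "throwing"]
visual_logits = [2.0, 0.0]      # visual branch favours "on"
knowledge_logits = [0.0, 2.0]   # knowledge branch favours "throwing"

# High confidence: the visual prediction dominates.
c_high, probs_high = confidence_gated_fusion(visual_logits, knowledge_logits, 4.0)
# Low confidence: the commonsense knowledge dominates.
c_low, probs_low = confidence_gated_fusion(visual_logits, knowledge_logits, -4.0)

pred_high = relations[probs_high.index(max(probs_high))]
pred_low = relations[probs_low.index(max(probs_low))]
```

Under this assumed gating, an easy scene (high confidence) keeps the visual prediction, while a hard scene (low confidence) leans on the knowledge-based scores, which matches the balancing behaviour the abstract describes in words.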