Exploring Logical Reasoning for Referring Expression Comprehension.

Ying Cheng, Ruize Wang, Jiashuo Yu, Rui-Wei Zhao, Yuejie Zhang, Rui Feng
DOI: https://doi.org/10.1145/3474085.3475677
2021-01-01
Abstract: Referring expression comprehension aims to localize the target object in an image that is referred to by a natural language expression. Most existing approaches neglect the implicit logical correlations among fine-grained cues, such as categories and attributes, which are beneficial for distinguishing objects. In this paper, we propose a logic-guided approach to explore logical knowledge for referring expression comprehension in a hierarchical modular-based framework. Specifically, we propose to extract fine-grained cues in the visual and textual domains and to perform logical reasoning over them with explicit logical expressions, which regularizes the matching process without extra parameters. In addition, we propose to improve existing modular-based methods by introducing context information of objects in the relationship module. Extensive experiments are conducted on three referring expression datasets, and the results demonstrate that our model produces more consistent predictions and achieves superior performance compared with previous methods.