Learning to Infer Unseen Single-/Multi-Attribute-Object Compositions with Graph Networks.
Hui Chen, Jingjing Jiang, Nanning Zheng
DOI: https://doi.org/10.1109/tpami.2023.3273712
IF: 23.6
2023-01-01
IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract: Inferring unseen attribute-object compositions is critical for making machines learn to decompose and compose complex concepts as people do. Most existing methods are limited to recognizing single-attribute-object compositions and can hardly learn the relations between attributes and objects. In this paper, we propose an attribute-object semantic association graph model to learn these complex relations and enable knowledge transfer between primitives. With nodes representing attributes and objects, the graph can be constructed flexibly, which supports both single- and multi-attribute-object composition recognition. To reduce misclassification of similar compositions (e.g., scratched screen and broken screen), a contrastive loss pulls the anchor image feature closer to its corresponding label feature and pushes it away from the negative label features. We also propose a novel balance loss to alleviate the domain bias, in which a model prefers to predict seen compositions. In addition, we build a large-scale Multi-Attribute Dataset (MAD) with 116,099 images and 8,030 label categories for inferring unseen multi-attribute-object compositions. Along with MAD, we propose two novel metrics, Hard and Soft, to give a comprehensive evaluation in the multi-attribute setting. Experiments on MAD and two single-attribute-object benchmarks (MIT-States and UT-Zappos50K) demonstrate the effectiveness of our approach.
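
The contrastive objective mentioned in the abstract, where an anchor image feature is pulled toward its own label feature and pushed away from the other label features, can be illustrated with a minimal sketch. The abstract does not give the exact form of the loss, so everything below is an assumption made for illustration: a softmax cross-entropy over cosine similarities (an InfoNCE-style loss), the function name `contrastive_label_loss`, and the `temperature` parameter. It is not the authors' formulation, and it leaves out the balance loss and the graph network that would produce the label features.

```python
import numpy as np

def contrastive_label_loss(image_feats, label_feats, targets, temperature=0.1):
    """Illustrative InfoNCE-style loss (assumed form, not the paper's).

    image_feats: (batch, dim) anchor image features.
    label_feats: (num_labels, dim) composition label features.
    targets:     (batch,) index of the correct label for each image.
    """
    image_feats = np.asarray(image_feats, dtype=float)
    label_feats = np.asarray(label_feats, dtype=float)
    targets = np.asarray(targets)

    # L2-normalize so the dot product below is a cosine similarity.
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    lab = label_feats / np.linalg.norm(label_feats, axis=1, keepdims=True)

    # Similarity of every image to every label, scaled by the temperature.
    logits = img @ lab.T / temperature

    # Numerically stable log-softmax over the label axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Maximizing the log-probability of the correct label pulls the image
    # toward it and, through the softmax, pushes it away from the negatives.
    return float(-log_probs[np.arange(len(targets)), targets].mean())
```

With this form, an image feature that aligns with its own label feature yields a loss near zero, while one that aligns with a negative label (say, "broken screen" instead of "scratched screen") yields a large loss.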