Learning to Embed Seen/Unseen Compositions based on Graph Networks

Dongyao Jiang, Hui Chen, Yongqiang Ma, Haodong Jing, Nanning Zheng
DOI: https://doi.org/10.1109/CAC59555.2023.10450221
2023-01-01
Abstract: Composability allows known concepts to form newer and more complex ones. This coupling process is the research focus of Compositional Zero-Shot Learning (CZSL). The goal is to build a classifier for unseen compositions in the test set from the attribute primitives (e.g., old, cute) and object primitives (e.g., cats, cars) seen in the training set. This process poses many challenges. For example, the same attribute primitive can look very different on different objects. Common CZSL methods introduce auxiliary classification information into the model through a pretrained model or an external knowledge base, but the distribution of this auxiliary information is usually inconsistent with the distribution of the class information contained in the training set itself, which causes the model to misinterpret combined features. To address this deficiency, we propose a novel Compositional Graph Convolutional Network model, which consists of two embedding networks, one for image data and one for label text. With graph convolutional networks, we can eliminate potential differences in information distribution in the pre-training data. In addition, we add a quintuplet loss to the cross-entropy loss to generate a smoother feature representation. Results on three benchmarks (MIT-States, UT-Zappos, and C-GQA) show that the proposed model outperforms seven state-of-the-art methods in terms of Area Under the Curve (AUC) and classification accuracy. This confirms that our method can generate more recognizable compositional features.