Probing Commonsense Reasoning Capability of Text-to-Image Generative Models Via Non-visual Description

Mianzhi Pan, Jianfei Li, Mingyue Yu, Zheng Ma, Kanzhi Cheng, Jianbing Zhang, Jiajun Chen
DOI: https://doi.org/10.48550/arxiv.2312.07294
2023-01-01
Abstract: Commonsense reasoning, the ability to make logical assumptions about daily scenes, is one core intelligence of human beings. In this work, we present a novel task and dataset for evaluating the ability of text-to-image generative models to conduct commonsense reasoning, which we call PAINTaboo. Given a description with few visual clues of one object, the goal is to generate images illustrating the object correctly. The dataset was carefully hand-curated and covered diverse object categories to analyze model performance comprehensively. Our investigation of several prevalent text-to-image generative models reveals that these models are not proficient in commonsense reasoning, as anticipated. We trust that PAINTaboo can improve our understanding of the reasoning abilities of text-to-image generative models.