Perspective (In)consistency of Paint by Text

Hany Farid
DOI: https://doi.org/10.48550/arXiv.2206.14617
2022-06-28
Abstract: Type "a sea otter with a pearl earring by Johannes Vermeer" or "a photo of a teddy bear on a skateboard in Times Square" into OpenAI's DALL-E-2 paint-by-text synthesis engine and you will not be disappointed by the delightful and eerily pertinent results. The ability to synthesize highly realistic images -- with seemingly no limitation other than our imagination -- is sure to yield many exciting and creative applications. These images are also likely to pose new challenges to the photo-forensic community. Motivated by the fact that paint by text is not based on explicit geometric modeling, and the human visual system's often obliviousness to even glaring geometric inconsistencies, we provide an initial exploration of the perspective consistency of DALL-E-2 synthesized images to determine if geometric-based forensic analyses will prove fruitful in detecting this new breed of synthetic media.
Graphics, Artificial Intelligence, Computer Vision and Pattern Recognition, Computers and Society
What problem does this paper attempt to address?
This paper asks whether images synthesized by text-to-image generators such as DALL·E-2 can withstand geometry-based forensic analysis, using perspective consistency as the test. Specifically, it examines whether the 3D structures, cast shadows, and specular reflections in these images obey the rules of perspective geometry expected in natural scenes. The author reports that, although the images are visually appealing and realistic, their geometric structures, shadows, and reflections contain inconsistencies. These inconsistencies may not be obvious to a viewer, but they can serve as useful clues in forensic analysis. The paper also discusses possible reasons for the inconsistencies, including limitations of the training data set and the scale of the model's parameters, and considers how future developments may affect them.
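To make the idea of a perspective-consistency check concrete, here is a minimal sketch of one standard constraint: lines that are parallel in the 3D scene should converge to a common vanishing point in the image. This is an illustration under stated assumptions, not the paper's procedure. The function names (`line_through`, `intersection`, `vanishing_point_spread`) are hypothetical, and the line segments are assumed to have been annotated by hand as pixel coordinates.

```python
import math
from itertools import combinations


def cross(a, b):
    """Cross product of two 3-vectors (used for homogeneous coordinates)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line through two image points p=(x, y) and q=(x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersection(l1, l2):
    """Intersection of two homogeneous lines, or None if parallel in the image."""
    x, y, w = cross(l1, l2)
    if abs(w) < 1e-12:
        return None
    return (x / w, y / w)


def vanishing_point_spread(segments):
    """Largest distance (in pixels) between pairwise intersections of the lines.

    `segments` is a list of ((x1, y1), (x2, y2)) pairs assumed to lie on
    edges that are parallel in the scene. A spread near zero means the
    lines share one vanishing point; a large spread suggests an inconsistency.
    """
    lines = [line_through(p, q) for p, q in segments]
    points = [intersection(a, b) for a, b in combinations(lines, 2)]
    points = [pt for pt in points if pt is not None]
    if len(points) < 2:
        return 0.0
    return max(math.dist(a, b) for a, b in combinations(points, 2))
```

In practice any threshold on the spread would have to allow for annotation error, so a small nonzero value should not by itself be read as evidence of synthesis. Comparable checks for shadows and reflections need additional scene-specific constraints that this sketch does not cover.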