Background Layout Generation and Object Knowledge Transfer for Text-to-Image Generation

Zhuowei Chen, Zhendong Mao, Shancheng Fang, Bo Hu
DOI: https://doi.org/10.1145/3503161.3548154
2022-01-01
Abstract: Text-to-Image generation (T2I) aims to generate realistic and semantically consistent images from natural language descriptions. Built upon recent advances in generative adversarial networks (GANs), existing T2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: 1) the background (e.g., fence, lake) of a generated image depicting a complicated, real-world scene tends to be unrealistic; 2) the object (e.g., elephant, zebra) in a generated image often has a highly distorted shape or is missing key parts. To address these limitations, we propose a two-stage T2I approach: the first stage redesigns the text-to-layout process to incorporate a background layout alongside the existing object layout, and the second stage transfers object knowledge from an existing class-to-image model to the layout-to-image process to improve object fidelity. Specifically, a transformer-based architecture is introduced as the layout generator to learn the mapping from text to the layout of objects and background, and a Text-attended Layout-aware feature Normalization (TL-Norm) is proposed to adaptively transfer the object knowledge to image generation. Benefiting from the background layout and the transferred object knowledge, the proposed approach significantly surpasses previous state-of-the-art methods on the image quality metric and achieves superior image-text alignment performance.
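To make the intermediate representation between the two stages concrete, the sketch below shows one possible way to hold a layout that contains both object and background regions and to paint it onto a coarse label grid. It is only an illustration under stated assumptions: the names LayoutBox and rasterize, the normalized box coordinates, and the grid encoding are all hypothetical and are not taken from the paper, which does not describe its layout format in the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayoutBox:
    """Hypothetical layout element; not the paper's actual data format."""
    label: str           # class name, e.g. "zebra" or "lake"
    is_background: bool  # True for background regions, False for objects
    x0: float            # box corners in normalized [0, 1] coordinates
    y0: float
    x1: float
    y1: float


def rasterize(boxes: List[LayoutBox], size: int = 8) -> List[List[str]]:
    """Paint boxes onto a size x size grid of labels.

    Background boxes are painted first so that object boxes overwrite them
    where they overlap. Empty cells keep the empty string.
    """
    grid = [["" for _ in range(size)] for _ in range(size)]
    # False sorts before True, so background boxes come first.
    ordered = sorted(boxes, key=lambda b: not b.is_background)
    for b in ordered:
        col0, col1 = max(0, int(b.x0 * size)), min(size, int(b.x1 * size))
        row0, row1 = max(0, int(b.y0 * size)), min(size, int(b.y1 * size))
        for r in range(row0, row1):
            for c in range(col0, col1):
                grid[r][c] = b.label
    return grid


# Assumed example: a lake filling the lower half, a zebra in the center.
example_layout = [
    LayoutBox("zebra", False, 0.25, 0.25, 0.75, 0.75),
    LayoutBox("lake", True, 0.0, 0.5, 1.0, 1.0),
]
example_grid = rasterize(example_layout)
```

In a pipeline of the kind the abstract describes, a grid like example_grid would stand in for the layout that the first stage predicts from text and that the second stage conditions on; the painting order here simply encodes the assumption that objects sit in front of background regions.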