Text Pared into Scene Graph for Diverse Image Generation.

Yonghua Zhu, Jieyu Huang, Ning Ge, Yunwen Zhu, Binghui Zheng, Wenjun Zhang
DOI: https://doi.org/10.1145/3487075.3487158
2021-01-01
Abstract: Although recent advances in conditional generative models have markedly improved controlled image generation, generating images that contain multiple complex objects remains a challenge. To address this challenge, we propose a module that parses a text description into a scene graph, which yields a reasonable scene layout so that the generated image and its objects are realistic. Our method enhances the interaction between objects and global semantics by concatenating each object embedding with the text embedding. To preserve local image semantics, a spatially-adaptive normalization (SPADE) layer is added to the generator of our model. We validate our method on Visual Genome and COCO-Stuff, where qualitative results and an ablation study demonstrate the ability of our model to generate images with multiple objects and complex relationships.
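The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `fuse_objects_with_text` and `spade_modulate`, the array shapes, and the choice to pass the modulation maps `gamma` and `beta` in directly (in SPADE they are predicted from the layout by small convolutions) are all assumptions made for illustration.

```python
import numpy as np


def fuse_objects_with_text(object_emb, text_emb):
    """Append one global text embedding to every object embedding.

    object_emb: (num_objects, d_obj), text_emb: (d_text,)
    returns:    (num_objects, d_obj + d_text)
    Hypothetical helper; the shapes are assumptions, not taken from the paper.
    """
    object_emb = np.asarray(object_emb, dtype=np.float32)
    text_emb = np.asarray(text_emb, dtype=np.float32)
    # Repeat the text vector once per object, then join along the feature axis.
    tiled = np.broadcast_to(text_emb, (object_emb.shape[0], text_emb.shape[0]))
    return np.concatenate([object_emb, tiled], axis=1)


def spade_modulate(features, gamma, beta, eps=1e-5):
    """Simplified SPADE-style step for a single image.

    features, gamma, beta: (C, H, W).
    Each channel is normalized over its spatial positions, then scaled and
    shifted per pixel, so layout information survives the normalization.
    """
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)
    return gamma * normalized + beta


# Three objects with 4-d embeddings, one 2-d sentence embedding.
fused = fuse_objects_with_text(np.zeros((3, 4)), np.ones(2))
print(fused.shape)
```

Every object row now carries the same global text features, which is one simple way for per-object processing to see sentence-level context.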