Text to Photo-Realistic Image Synthesis Via Chained Deep Recurrent Generative Adversarial Network.

Min Wang, Congyan Lang, Songhe Feng, Tao Wang, Yi Jin, Yidong Li
DOI: https://doi.org/10.1016/j.jvcir.2020.102955
IF: 2.887
2021-01-01
Journal of Visual Communication and Image Representation
Abstract: Despite the promising progress made in recent years, automatically generating high-resolution, realistic images from text descriptions remains a challenging task due to the semantic gap between human-written descriptions and the diversity of visual appearances. Most existing approaches generate rough images from the given text descriptions, while the relationship between sentence semantics and visual content is not holistically exploited. In this paper, we propose a novel chained deep recurrent generative adversarial network (CDRGAN) for synthesizing images from text descriptions. Our model uses carefully designed chained deep recurrent generators that simultaneously recover global image structures and local details. Specifically, our method not only considers the logical relationships among image pixels, but also removes computational bottlenecks through parameter sharing. We evaluate our method on three public benchmarks: the CUB, Oxford-102, and MS COCO datasets. Experimental results show that our method consistently and significantly outperforms state-of-the-art approaches across different evaluation metrics.