InDecGAN: Learning to Generate Complex Images from Captions Via Independent Object-Level Decomposition and Enhancement

Jun Cheng, Fuxiang Wu, Liu, Qieshi Zhang, Leszek Rutkowski, Dacheng Tao
DOI: https://doi.org/10.1109/tmm.2023.3256798
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Text-to-image synthesis is challenging because a complex scene contains diverse objects of various sizes, and sub-images of objects belonging to the same class take diverse forms when viewed from different perspectives. Synthesis models therefore have difficulty capturing the varied objects in a complex scene. To alleviate these problems, we devise an independent object-level decomposing and enhancing generative adversarial network, denoted InDecGAN, to synthesize complex images and capture the varied objects in a complex scene. Specifically, InDecGAN fully utilizes the independent object-level information available in training, namely bounding boxes and high-resolution images of objects, by employing independent object-level pathways to synthesize varied objects. The independent object-level pathway integrates an independent object-level adversarial loss and the bounding box information to learn the visual features of objects independently; the main pathway then exploits the features provided by the object-level pathway to compose the full scene and synthesize images. In addition, we analyze the generalization properties of the proposed InDecGAN and demonstrate the improvement from the perspective of the model architecture. Moreover, extensive experiments on a widely used dataset demonstrate that the proposed model with an independent object-level pathway produces synthesized images of significantly improved quality.
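The abstract describes a main pathway that composes the full scene from features supplied by object-level pathways together with bounding boxes. The following is a minimal sketch of that composition idea only, not the authors' implementation: the names `resize_nearest` and `compose_scene`, the normalized `(x, y, w, h)` box format, the nearest-neighbour resize, and the averaging of overlaps are all assumptions made for illustration, and plain nested lists stand in for feature maps.

```python
from typing import List, Tuple

Patch = List[List[float]]
Box = Tuple[float, float, float, float]  # assumed format: (x, y, w, h) in [0, 1]


def resize_nearest(patch: Patch, out_h: int, out_w: int) -> Patch:
    """Nearest-neighbour resize of a 2D patch to (out_h, out_w)."""
    in_h, in_w = len(patch), len(patch[0])
    return [
        [patch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def compose_scene(height: int, width: int,
                  objects: List[Tuple[Patch, Box]]) -> Patch:
    """Place each object patch at its bounding box on an empty canvas.

    Hypothetical stand-in for scene composition: overlapping regions are
    averaged and uncovered cells stay 0.0.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for patch, (x, y, w, h) in objects:
        top, left = int(y * height), int(x * width)
        box_h = max(1, int(h * height))
        box_w = max(1, int(w * width))
        resized = resize_nearest(patch, box_h, box_w)
        for r in range(box_h):
            for c in range(box_w):
                rr, cc = top + r, left + c
                # Ignore any part of the box that falls outside the canvas.
                if 0 <= rr < height and 0 <= cc < width:
                    total[rr][cc] += resized[r][c]
                    count[rr][cc] += 1
    return [
        [total[r][c] / count[r][c] if count[r][c] else 0.0
         for c in range(width)]
        for r in range(height)
    ]


# Two toy "object features" placed on a 4x4 canvas with overlapping boxes.
scene = compose_scene(4, 4, [
    ([[1.0]], (0.0, 0.0, 0.5, 0.5)),
    ([[3.0]], (0.25, 0.25, 0.5, 0.5)),
])
```

In a learned model the patches would be feature tensors produced per object and the composition would feed further generator layers; the sketch only shows how bounding boxes can decide where each object's features land in the scene.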