Neural Architecture Search with a Lightweight Transformer for Text-to-Image Synthesis

Wei Li,Shiping Wen,Kaibo Shi,Yin Yang,Tingwen Huang
DOI: https://doi.org/10.1109/tnse.2022.3147787
IF: 6.6
2022-01-01
IEEE Transactions on Network Science and Engineering
Abstract:Despite the cross-modal text-to-imagesynthesis task has achieved great success, most of the latest works in this field are based on the network architectures proposed by predecessors, such as StackGAN, AttnGAN, etc. Since the quality for text-to-image synthesis is more and more demanding, these old and tandem architectures with simple convolution operations are no longer suitable. Therefore, a novel text-to-image synthesis network combining with the latest technologies is in urgent need of exploration. To tackle with this challenge, we creatively propose a unique architecture for text-to-image synthesis, dubbed T2IGAN, which is automatically searched by neural architecture search (NAS). In addition, considering the amazing capabilities of the popular transformer in natural language processing and computer vision, a lightweight transformer is applied in our search space to efficiently integrate the text features and image features. Ultimately, the effectiveness of our searched T2IGAN is remarkable by experimentally evaluating it on the typical text-to-image synthesis datasets. Specifically, we achieve an excellent result of IS 5.12 and FID 10.48 on CUB-200 Birds, IS 4.89 and FID 13.55 on Oxford-102 Flowers, IS 31.93 and FID 26.45 on COCO. By contrast with the state-of-the-art works, ours gets better performance on CUB-200 Birds and Oxford-102 Flowers.
engineering, multidisciplinary,mathematics, interdisciplinary applications
What problem does this paper attempt to address?