AI-based text-to-image synthesis: A review

Zili Wang
DOI: https://doi.org/10.54254/2755-2721/45/20241038
2024-03-15
Abstract: Traditional methods of art generation, such as texture synthesis and texture mapping, have been instrumental in crafting digital art for decades. They are used as artistic tools to design and map textures onto 3D models, thereby generating 2D images or animations. However, they can only generate simple, repetitive images. Thanks to the rapid development of deep learning and artificial intelligence, today's text-to-image synthesis (T2IS) models can generate high-quality, realistic images that match the textual descriptions given by users. This review paper presents a comprehensive exploration of groundbreaking AI-based T2IS models to date. We start with an in-depth analysis of the fundamental concepts that underpin T2IS models, followed by an introduction to the primary, or vanilla, models that have served as the foundation for the field's development. We then examine several groundbreaking AI-based T2IS applications, from GAN-based to diffusion-based models, demonstrating their ability to produce high-quality, contextually accurate images from textual descriptions, along with their strengths and weaknesses. Finally, we discuss the current challenges and potential future directions of T2IS.