A Survey of AI Text-to-Image and AI Text-to-Video Generators

Aditi Singh

DOI: https://doi.org/10.1109/AIRC57904.2023.10303174

2023-11-11

Abstract: Text-to-Image and Text-to-Video AI generation models are revolutionary technologies that use deep learning and natural language processing (NLP) techniques to create images and videos from textual descriptions. This paper investigates cutting-edge approaches in the discipline of Text-to-Image and Text-to-Video AI generations. The survey provides an overview of the existing literature as well as an analysis of the approaches used in various studies. It covers data preprocessing techniques, neural network types, and evaluation metrics used in the field. In addition, the paper discusses the challenges and limitations of Text-to-Image and Text-to-Video AI generations, as well as future research directions. Overall, these models have promising potential for a wide range of applications such as video production, content creation, and digital marketing.

Computer Vision and Pattern Recognition, Artificial Intelligence, Computation and Language, Machine Learning, Image and Video Processing

What problem does this paper attempt to address?

This paper addresses the key problems and technical challenges in text-to-image (T2I) and text-to-video (T2V) generation. Specifically, it focuses on the following aspects:

1. **Technical Review**: The paper reviews current T2I and T2V generation models, including data preprocessing techniques, types of neural networks, and evaluation metrics. These techniques are the basis for high-quality image and video generation.
2. **Model Performance**: The paper analyzes the performance of different models, such as the T2I generators CogView2, DALL-E 2, and Imagen, and the T2V generators Make-A-Video, Imagen Video, Phenaki, GODIVA, and CogVideo. These models differ in image quality and video coherence.
3. **Challenges and Limitations**:
   - **Data Set**: Obtaining and annotating high-quality training data is a major challenge.
   - **Interpretability**: The generated content is hard to interpret, and it is difficult to understand the logic behind the generated visual content.
   - **Computational Resources**: Generating high-resolution images and videos requires a large amount of computational resources, which limits practical applications.
   - **Social Norms**: The generated content may not conform to social or public norms, leading to misunderstandings or inappropriate representations.
4. **Future Research Directions**: The paper discusses future research directions, including improving generation efficiency, enhancing the generalization ability of models, and reducing computational costs, to make these techniques more practical and widely applicable.

Overall, the paper gives researchers in T2I and T2V generation a clear picture of the current state of the field and its development direction through a comprehensive technical review and in-depth analysis.
