Abstract: Automatic scene generation is an essential area of research with applications in robotics, recreation, visual representation, training and simulation, education, and more. This survey provides a comprehensive review of the current state of the art in automatic scene generation, focusing on techniques that leverage machine learning, deep learning, embedded systems, and natural language processing (NLP). We categorize the models into four main types: Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Transformers, and Diffusion Models. Each category is explored in detail, discussing various sub-models and their contributions to the field. We also review the most commonly used datasets, such as COCO-Stuff, Visual Genome, and MS-COCO, which are critical for training and evaluating these models. Methodologies for scene generation are examined, including image-to-3D conversion, text-to-3D generation, UI/layout design, graph-based methods, and interactive scene generation. Evaluation metrics such as Fréchet Inception Distance (FID), Kullback-Leibler (KL) Divergence, Inception Score (IS), Intersection over Union (IoU), and Mean Average Precision (mAP) are discussed in the context of their use in assessing model performance. The survey identifies key challenges and limitations in the field, such as maintaining realism, handling complex scenes with multiple objects, and ensuring consistency in object relationships and spatial arrangements. By summarizing recent advances and pinpointing areas for improvement, this survey aims to provide a valuable resource for researchers and practitioners working on automatic scene generation.

What problem does this paper attempt to address?

This paper addresses the key technologies and challenges in the field of automatic scene generation. Specifically, the goals of the paper include:

1. **Comprehensively review existing technologies**: The paper reviews and summarizes current state-of-the-art automatic scene generation technologies, especially those using machine learning, deep learning, embedded systems, and natural language processing (NLP).
2. **Classify models and methods**: The paper classifies the models into four main types:
   - **Variational Autoencoders (VAEs)**
   - **Generative Adversarial Networks (GANs)**
   - **Transformers**
   - **Diffusion Models**
3. **Evaluate commonly used datasets**: The paper discusses in detail the datasets commonly used for training and evaluation, such as COCO-Stuff, Visual Genome, and MS-COCO, and explains their importance for automatic scene generation.
4. **Explore methodologies**: The paper explores a variety of scene generation methods, including image-to-3D conversion, text-to-3D generation, UI/layout design, graph-based methods, and interactive scene generation.
5. **Analyze evaluation metrics**: The paper discusses the metrics used to evaluate model performance, such as Fréchet Inception Distance (FID), Kullback-Leibler (KL) Divergence, Inception Score (IS), Intersection over Union (IoU), and Mean Average Precision (mAP).
6. **Identify challenges and limitations**: The paper points out the key challenges and limitations in this field, for example maintaining the realism of the scene, handling complex scenes containing multiple objects, and ensuring the consistency of the relationships and spatial arrangements between objects.
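To make two of the metrics named in item 5 concrete, here is a minimal Python sketch of IoU and KL divergence. It is an illustration only, not code from the paper: the function names, the `(x_min, y_min, x_max, y_max)` box convention, and the restriction to discrete distributions are all assumptions made for the example.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Assumed convention (not from the paper): each box is
    (x_min, y_min, x_max, y_max).
    """
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps
    to avoid division by zero (a choice made for this sketch).
    """
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` compares a predicted layout box with a ground-truth box that overlaps it by one unit square, and `kl_divergence(p, q)` is zero only when the two distributions coincide.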
By summarizing recent research progress and pointing out directions for improvement, the paper aims to provide a valuable resource for researchers and practitioners and to advance the field of automatic scene generation.

### Formula Examples

Some of the formulas mentioned in the paper are as follows:

- The loss function of the Variational Autoencoder (VAE):

  \[ L_{\text{VAE}}=-\mathbb{E}_{q(z|x)}[\log p(x|z)]+\text{KL}[q(z|x)\|p(z)] \]

  where \(x\) represents an image, \(z\) represents a latent variable, \(\text{KL}[q(z|x)\|p(z)]\) is the Kullback-Leibler divergence term, and \(-\mathbb{E}_{q(z|x)}[\log p(x|z)]\) is the reconstruction loss.

- The optimization objective of the Generative Adversarial Network (GAN):

  \[ \min_G\max_D V(D, G)=\mathbb{E}_{x\sim p_{\text{data}}(x)}[\log D(x)]+\mathbb{E}_{z\sim p_z(z)}[\log(1 - D(G(z)))] \]

  where \(D\) represents the discriminator, \(G\) represents the generator, \(p_{\text{data}}(x)\) is the distribution of real data, and \(p_z(z)\) is the distribution of the noise vector \(z\).

By introducing these formulas and techniques, the paper gives readers a basis for an in-depth understanding of automatic scene generation.
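The two objectives above can be evaluated numerically with a few lines of Python. The sketch below is a hedged illustration rather than the paper's implementation. It assumes a diagonal Gaussian posterior \(q(z|x)\) with a standard normal prior \(p(z)\), which gives the KL term a closed form, and a Bernoulli likelihood \(p(x|z)\), which turns the reconstruction term into a binary cross-entropy estimated from a single sample. All function names are hypothetical.

```python
import math


def gaussian_kl(mu, log_var):
    """KL[N(mu, diag(exp(log_var))) || N(0, I)] in closed form.

    Assumes a diagonal Gaussian posterior and a standard normal prior.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def bernoulli_nll(x, x_hat, eps=1e-7):
    """-log p(x|z) under an assumed Bernoulli likelihood.

    x holds target pixel values in [0, 1]; x_hat holds the decoder's
    predicted probabilities for one sampled z.
    """
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # keep log() finite
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total


def vae_loss(x, x_hat, mu, log_var):
    """Single-sample estimate of L_VAE: reconstruction loss plus KL term."""
    return bernoulli_nll(x, x_hat) + gaussian_kl(mu, log_var)


def gan_value(d_real, d_fake, eps=1e-7):
    """Monte Carlo estimate of V(D, G).

    d_real: discriminator outputs D(x) on a batch of real samples.
    d_fake: discriminator outputs D(G(z)) on a batch of generated samples.
    """
    real_term = sum(math.log(max(d, eps)) for d in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - d, eps)) for d in d_fake) / len(d_fake)
    return real_term + fake_term
```

In this reading, the discriminator is trained to increase `gan_value` while the generator is trained to decrease it. A discriminator that outputs 0.5 everywhere, meaning it cannot tell real from generated samples, gives a value of \(-\log 4\).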

Automatic Scene Generation: State-of-the-Art Techniques, Models, Datasets, Challenges, and Future Prospects

A Comprehensive Survey of Scene Graphs: Generation and Application

Scene Graph Generation: A Comprehensive Survey

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures

Advances in 3D Generation: A Survey

Automatic Story Generation: Challenges and Attempts

Visual Relationship Detection using Scene Graphs: A Survey

A survey of generative models used in text-to-image

A Survey of Generative Artificial Intelligence Techniques

A Survey On Text-to-3D Contents Generation In The Wild

Generate Any Scene: Evaluating and Improving Text-to-Vision Generation with Scene Graph Programming

Deep learning-based scene understanding for autonomous robots: a survey

Unconditional Scene Graph Generation

A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights

Comprehensive Exploration of Synthetic Data Generation: A Survey

A Systematic survey on automated text generation tools and techniques: application, evaluation, and challenges

What Makes a Scene ? Scene Graph-based Evaluation and Feedback for Controllable Generation

UniScene: Unified Occupancy-centric Driving Scene Generation

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars

Recent Advances in Scene Image Representation and Classification

MegaScenes: Scene-Level View Synthesis at Scale