UrbanWorld: An Urban World Model for 3D City Generation

Yu Shang, Yuming Lin, Yu Zheng, Hangyu Fan, Jingtao Ding, Jie Feng, Jiansheng Chen, Li Tian, Yong Li
2024-10-22
Abstract: Cities, as the essential environment of human life, encompass diverse physical elements such as buildings, roads and vegetation, which continuously interact with dynamic entities like people and vehicles. Crafting realistic, interactive 3D urban environments is essential for nurturing AGI systems and constructing AI agents capable of perceiving, decision-making, and acting like humans in real-world environments. However, creating high-fidelity 3D urban environments usually entails extensive manual labor from designers, involving intricate detailing and representation of complex urban elements. Therefore, accomplishing this automatically remains a longstanding challenge. Toward this problem, we propose UrbanWorld, the first generative urban world model that can automatically create a customized, realistic and interactive 3D urban world with flexible control conditions. UrbanWorld incorporates four key stages in the generation pipeline: flexible 3D layout generation from OSM data or urban layout with semantic and height maps, urban scene design with Urban MLLM, controllable urban asset rendering via progressive 3D diffusion, and MLLM-assisted scene refinement. We conduct extensive quantitative analysis on five visual metrics, demonstrating that UrbanWorld achieves SOTA generation realism. Next, we provide qualitative results about the controllable generation capabilities of UrbanWorld using both textual and image-based prompts. Lastly, we verify the interactive nature of these environments by showcasing the agent perception and navigation within the created environments. We contribute UrbanWorld as an open-source tool available at https://github.com/Urban-World/UrbanWorld.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the automatic creation of realistic, customizable, and interactive 3D urban environments. Manual design requires a large amount of human labor, and existing automated generation methods suffer from three main shortcomings:

1. **Output limited to video**: some methods generate scenes only as videos, so they cannot provide embodied, interactive environments.
2. **Limited flexibility and controllability**: they cannot generate customized content according to user requirements.
3. **Low geometric fidelity and texture quality**: they struggle to produce high-fidelity visual results.

To address these problems, the authors propose UrbanWorld, a model that automatically generates realistic, customizable, and embodied 3D urban environments. Its main contributions are:

- **The first generative urban world model**, which automatically creates customized, realistic, and interactive 3D urban environments.
- **Progressive diffusion rendering**, which produces 3D urban assets with high-quality textures.
- **A specialized urban multimodal large language model (Urban MLLM)**, which supervises and guides the generation process so that the results follow user instructions.
- **An open-source platform** that supports the creation and operation of more advanced 3D urban environments and serves the broader AI community.

The UrbanWorld generation process has four stages:

1. **Flexible 3D layout generation**: automatically build untextured 3D layouts from OSM data, or from urban layouts given as semantic and height maps.
2. **Urban scene design with Urban MLLM**: turn user instructions into detailed text descriptions that plan the urban scene.
3. **Controllable 3D asset rendering**: render urban assets from text and image conditions using progressive 3D diffusion.
4. **MLLM-assisted scene refinement**: use Urban MLLM to review the generated urban environment and refine it to further improve visual quality.

Together, these stages let UrbanWorld generate realistic, customizable, and interactive 3D urban environments. The authors report state-of-the-art realism on five visual metrics and demonstrate agent perception and navigation inside the generated scenes, which makes UrbanWorld a practical tool for both research and applications.
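To make the data flow between the four stages concrete, here is a minimal, hypothetical sketch. It is not UrbanWorld's code. The names `extrude_layout`, `generate_scene`, and `Block`, the semantic class IDs, the per-cell extrusion (a simplification of real footprint extraction), and the threshold-and-retry review loop are all assumptions made for illustration. The Urban MLLM and the progressive 3D diffusion renderer appear only as caller-supplied callables (`design`, `render`, `review`), and the OSM input path is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical semantic class IDs; UrbanWorld's actual label set may differ.
GROUND, ROAD, VEGETATION, BUILDING = 0, 1, 2, 3


@dataclass
class Block:
    """One untextured layout element: a footprint cell extruded to a height."""
    row: int
    col: int
    label: int
    height: float


def extrude_layout(semantic: List[List[int]],
                   height: List[List[float]]) -> List[Block]:
    """Stage 1 sketch: semantic map + height map -> untextured 3D blocks.

    Building cells are extruded to their height-map value; every other class
    stays flat (height 0) so roads and vegetation remain in the layout.
    """
    blocks = []
    for r, (sem_row, h_row) in enumerate(zip(semantic, height)):
        for c, (label, h) in enumerate(zip(sem_row, h_row)):
            blocks.append(Block(r, c, label, h if label == BUILDING else 0.0))
    return blocks


def generate_scene(semantic: List[List[int]],
                   height: List[List[float]],
                   instruction: str,
                   design: Callable[[Block, str], str],
                   render: Callable[[Block, str], object],
                   review: Callable[[object, str], float],
                   threshold: float = 0.8,
                   max_retries: int = 3) -> Dict[Tuple[int, int], object]:
    """Chain the four stages; design/render/review are stand-ins supplied by the caller."""
    scene = {}
    for block in extrude_layout(semantic, height):   # stage 1: 3D layout
        prompt = design(block, instruction)           # stage 2: scene design (MLLM stand-in)
        asset = render(block, prompt)                 # stage 3: asset rendering (diffusion stand-in)
        retries = 0
        # Stage 4: re-render while the reviewer's score stays below the threshold.
        while review(asset, prompt) < threshold and retries < max_retries:
            asset = render(block, prompt)
            retries += 1
        scene[(block.row, block.col)] = asset
    return scene


if __name__ == "__main__":
    sem = [[BUILDING, ROAD],
           [VEGETATION, BUILDING]]
    hgt = [[30.0, 0.0],
           [5.0, 12.0]]
    renders = []

    def toy_design(block: Block, instruction: str) -> str:
        return f"{instruction}; semantic class {block.label}"

    def toy_render(block: Block, prompt: str) -> dict:
        renders.append(prompt)  # record every render call
        return {"prompt": prompt, "height": block.height, "attempt": len(renders)}

    def toy_review(asset: dict, prompt: str) -> float:
        # Toy reviewer: rejects odd-numbered attempts, accepts even-numbered ones.
        return 0.9 if asset["attempt"] % 2 == 0 else 0.5

    scene = generate_scene(sem, hgt, "modern downtown", toy_design, toy_render, toy_review)
    for cell, asset in sorted(scene.items()):
        print(cell, asset["height"], asset["attempt"])
    print("total renders:", len(renders))
```

Passing the model-backed stages in as plain functions keeps the orchestration testable without any model weights; in a real system each callable would wrap the corresponding MLLM or diffusion component, and the review step would compare the rendered asset against its text description rather than use a fixed toy score.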