Understanding World or Predicting Future? A Comprehensive Survey of World Models

Jingtao Ding, Yunke Zhang, Yu Shang, Yuheng Zhang, Zefang Zong, Jie Feng, Yuan Yuan, Hongyuan Su, Nian Li, Nicholas Sukiennik, Fengli Xu, Yong Li
2024-11-21
Abstract: The concept of world models has garnered significant attention due to advancements in multimodal large language models such as GPT-4 and video generation models such as Sora, which are central to the pursuit of artificial general intelligence. This survey offers a comprehensive review of the literature on world models. Generally, world models are regarded as tools for either understanding the present state of the world or predicting its future dynamics. This review presents a systematic categorization of world models, emphasizing two primary functions: (1) constructing internal representations to understand the mechanisms of the world, and (2) predicting future states to simulate and guide decision-making. Initially, we examine the current progress in these two categories. We then explore the application of world models in key domains, including autonomous driving, robotics, and social simulacra, with a focus on how each domain utilizes these aspects. Finally, we outline key challenges and provide insights into potential future research directions.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The core problem this paper addresses is how to define and understand the concept of "world models", together with a comprehensive review of their applications across different fields. Specifically, the paper focuses on two main aspects:

1. **Mechanisms for constructing internal representations to understand the world**:
   - The paper examines how a model can capture the operating mechanisms of the external world by building internal representations. This involves transforming external reality into latent variables inside the model, which yields an implicit understanding of how the environment changes. For example, in decision-making tasks, a world model lets an agent try out hypothetical actions without affecting the real environment, reducing the cost of trial and error.
2. **Predicting future states to simulate and guide decision-making**:
   - The paper also studies how world models can predict future states in order to better simulate and guide decision-making. This includes generative models such as video-generation models (e.g., Sora), which can produce realistic videos that simulate future world dynamics. The paper further discusses the shift from visual to spatial representations and from generated videos to embodied environments, with the aim of more realistic simulation of the physical world.

### Specific Problems and Solutions

- **Defining World Models**: The authors propose a new taxonomy that divides world models into two categories: those that construct internal representations and those that predict future states. This classification helps to systematically understand the different functions and application scenarios of world models.
- **Technical Progress**: The paper reviews in detail the current progress in both constructing internal representations and predicting future states, including how planning methods such as model predictive control (MPC) and Monte Carlo tree search (MCTS) use world models to optimize decision-making.
- **Application Areas**: The paper explores the applications of world models in key domains such as autonomous driving, robotics, and social simulacra (virtual social systems). Each domain has different requirements. For example, autonomous driving needs to perceive road conditions in real time and predict how they will evolve, while robotics requires a precise understanding of external dynamics and the generation of interactive environments.
- **Future Research Directions**: The paper also outlines key challenges and trends for future research, emphasizing the potential of world models to adapt to a wider range of practical applications, for example by improving their performance in complex environments and by fusing multimodal data into world models.

### Conclusion

Through systematic classification and review, the paper offers a comprehensive perspective on the definition, functions, and applications of world models across different fields, and it provides references and directions for future research.
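The first function described above, building an internal representation so that an agent can try hypothetical actions without touching the real environment, can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the paper: the `LatentWorldModel` class, its random linear encoder and transition matrices, and the "reward = mean of the latent state" head are assumptions chosen only to keep the example short. A real world model would learn these components from data.

```python
import numpy as np


class LatentWorldModel:
    """Hypothetical toy world model with linear components.

    encode:  observation -> latent state            (internal representation)
    step:    (latent, action) -> next latent, reward (prediction)
    imagine: roll out a plan entirely inside the model
    """

    def __init__(self, obs_dim: int, latent_dim: int, action_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Random linear maps stand in for networks that would normally be learned.
        self.W_enc = rng.normal(scale=0.5, size=(latent_dim, obs_dim))
        self.A = np.eye(latent_dim)  # latent transition (identity: state persists)
        self.B = rng.normal(scale=0.5, size=(latent_dim, action_dim))  # action effect
        self.w_reward = np.full(latent_dim, 1.0 / latent_dim)  # reward = mean of latent

    def encode(self, obs: np.ndarray) -> np.ndarray:
        return self.W_enc @ obs

    def step(self, z: np.ndarray, action: np.ndarray) -> tuple[np.ndarray, float]:
        z_next = self.A @ z + self.B @ action
        return z_next, float(self.w_reward @ z_next)

    def imagine(self, obs: np.ndarray, actions: list[np.ndarray]) -> tuple[np.ndarray, float]:
        """Simulate a hypothetical action sequence; the real environment is never queried."""
        z = self.encode(obs)
        latents, total_reward = [z], 0.0
        for a in actions:
            z, r = self.step(z, a)
            latents.append(z)
            total_reward += r
        return np.stack(latents), total_reward


if __name__ == "__main__":
    model = LatentWorldModel(obs_dim=4, latent_dim=3, action_dim=2)
    obs = np.array([0.2, -0.1, 0.5, 1.0])
    # Compare two hypothetical plans purely "in imagination".
    _, ret_a = model.imagine(obs, [np.array([1.0, 0.0])] * 5)
    _, ret_b = model.imagine(obs, [np.array([0.0, 1.0])] * 5)
    print("preferred plan:", "a" if ret_a >= ret_b else "b")
```

Because `imagine` only calls the model, comparing the two plans costs no real-world interaction; the quality of that comparison depends entirely on how well the learned encoder and transition reflect the true environment.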
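The Technical Progress item mentions model predictive control (MPC) as one way a world model guides decisions. A common, simple variant is random-shooting MPC: sample many candidate action sequences, score each one by rolling it out inside the model, execute only the first action of the best sequence, then replan at the next step. The sketch below is an illustration under stated assumptions, not the paper's method: `toy_dynamics` is a hypothetical, hand-written model of a 1-D point mass with friction, the cost weights are arbitrary, and the "real" system is taken to be identical to the model. In the methods the paper reviews, the model would typically be learned, and planners are often more refined (for example, the cross-entropy method).

```python
import numpy as np


def toy_dynamics(pos, vel, action, dt=0.1, damping=0.5):
    """Hypothetical world model of a 1-D point mass with friction.

    Works on scalars or NumPy arrays, so many candidate plans can be
    simulated in parallel.
    """
    vel = vel + (action - damping * vel) * dt
    pos = pos + vel * dt
    return pos, vel


def rollout_costs(model, pos, vel, action_seqs, goal):
    """Imagined rollouts: score each action sequence using only the model."""
    n_samples, horizon = action_seqs.shape
    p = np.full(n_samples, pos, dtype=float)
    v = np.full(n_samples, vel, dtype=float)
    costs = np.zeros(n_samples)
    for t in range(horizon):
        a = action_seqs[:, t]
        p, v = model(p, v, a)
        costs += (p - goal) ** 2 + 0.01 * a ** 2  # tracking error + small effort penalty
    return costs


def mpc_action(model, pos, vel, goal, horizon=15, n_samples=256,
               max_force=1.0, rng=None):
    """Random-shooting MPC: sample plans, keep the cheapest, return its first action."""
    if rng is None:
        rng = np.random.default_rng(0)
    plans = rng.uniform(-max_force, max_force, size=(n_samples, horizon))
    costs = rollout_costs(model, pos, vel, plans, goal)
    return float(plans[np.argmin(costs), 0])


def run_closed_loop(steps=100, goal=1.0, seed=42):
    """Replan at every step; here the 'real' system is assumed equal to the model."""
    rng = np.random.default_rng(seed)
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        a = mpc_action(toy_dynamics, pos, vel, goal, rng=rng)
        pos, vel = toy_dynamics(pos, vel, a)
    return pos, vel


if __name__ == "__main__":
    final_pos, final_vel = run_closed_loop()
    print(f"final position {final_pos:.3f}, velocity {final_vel:.3f}")
```

Executing only the first action and replanning from the newly observed state lets the controller correct for prediction errors, which is one reason MPC pairs naturally with imperfect learned world models.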