Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

Zheng Zhu, Xiaofeng Wang, Wangbo Zhao, Chen Min, Nianchen Deng, Min Dou, Yuqi Wang, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang
2024-05-06
Abstract: General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems. Recently, the emergence of the Sora model has attracted significant attention due to its remarkable simulation capabilities, which exhibit an incipient comprehension of physical laws. In this survey, we embark on a comprehensive exploration of the latest advancements in world models. Our analysis navigates through the forefront of generative methodologies in video generation, where world models stand as pivotal constructs facilitating the synthesis of highly realistic visual content. Additionally, we scrutinize the burgeoning field of autonomous-driving world models, meticulously delineating their indispensable role in reshaping transportation and urban mobility. Furthermore, we delve into the intricacies inherent in world models deployed within autonomous agents, shedding light on their profound significance in enabling intelligent interactions within dynamic environmental contexts. Finally, we examine the challenges and limitations of world models and discuss their potential future directions. We hope this survey can serve as a foundational reference for the research community and inspire continued innovation. This survey will be regularly updated at:
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main objective of this paper is to survey the latest advancements and potential applications of general world models in artificial intelligence. Specifically:

1. **Understanding and Simulating Physical Laws**: Examining how generative techniques, particularly the Sora model, show an early ability to understand and simulate physical laws, which the paper treats as a significant advance for general world models.
2. **Video Generation**: Investigating how video generation world models are applied to generating and editing videos. By understanding and simulating complex scenes, these models can aid media production and artistic expression.
3. **Autonomous Driving**: Exploring how autonomous driving world models use video generation technology to create driving scenarios and learn driving strategies from driving videos, with the aim of improving the safety and efficiency of autonomous driving.
4. **Autonomous Agents**: Analyzing the role of world models in game agents and robotic systems, especially for intelligent interaction within dynamic environments, where they can improve agents' learning efficiency and generalization.
5. **Challenges and Future Directions**: Identifying the current challenges and limitations of world models and proposing future research directions aimed at advancing the field toward Artificial General Intelligence (AGI).

In summary, this paper aims to provide a comprehensive overview of the current state of research on world models and their potential across application scenarios, offering a reference for researchers in academia and industry and inspiring further innovation.