MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility

Wayne Wu, Honglin He, Jack He, Yiran Wang, Chenda Duan, Zhizheng Liu, Quanyi Li, Bolei Zhou
2024-10-11
Abstract: Public urban spaces like streetscapes and plazas serve residents and accommodate social life in all its vibrant variations. Recent advances in Robotics and Embodied AI mean that public urban spaces are no longer exclusive to humans. Food delivery bots and electric wheelchairs have started sharing sidewalks with pedestrians, while robot dogs and humanoids have recently emerged in the street. AI-enabled micromobility for short-distance travel in public urban spaces is a crucial component of the future transportation system. Ensuring the generalizability and safety of AI models maneuvering mobile machines is essential. In this work, we present MetaUrban, a compositional simulation platform for AI-driven urban micromobility research. MetaUrban can construct an infinite number of interactive urban scenes from compositional elements, covering a vast array of ground plans, object placements, pedestrians, vulnerable road users, and other mobile agents' appearances and dynamics. We design point navigation and social navigation tasks as a pilot study using MetaUrban for urban micromobility research and establish various Reinforcement Learning and Imitation Learning baselines. We conduct extensive evaluations across mobile machines, demonstrating that heterogeneous mechanical structures significantly influence the learning and execution of AI policies. We perform a thorough ablation study, showing that the compositional nature of the simulated environments can substantially improve the generalizability and safety of the trained mobile agents. MetaUrban, including its code and dataset, will be made publicly available to provide research opportunities and foster safe and trustworthy embodied AI and micromobility in cities.
Computer Vision and Pattern Recognition, Artificial Intelligence, Robotics
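The abstract says MetaUrban builds an unbounded number of scenes "from compositional elements." A toy sampler can illustrate that idea. This is a minimal sketch under stated assumptions, not MetaUrban's API. The element pools, the `SceneConfig` type, and the `sample_scene` and `count_compositions` functions are hypothetical names chosen for illustration. The sketch shows two properties of compositional generation: independent element choices multiply into many distinct scenes, and a fixed seed reproduces the same scene.

```python
import random
from dataclasses import dataclass
from math import comb

# Hypothetical element pools for illustration only; MetaUrban's actual
# ground plans, asset sets, and agent models are far richer.
GROUND_PLANS = ["straight_block", "intersection", "roundabout", "plaza"]
OBJECT_DENSITIES = [0.1, 0.3, 0.5]       # relative clutter levels (hypothetical)
PEDESTRIAN_COUNTS = [0, 10, 25, 50]
AGENT_TYPES = ["delivery_robot", "wheelchair", "robot_dog", "humanoid"]
MAX_OTHER_AGENTS = 2


@dataclass(frozen=True)
class SceneConfig:
    seed: int
    ground_plan: str
    object_density: float
    num_pedestrians: int
    other_agents: tuple


def sample_scene(seed: int) -> SceneConfig:
    """Compose one scene by drawing each element independently from a seeded RNG."""
    rng = random.Random(seed)
    k = rng.randint(0, MAX_OTHER_AGENTS)  # how many other mobile agents to add
    return SceneConfig(
        seed=seed,
        ground_plan=rng.choice(GROUND_PLANS),
        object_density=rng.choice(OBJECT_DENSITIES),
        num_pedestrians=rng.choice(PEDESTRIAN_COUNTS),
        # Sorting makes the agent set order-independent, so equal sets compare equal.
        other_agents=tuple(sorted(rng.sample(AGENT_TYPES, k))),
    )


def count_compositions() -> int:
    """Number of distinct scenes the discrete pools above can produce."""
    agent_subsets = sum(comb(len(AGENT_TYPES), k) for k in range(MAX_OTHER_AGENTS + 1))
    return len(GROUND_PLANS) * len(OBJECT_DENSITIES) * len(PEDESTRIAN_COUNTS) * agent_subsets


if __name__ == "__main__":
    for seed in range(3):
        print(sample_scene(seed))
    print("distinct compositions:", count_compositions())
```

These four small pools already yield 4 × 3 × 4 × 11 = 528 distinct combinations. In a real generator, continuous parameters such as exact object positions and pedestrian trajectories are what make the scene space effectively unbounded.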
What problem does this paper attempt to address?
This paper asks how to build a general and safe simulation platform that supports AI-driven urban micromobility research. Specifically, MetaUrban aims to ensure the generalization ability and safety of AI models that operate various mobile machines. To do so, it constructs an infinite number of interactive urban scenes covering diverse ground layouts, obstacle placements, pedestrian dynamics, and behaviors of other mobile agents.

### Problem Background

With recent advances in robotics and AI, public urban spaces (such as streets and plazas) are no longer used only by humans. More and more autonomous machines, such as food-delivery robots, electric wheelchairs, robot dogs, and humanoid robots, have begun to appear on city streets. Micromobility means using small, lightweight vehicles (such as electric scooters and electric bicycles) for short-distance travel. It has become increasingly important for improving urban traffic efficiency, reducing environmental impact, and providing flexible transportation options.

However, existing simulation platforms mainly target indoor household environments or outdoor driving environments. Complex urban micromobility tasks, especially those involving diverse layouts, terrains, obstacles, and pedestrian dynamics, have received relatively little attention. This makes it hard to deploy AI-driven micromobility devices in the real world, particularly with respect to generalization and safety.

### MetaUrban's Solution

To address these challenges, the paper proposes MetaUrban, a compositional simulation platform designed specifically for AI-driven urban micromobility research. The main features of MetaUrban include:

1. **Hierarchical Layout Generation**:
   - It can generate an infinitely diverse range of urban scene layouts, including the division into different functional areas, object positions, and terrains.
   - It supports large-scale scene generation and can simulate complex ground conditions and diverse object distributions.
2. **Scalable Obstacle Retrieval**:
   - It uses global urban scene data to extract real-world object distributions. It then obtains high-quality static objects from 3D asset libraries through an open-vocabulary search based on vision-language models (VLMs).
   - This helps trained AI agents generalize better to the diverse objects they encounter in the real world.
3. **Cohabitant Populating**:
   - It creates a lively urban environment by introducing 1,100 3D human models with 2,314 kinds of movements. It also adds other types of mobile machines, such as delivery robots, electric wheelchairs, mobility scooters, robot dogs, and humanoid robots.
   - It simulates the complex dynamics of pedestrians and vulnerable road users (VRUs), so that mobile agents can be trained and evaluated for safety and social compatibility.

### Experiments and Evaluation

Based on MetaUrban, the researchers constructed a large-scale dataset named MetaUrban-12K, which contains 12,800 training scenes and 1,000 test scenes. Each scene covers an average area of 20,000 square meters and contains rich static objects and dynamic agents. The experiments focus on two core tasks:

1. **Point Navigation**:
   - The goal is for the mobile agent to reach a specified target position from its starting point.
   - The evaluation metrics are the Success Rate (SR) and Success weighted by Path Length (SPL).
2. **Social Navigation**:
   - The goal is for the mobile agent to interact with pedestrians and other agents while avoiding collisions.
   - In addition to the Success Rate (SR), the Social Navigation Score (SNS) is used to evaluate the agent's social compatibility.

The researchers also evaluated how different mechanical structures (such as engine force, wheel friction, and wheelbase) influence the learning and execution of policies, showing...
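A short sketch can make the Point Navigation metrics, SR and SPL, concrete. It assumes MetaUrban follows the standard definitions from the embodied-navigation literature. SR is the fraction of successful episodes, and SPL = (1/N) Σ S_i · ℓ_i / max(p_i, ℓ_i). Here S_i is 1 if episode i succeeds and 0 otherwise, ℓ_i is the shortest-path distance from start to goal, and p_i is the length of the path the agent actually took. The `Episode` type and the episode values are hypothetical, not data from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    success: bool         # did the agent reach the goal? (S_i)
    shortest_path: float  # shortest-path distance from start to goal, in meters (l_i)
    path_taken: float     # length of the trajectory actually traveled, in meters (p_i)


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    return sum(ep.success for ep in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length: mean over episodes of S_i * l_i / max(p_i, l_i)."""
    total = 0.0
    for ep in episodes:
        if ep.success:
            # max() caps each term at 1 even if the measured path is slightly shorter than l_i.
            total += ep.shortest_path / max(ep.path_taken, ep.shortest_path)
    return total / len(episodes)


# Hypothetical episodes, for illustration only.
episodes = [
    Episode(success=True, shortest_path=40.0, path_taken=50.0),   # small detour: 0.8
    Episode(success=True, shortest_path=30.0, path_taken=30.0),   # optimal path: 1.0
    Episode(success=False, shortest_path=25.0, path_taken=60.0),  # failure: 0.0
    Episode(success=True, shortest_path=20.0, path_taken=80.0),   # long detour: 0.25
]

if __name__ == "__main__":
    print(f"SR  = {success_rate(episodes):.3f}")
    print(f"SPL = {spl(episodes):.3f}")
```

A successful episode with a long detour counts fully toward SR but contributes little to SPL. Reporting both therefore separates "did the agent arrive" from "did it arrive efficiently."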