Abstract: Embodied artificial intelligence emphasizes the role of an agent's body in generating human-like behaviors. Recent work on embodied AI has focused on building machine learning models that can perceive, plan, and act, thereby enabling real-time interaction with the world. However, most of this work targets bounded indoor environments, such as navigating a room or manipulating a device, and embodied intelligence in open, outdoor environments remains less explored. One potential reason is the lack of high-quality simulators, benchmarks, and datasets. To address this gap, we construct a benchmark platform for evaluating embodied intelligence in real-world city environments. Specifically, we first build a highly realistic 3D simulation environment based on the buildings, roads, and other elements of a real city. In this environment, we combine historically collected data with simulation algorithms to simulate pedestrian and vehicle flows with high fidelity. Further, we design a set of evaluation tasks covering different embodied AI abilities. Moreover, we provide a complete set of input and output interfaces, enabling embodied agents to easily take task requirements and current environmental observations as input, make decisions, and obtain performance evaluations. On the one hand, the platform extends the capabilities of existing embodied intelligence to higher levels. On the other hand, it has higher practical value in the real world and can support more potential applications for artificial general intelligence. Based on this platform, we evaluate several popular large language models on embodied intelligence capabilities across different dimensions and difficulty levels.

DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following

Learning through Dialogue Interactions by Asking Questions

Dialogue Learning with Human-in-the-Loop

Simulating User Agents for Embodied Conversational-AI

HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models

TEACh: Task-Driven Embodied Agents That Chat

DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents

ELBA: Learning by Asking for Embodied Visual Navigation and Task Completion

Learning by Asking for Embodied Visual Navigation and Task Completion

DANLI: Deliberative Agent for Following Natural Language Instructions

Manual-Guided Dialogue for Flexible Conversational Agents

Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control

DialSim: A Real-Time Simulator for Evaluating Long-Term Multi-Party Dialogue Understanding of Conversational Agents

Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models

xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark

DialogAgent: An Auto-engagement Agent for Code Question Answering Data Production

Asking Before Action: Gather Information in Embodied Decision Making with Language Models

Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation

LLF-Bench: Benchmark for Interactive Learning from Language Feedback

EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment