Demo Abstract: Embodied Aerial Agent for City-level Visual Language Navigation Using Large Language Model

Weichen Zhang, Yuxuan Liu, Xuzhe Wang, Xuecheng Chen, Chen Gao, Xinlei Chen
DOI: https://doi.org/10.1109/ipsn61024.2024.00033
2024-01-01
Abstract: As unmanned aerial vehicles (UAVs) become more prevalent in smart cities, their capacity for visual language navigation (VLN) is garnering increasing interest. VLN in cities has significant applications in delivery, rescue, and security patrol, among other fields. One of the most representative tasks is navigating to a specific location by following language instructions. While some current methods have achieved notable results in indoor settings, challenges persist outdoors, including agents’ inaccurate spatial understanding and ambiguous language instructions. In this work, we explore an embodied navigation agent design, in which a fine-grained spatial verbalizer and a history path memory are proposed to guarantee accurate VLN in open 3D urban environments.