Visual language navigation: a survey and open challenges

Sang-Min Park, Young-Gab Kim
DOI: https://doi.org/10.1007/s10462-022-10174-9
IF: 9.588
2022-03-25
Artificial Intelligence Review
Abstract: With the recent development of deep learning, AI models are widely used in various domains. AI models perform well on well-defined tasks such as image classification and text generation. With the recent development of generative models (e.g., BigGAN, GPT-3), AI models also show impressive results on diverse generation tasks (e.g., photo-realistic image and paragraph generation). As the performance of individual AI models improves, interest is also growing in comprehensive tasks such as visual language navigation (VLN), in which an agent follows a language instruction using an egocentric view. However, integrating models for VLN is difficult because of model complexity, modal heterogeneity, and a shortage of paired data. This study provides a comprehensive survey of VLN, taking a systematic approach to reviewing recent trends. First, we define a taxonomy of the fundamental techniques needed to perform VLN. We analyze VLN from four perspectives: representation learning, reinforcement learning, components, and evaluation. We investigate the pros and cons of each recently proposed component and methodology. Unlike conventional surveys, this survey categorizes the approaches of major research institutes using the taxonomy defined across these four perspectives. Finally, we discuss current open challenges and conclude by giving possible future directions.
computer science, artificial intelligence
What problem does this paper attempt to address?