ALGPT: Multi-Agent Cooperative Framework for Open-Vocabulary Multi-Modal Auto-Annotating in Autonomous Driving

Yijie Zhou, Xianhui Cheng, Qiming Zhang, Lei Wang, Wenchao Ding, Xiangyang Xue, Chunbo Luo, Jian Pu
DOI: https://doi.org/10.1109/tiv.2024.3461651
IF: 8.2
2024-01-01
IEEE Transactions on Intelligent Vehicles
Abstract: Large Language Models (LLMs) have achieved impressive progress in decision-making and task automation for intelligent agents. However, multiple agents must cooperate to complete tasks in complex real-world applications, such as auto-annotating in autonomous driving. The primary challenges lie in how multiple agents effectively communicate and collaborate in a multi-modal environment and how to automatically refine annotating results to reduce human intervention. These challenges also hinder LLMs from fully evolving into embodied intelligent agents. Driven by these motivations, we propose ALGPT, a multi-agent cooperative framework for open-vocabulary multi-modal auto-annotation in autonomous driving. ALGPT dynamically assembles agent teams with different roles, and agents cooperate to complete annotation tasks according to requirements. By leveraging Chain of Thought (CoT) and In-Context Learning (ICL) techniques, ALGPT's reasoning capabilities are enhanced, allowing it to develop suitable plans autonomously without human intervention. Furthermore, drawing from project management standards, we introduce project management documents and Standard Operating Procedures (SOPs), which further align ALGPT's behavior with human expectations and mitigate the impact of GPT illusions caused by the cascading effects of multiple GPTs. The source code will be released at https://github.com/Fudan-ProjectTitan/OpenAnnotate.