LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination

Jijia Liu, Chao Yu, Jiaxuan Gao, Yuqing Xie, Qingmin Liao, Yi Wu, Yu Wang
DOI: https://doi.org/10.5555/3635637.3662979
2024-01-01
Abstract: AI agents powered by Large Language Models (LLMs) have made significant advances, enabling them to assist humans in diverse complex tasks and leading to a revolution in human-AI coordination. LLM-powered agents typically require invoking LLM APIs and employing complex, manually designed prompts, which results in high inference latency. While this paradigm works well in scenarios with minimal interactive demands, such as code generation, it is unsuitable for highly interactive and real-time applications, such as gaming. Traditional gaming AI often employs small models or reactive policies, enabling fast inference but offering limited task completion and interaction abilities. In this work, we consider Overcooked as our testbed, where players can communicate in natural language and cooperate to serve orders. We propose a Hierarchical Language Agent (HLA) for human-AI coordination that provides strong reasoning abilities while maintaining real-time execution. In particular, HLA adopts a hierarchical framework and comprises three modules: a proficient LLM, referred to as Slow Mind, for intention reasoning and language interaction; a lightweight LLM, referred to as Fast Mind, for generating macro actions; and a reactive policy, referred to as Executor, for transforming macro actions into atomic actions. Human studies show that HLA outperforms other baseline agents, including slow-mind-only agents and fast-mind-only agents, with stronger cooperation abilities, faster responses, and more consistent language communications.
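The three-module hierarchy described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`slow_mind`, `fast_mind`, `executor`, `hla_step`), the `MacroAction` type, the keyword-matching stand-ins for the two LLMs, and the grid layout are all hypothetical, chosen only to show how an intention flows down to macro actions and then to atomic actions. The sketch also runs the modules sequentially for simplicity, whereas a real-time agent would presumably query the slower module far less often than the faster ones.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]  # (x, y) cell on a hypothetical kitchen grid


@dataclass
class MacroAction:
    name: str    # high-level step, e.g. "fetch_onion" (hypothetical label)
    target: Pos  # grid cell where the step is carried out


def slow_mind(utterance: str) -> str:
    """Stand-in for the proficient LLM: infer the human's intention.

    A keyword match replaces the actual LLM call (assumption).
    """
    return "need_onion" if "onion" in utterance.lower() else "idle"


def fast_mind(intention: str, stations: Dict[str, Pos]) -> Optional[MacroAction]:
    """Stand-in for the lightweight LLM: pick a macro action for the intention."""
    if intention == "need_onion":
        return MacroAction("fetch_onion", stations["onion"])
    return None  # nothing to do for this intention


def executor(pos: Pos, macro: MacroAction) -> List[str]:
    """Stand-in for the reactive policy: expand a macro action into atomic moves.

    Moves along x first, then y, and ignores obstacles (assumption).
    """
    dx = macro.target[0] - pos[0]
    dy = macro.target[1] - pos[1]
    steps = ["right" if dx > 0 else "left"] * abs(dx)
    steps += ["down" if dy > 0 else "up"] * abs(dy)
    return steps + ["interact"]


def hla_step(utterance: str, pos: Pos, stations: Dict[str, Pos]) -> List[str]:
    """One pass through the hierarchy: intention -> macro action -> atomic actions."""
    intention = slow_mind(utterance)
    macro = fast_mind(intention, stations)
    return executor(pos, macro) if macro is not None else ["wait"]


if __name__ == "__main__":
    print(hla_step("Please grab an onion", (0, 0), {"onion": (2, 1)}))
```

The layering keeps the expensive reasoning step out of the per-frame control loop: only the executor stand-in needs to run at every game tick, while the two upper layers update the current intention and macro action when new information arrives.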
What problem does this paper attempt to address?