Position: Foundation Agents as the Paradigm Shift for Decision Making

Xiaoqian Liu, Xingzhou Lou, Jianbin Jiao, Junge Zhang
2024-05-29
Abstract: Decision making demands intricate interplay between perception, memory, and reasoning to discern optimal policies. Conventional approaches to decision making face challenges related to low sample efficiency and poor generalization. In contrast, foundation models in language and vision have showcased rapid adaptation to diverse new tasks. Therefore, we advocate for the construction of foundation agents as a transformative shift in the learning paradigm of agents. This proposal is underpinned by the formulation of foundation agents with their fundamental characteristics and challenges motivated by the success of large language models (LLMs). Moreover, we specify the roadmap of foundation agents from large interactive data collection or generation, to self-supervised pretraining and adaptation, and knowledge and value alignment with LLMs. Lastly, we pinpoint critical research questions derived from the formulation and delineate trends for foundation agents supported by real-world use cases, addressing both technical and theoretical aspects to propel the field towards a more comprehensive and impactful future.
Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the low sample efficiency and poor generalization of conventional decision-making methods. Approaches such as reinforcement learning (RL), imitation learning (IL), planning and search, and optimal control typically need a large number of samples to learn each new task and transfer poorly beyond the tasks they were trained on, which limits their effectiveness and efficiency in practical applications. In response, the paper proposes building foundation agents as a shift in the learning paradigm for decision making. Inspired by the success of large language models (LLMs), foundation agents aim to improve sample efficiency and generalization through three stages: collecting or generating large-scale interactive data, self-supervised pretraining and adaptation, and alignment with the knowledge and values of LLMs. The paper argues that foundation agents can adapt quickly to a variety of new tasks, exhibiting multi-modal perception, cross-task and cross-domain adaptation, and few-shot or zero-shot generalization, particularly in scenarios that involve long-horizon reasoning, sparse rewards, or partial observability. The paper also identifies three fundamental characteristics of foundation agents: a unified representation of the variables in the decision process, a unified policy interface across tasks and domains, and interactive decision making in both physical and virtual worlds. These characteristics define what is distinctive about foundation agents and where their main challenges lie, and they are what allow such agents to be flexible and effective across diverse decision-making settings.
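The first two characteristics, a unified representation and a unified policy interface, can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the paper's formulation: the names `Step`, `flatten_trajectory`, `Policy`, and `RepeatLastActionPolicy` are hypothetical, and the tagged-token encoding is one simple way to put observations, actions, and rewards from different tasks into a common sequence format.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence, Tuple

Token = Tuple[str, float]  # (kind, value); a hypothetical unified token format


@dataclass(frozen=True)
class Step:
    """One interaction step; field names are illustrative assumptions."""
    observation: Sequence[float]
    action: int
    reward: float


def flatten_trajectory(steps: Sequence[Step]) -> List[Token]:
    """Interleave observations, actions, and rewards into one tagged sequence."""
    tokens: List[Token] = []
    for step in steps:
        tokens.extend(("obs", float(x)) for x in step.observation)
        tokens.append(("act", float(step.action)))
        tokens.append(("rew", float(step.reward)))
    return tokens


class Policy(Protocol):
    """A single interface that any task or domain would call the same way."""

    def act(self, context: List[Token], observation: Sequence[float]) -> int:
        ...


class RepeatLastActionPolicy:
    """Placeholder policy: repeats the most recent action found in the context.

    A real foundation agent would put a pretrained sequence model here.
    """

    def act(self, context: List[Token], observation: Sequence[float]) -> int:
        for kind, value in reversed(context):
            if kind == "act":
                return int(value)
        return 0  # default action when the context holds no prior action


if __name__ == "__main__":
    history = [Step([0.1, 0.2], 1, 0.0), Step([0.3], 2, 1.0)]
    context = flatten_trajectory(history)
    print(RepeatLastActionPolicy().act(context, [0.5]))
```

Because every task is reduced to the same token sequence and queried through the same `act` call, the policy object can be swapped for a pretrained model without changing the calling code; that separation is the point of the sketch.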