Abstract: Decision making demands intricate interplay between perception, memory, and reasoning to discern optimal policies. Conventional approaches to decision making face challenges related to low sample efficiency and poor generalization. In contrast, foundation models in language and vision have showcased rapid adaptation to diverse new tasks. Therefore, we advocate for the construction of foundation agents as a transformative shift in the learning paradigm of agents. This proposal is underpinned by the formulation of foundation agents with their fundamental characteristics and challenges motivated by the success of large language models (LLMs). Moreover, we specify the roadmap of foundation agents from large interactive data collection or generation, to self-supervised pretraining and adaptation, and knowledge and value alignment with LLMs. Lastly, we pinpoint critical research questions derived from the formulation and delineate trends for foundation agents supported by real-world use cases, addressing both technical and theoretical aspects to propel the field towards a more comprehensive and impactful future.

What problem does this paper attempt to address?

This paper addresses the low sample efficiency and poor generalization of traditional decision-making methods. Specifically, methods such as reinforcement learning (RL), imitation learning (IL), planning and search, and optimal control require a large number of training samples when facing new tasks and generalize poorly beyond them. These problems limit how effective and efficient the methods are in practical applications.

In response, the paper proposes constructing foundation agents as a shift in the learning paradigm for decision making. Inspired by the success of large language models (LLMs), foundation agents aim to improve sample efficiency and generalization through the collection or generation of large-scale interaction data, self-supervised pretraining and adaptation, and alignment with the knowledge and values of LLMs. The paper emphasizes that foundation agents can adapt quickly to a variety of new tasks, exhibiting multi-modal perception, cross-task and cross-domain adaptation, and few-shot or zero-shot generalization, particularly in scenarios that involve long-term reasoning, sparse rewards, or partial observability.

The paper also explores three basic characteristics of foundation agents: a unified representation of decision-making process variables, a unified policy interface across tasks and domains, and interactive decision making in the physical and virtual worlds. These characteristics define both what is distinctive about foundation agents and the challenges in building them, and they are what allow such agents to be more flexible and more broadly applicable across diverse decision-making situations.
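As a rough illustration of the first two characteristics named above, the sketch below shows one possible reading of a "unified representation" and a "unified policy interface". It is not taken from the paper: the names `Step`, `flatten_trajectory`, `Policy`, and `RepeatLastActionPolicy` are hypothetical, and the toy policy stands in for a pretrained model only to show the shape of the interface.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence, Tuple

# Hypothetical types for illustration only; none of these names come from the paper.
Observation = Tuple[float, ...]
Token = Tuple[str, object]  # (variable kind, value)


@dataclass(frozen=True)
class Step:
    """One interaction step: what the agent saw, did, and received."""
    observation: Observation
    action: int
    reward: float


def flatten_trajectory(steps: Sequence[Step]) -> List[Token]:
    """Flatten steps into one tagged sequence (an assumed unified representation)."""
    tokens: List[Token] = []
    for step in steps:
        tokens.append(("obs", step.observation))
        tokens.append(("act", step.action))
        tokens.append(("rew", step.reward))
    return tokens


class Policy(Protocol):
    """Assumed task-agnostic interface: context plus current observation in, action out."""

    def act(self, context: Sequence[Token], observation: Observation) -> int:
        ...


class RepeatLastActionPolicy:
    """Toy stand-in for a pretrained agent: repeats the latest action in context."""

    def act(self, context: Sequence[Token], observation: Observation) -> int:
        for kind, value in reversed(context):
            if kind == "act":
                return int(value)
        return 0  # default action when the context holds no prior action
```

Under these assumptions, data from different tasks can share one sequence format, and any policy that implements `act` can be swapped in without changing the calling code.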

Position: Foundation Agents as the Paradigm Shift for Decision Making

Foundation Models for Decision Making: Problems, Methods, and Opportunities

Foundation Model Sherpas: Guiding Foundation Models through Knowledge and Reasoning

Robot Learning in the Era of Foundation Models: A Survey

Large-scale Foundation Models and Generative AI for BigData Neuroscience

Foundation Models for Education: Promises and Prospects

Building Decision Making Models Through Language Model Regime

Self-supervised Pretraining for Decision Foundation Model: Formulation, Pipeline and Challenges

Foundation models in brief: A historical, socio-technical focus

A Survey for Foundation Models in Autonomous Driving

Foundation Models Meet Visualizations: Challenges and Opportunities

Applications of Large Scale Foundation Models for Autonomous Driving

GUI Agents with Foundation Models: A Comprehensive Survey

An Interactive Agent Foundation Model

CHORUS: Foundation Models for Unified Data Discovery and Exploration

Position Paper: Agent AI Towards a Holistic Intelligence

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

ExpeL: LLM Agents Are Experiential Learners