Abstract: Recent advancements in large foundation models have remarkably enhanced our understanding of sensory information in open-world environments. In leveraging the power of foundation models, it is crucial for AI research to pivot away from excessive reductionism and toward an emphasis on systems that function as cohesive wholes. Specifically, we emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions. The emerging field of Agent AI spans a wide range of existing embodied and agent-based multimodal interactions, including robotics, gaming, and healthcare systems. In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model. On top of this idea, we discuss how Agent AI exhibits remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. Furthermore, we discuss the potential of Agent AI from an interdisciplinary perspective, underscoring AI cognition and consciousness within scientific discourse. We believe that those discussions serve as a basis for future research directions and encourage broader societal engagement.

What problem does this paper attempt to address?

The paper addresses the loss of overall direction caused by excessive fragmentation in current artificial intelligence research, and asks how a more comprehensive AI system can be achieved by integrating multimodal data and cross-domain applications. Specifically, it emphasizes the importance of developing **Agent AI**: a system that integrates large foundation models (such as large language models and vision-language models) into agent actions. The goal of Agent AI is higher-level intelligent behavior, reached by combining learning, memory, action, perception, planning, and cognition so that the agent can perform tasks in both physical and virtual worlds. The paper proposes a new large action model, the Agent Foundation Model, to support this comprehensive intelligence. It also explores the application potential of Agent AI in fields such as robotics, gaming, and healthcare, as well as its potential impact on AI cognition and consciousness.

The main contributions of the paper are as follows:

1. **A new paradigm of Agent AI**: Emphasizing the importance of achieving a more comprehensive artificial intelligence system through the integration of multimodal data and cross-domain applications.
2. **Agent Foundation Model**: Introducing a new large action model aimed at supporting the realization of comprehensive intelligence.
3. **Interdisciplinary discussion**: Discussing the potential of Agent AI from the perspectives of multiple disciplines, such as neuroscience, biology, physics, biophysics, cognitive science, healthcare, and moral philosophy.
4. **Future research directions**: Exploring the development directions of Agent AI, including the ethical challenges that need to be addressed.

Through these discussions, the paper aims to illustrate how the development of these technologies brings AI agents closer to achieving artificial general intelligence (AGI) and holistic intelligence (HI).

Position Paper: Agent AI Towards a Holistic Intelligence

Agent AI: Surveying the Horizons of Multimodal Interaction

An Interactive Agent Foundation Model

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI

Embodied Artificial Intelligence through Distributed Adaptive Control: An Integrated Framework

A Survey on Robotics with Foundation Models: toward Embodied AI

The Rise and Potential of Large Language Model Based Agents: A Survey

The Journey/DAO/TAO of Embodied Intelligence: From Large Models to Foundation Intelligence and Parallel Intelligence

Large Model Agents: State-of-the-Art, Cooperation Paradigms, Security and Privacy, and Future Trends

Position: Towards Unified Alignment Between Agents, Humans, and Environment

A call for embodied AI

Position: Foundation Agents as the Paradigm Shift for Decision Making

Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning

Towards Responsible Generative AI: A Reference Architecture for Designing Foundation Model based Agents

Autonomous Agents as Embodied AI

Towards Foundation-model-based Multiagent System to Accelerate AI for Social Impact

Body of Her: A Preliminary Study on End-to-End Humanoid Agent

A research of artificial intelligence game agent application

The Introspective Agent: Interdependence of Strategy, Physiology, and Sensing for Embodied Agents

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence