Abstract: Robot foundation models, particularly Vision-Language-Action (VLA) models, have garnered significant attention for their ability to enhance robot policy learning, greatly improving robot generalization and robustness. OpenAI's recent model, o1, showcased impressive capabilities in solving complex problems by utilizing extensive reasoning chains. This prompts an important question: can robot models achieve better performance in multi-task, complex environments by reviewing prior observations and then providing task-specific reasoning to guide action prediction? In this paper, we introduce Chain-of-Affordance (CoA), a novel approach to scaling robot models by incorporating reasoning in the form of sequential robot affordances to facilitate task completion. Specifically, we prompt the model to consider the following four types of affordances before taking action: a) object affordance: what object to manipulate and where it is; b) grasp affordance: the specific object part to grasp; c) spatial affordance: the optimal space to place the object; and d) movement affordance: the collision-free path for movement. By integrating this knowledge into the policy model, the robot gains essential context, allowing it to act with increased precision and robustness during inference. Our experiments demonstrate that CoA outperforms state-of-the-art robot foundation models such as OpenVLA and Octo. Additionally, CoA generalizes well to unseen object poses, identifies free space, and avoids obstacles in novel environments.
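As a rough illustration of the four affordance types listed in the abstract, the sketch below shows one way such a chain could be represented and serialized into a reasoning string before action prediction. The abstract does not specify a data format, so the class name `AffordanceChain`, its fields, the 2D coordinates, and the text layout are all hypothetical assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation: the abstract names four affordance types
# but gives no data format. Coordinates are illustrative 2D points.
Point = Tuple[float, float]


def _fmt(p: Point) -> str:
    """Format a point with two decimals."""
    return f"({p[0]:.2f}, {p[1]:.2f})"


@dataclass
class AffordanceChain:
    object_name: str                        # object affordance: what to manipulate
    object_location: Point                  # object affordance: where it is
    grasp_point: Point                      # grasp affordance: part to grasp
    placement_region: Tuple[Point, Point]   # spatial affordance: free region corners
    waypoints: List[Point]                  # movement affordance: collision-free path

    def to_reasoning_text(self) -> str:
        """Serialize the affordances in the order a) to d) given in the abstract."""
        lo, hi = self.placement_region
        path = " -> ".join(_fmt(p) for p in self.waypoints)
        steps = [
            f"Object: {self.object_name} at {_fmt(self.object_location)}",
            f"Grasp: {_fmt(self.grasp_point)}",
            f"Place: region {_fmt(lo)} to {_fmt(hi)}",
            f"Move: {path}",
        ]
        return "\n".join(steps)


if __name__ == "__main__":
    chain = AffordanceChain(
        object_name="mug",
        object_location=(0.40, 0.10),
        grasp_point=(0.42, 0.12),
        placement_region=((0.55, 0.25), (0.65, 0.35)),
        waypoints=[(0.40, 0.10), (0.60, 0.30)],
    )
    print(chain.to_reasoning_text())
```

In a setup like this, the serialized text would be given to the policy as intermediate reasoning ahead of the action tokens. How CoA actually encodes and consumes affordances is described in the paper itself, not here.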

Affordance model for cognitive robotics based on analysis functions

A Novel Formalization for Robot Cognition Based on Affordance Model

Learning Object Affordance with Contact and Grasp Generation

Affordance Discovery Based On Intrinsic Motivation In Robots

Affordance Learning And Inference Based On Vision-Speech Association In Human-Robot Interactions

Affordance Research in Developmental Robotics: A Survey.

Robot's affordance prediction based on the subtask

Affordance Triggering for Arbitrary States Based on Robot Exploring

Learning Interactive Affordance for Human-Robot Interaction

Affordance Learning Based on Subtask's Optimal Strategy

Object-Object Interaction Affordance Learning

Hierarchical Affordance Discovery using Intrinsic Motivation

High-level Object Affordance Recognition.

Rediscovering Affordance: A Reinforcement Learning Perspective

Utilization of Affordance by Reinforcement Learning Robot

Learning Social Affordance for Human-Robot Interaction

RAIL: Robot Affordance Imagination with Large Language Models

Task-Oriented Robot Cognitive Manipulation Planning Using Affordance Segmentation and Logic Reasoning.

Improving Vision-Language-Action Models via Chain-of-Affordance

Building an Affordances Map With Interactive Perception

Is That a Chair? Imagining Affordances Using Simulations of an Articulated Human Body