Abstract: Recent learning-to-imitate methods have shown promising results in planning by imitating within the observation-action space. However, their ability in open environments remains constrained, particularly in long-horizon tasks. In contrast, traditional symbolic planning excels in long-horizon tasks through logical reasoning over human-defined symbolic spaces but struggles to handle observations beyond symbolic states, such as the high-dimensional visual inputs encountered in real-world scenarios. In this work, we draw inspiration from abductive learning and introduce a novel framework, \textbf{AB}ductive \textbf{I}mitation \textbf{L}earning (ABIL), that integrates the benefits of data-driven learning and symbolic reasoning to enable long-horizon planning. Specifically, we employ abductive reasoning to understand the demonstrations in symbolic space and design principles of sequential consistency to resolve conflicts between perception and reasoning. ABIL generates predicate candidates to facilitate perception from raw observations to symbolic space without laborious predicate annotations, providing groundwork for symbolic planning. With this symbolic understanding, we further develop a policy ensemble whose base policies are built with different logical objectives and managed through symbolic reasoning. Experiments show that our proposal successfully interprets observations in terms of task-relevant symbols to assist imitation learning. Importantly, ABIL demonstrates significantly improved data efficiency and generalization across various long-horizon tasks, highlighting it as a promising solution for long-horizon planning. Project website: \url{https://www.lamda.nju.edu.cn/shaojj/KDD25_ABIL/}.

Scoring-Aggregating-Planning: Learning Task-Agnostic Priors from Interactions and Sparse Rewards for Zero-Shot Generalization

Zero-shot Policy Learning with Spatial Temporal Reward Decomposition on Contingency-aware Observation

Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased

Generalized Inverse Planning: Learning Lifted non-Markovian Utility for Generalizable Task Representation

Learning adaptive planning representations with natural language guidance

Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning

Explicit-Implicit Subgoal Planning for Long-Horizon Tasks with Sparse Reward

Hierarchical Planning and Learning for Robots in Stochastic Settings Using Zero-Shot Option Invention

Sparse Graphical Memory for Robust Planning

Learning and Planning with a Semantic Model

Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination

Conditional Predictive Behavior Planning with Inverse Reinforcement Learning for Human-like Autonomous Driving

PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination

Learning for Long-Horizon Planning via Neuro-Symbolic Abductive Imitation

Skill Induction and Planning with Latent Language

Error-Aware Policy Learning: Zero-Shot Generalization in Partially Observable Dynamic Environments

BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations

Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference

Human-AI Coordination via Human-Regularized Search and Learning

Intent-aware Multi-agent Reinforcement Learning