Abstract: In planning and reinforcement learning, the identification of common subgoal structures across problems is important when goals are to be achieved over long horizons. Recently, it has been shown that such structures can be expressed as feature-based rules, called sketches, over a number of classical planning domains. These sketches split problems into subproblems which then become solvable in low polynomial time by a greedy sequence of IW$(k)$ searches. Methods for learning sketches using feature pools and min-SAT solvers have been developed, yet they face two key limitations: scalability and expressivity. In this work, we address these limitations by formulating the problem of learning sketch decompositions as a deep reinforcement learning (DRL) task, where general policies are sought in a modified planning problem where the successor states of a state $s$ are defined as those reachable from $s$ through an IW$(k)$ search. The sketch decompositions obtained through this method are experimentally evaluated across various domains, and problems are regarded as solved by the decomposition when the goal is reached through a greedy sequence of IW$(k)$ searches. While our DRL approach for learning sketch decompositions does not yield interpretable sketches in the form of rules, we demonstrate that the resulting decompositions can often be understood in a crisp manner.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses how to identify subgoal structures shared across problems in planning and reinforcement learning when goals must be achieved over long horizons. Specifically, it targets two key limitations of existing methods: scalability and expressivity.

#### Background and challenges

1. **Importance of subgoal structures**: In planning and reinforcement learning, identifying common subgoal structures is important for achieving long-horizon goals.
2. **Limitations of existing methods**:
   - **Scalability**: Methods based on feature pools and min-SAT solvers struggle with large problems.
   - **Expressivity**: A richer feature pool increases expressivity, but the resulting theory becomes too complex for combinatorial solvers to handle.

#### Proposed solutions

To overcome these limitations, the authors recast the problem of learning sketch decompositions as a deep reinforcement learning (DRL) task. Specifically:

- **Problem definition**: Learning a sketch decomposition is treated as learning a general policy in a modified planning problem, where the successor states of a state $s$ are the states reachable from $s$ via an IW$(k)$ search.
- **Advantages of the method**:
  - No explicit feature pool is required.
  - No combinatorial solver is required.
  - A neural network classifier is used instead of rule-based sketches.

#### Experimental verification

The learned decompositions are evaluated experimentally across various domains. A problem counts as solved by the decomposition when the goal is reached through a greedy sequence of IW$(k)$ searches. Although the decompositions are not expressed as rules, the authors show that they can often be understood clearly.

#### Main contributions

1. **Improved scalability and expressivity**: The DRL formulation addresses the scalability and expressivity limitations of existing methods.
2. **Effectiveness of the new method**: Experiments indicate that the method solves problems in multiple domains by decomposing them into subproblems of bounded width.
3. **Understanding in non-rule-based form**: Although the final decomposition is not rule-based, how it works can be understood by analyzing the behavior of the neural network.

### Markdown representation of formulas

- **State transition function**: $f(s, a) = s'$
- **IW$(k)$ successor set**: $N_k(s) := \{\, s' \mid s' \text{ is reachable from } s \text{ via IW}(k) \,\}$
- **Policy selection**:
  - **Greedy method**: $G^\pi_k(s) := \{ s' \}, \quad s' = \arg\max_{s'' \in N_k(s)} \pi(s'' \mid s)$
  - **Random method**: $G^\pi_k(s) := \{ s' \}, \quad s' \sim \pi(\cdot \mid s), \quad s' \in N_k(s)$

In this way, the authors provide a method for long-horizon planning problems that avoids the feature pools and combinatorial solvers required by earlier approaches.
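
The successor set $N_k(s)$ and the greedy selection rule can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the toy domain (an agent on a line that must fetch a key from cell 4 and return to cell 0), the function names `iw_reachable`, `greedy_step`, and `solve`, and the hand-written `score` function standing in for the learned policy $\pi$ are all hypothetical. IW$(k)$ is simplified here to a breadth-first search that keeps a state only if it contains some tuple of at most $k$ atoms not seen earlier in the same search.

```python
from collections import deque
from itertools import combinations

def iw_reachable(start, successors, k):
    """Sketch of N_k(s): states generated and not pruned by IW(k) from `start`."""
    seen = set()  # atom tuples of size <= k seen so far in this search

    def is_novel(state):
        novel = False
        for size in range(1, k + 1):
            for tup in combinations(sorted(state), size):
                if tup not in seen:
                    seen.add(tup)
                    novel = True
        return novel

    is_novel(start)  # register the atoms of the start state
    reached, queue = [], deque([start])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if is_novel(nxt):  # prune states that add no new tuple
                reached.append(nxt)
                queue.append(nxt)
    return reached

def greedy_step(state, successors, k, score):
    """Greedy rule: pick the state in N_k(s) with the highest score."""
    candidates = iw_reachable(state, successors, k)
    if not candidates:
        return None
    return max(candidates, key=lambda nxt: score(state, nxt))

def solve(start, successors, k, score, is_goal, max_steps=20):
    """Chain greedy IW(k) steps; return the visited subgoal states or None."""
    path, state = [start], start
    for _ in range(max_steps):
        if is_goal(state):
            return path
        state = greedy_step(state, successors, k, score)
        if state is None:
            return None
        path.append(state)
    return path if is_goal(state) else None

# --- Hypothetical toy domain: cells 0..4 on a line, key lies in cell 4 ---
LAST = 4

def pos(state):
    return next(int(a[2:]) for a in state if a.startswith("at"))

def successors(state):
    p = pos(state)
    rest = state - {f"at{p}"}
    out = []
    if p > 0:
        out.append(frozenset(rest | {f"at{p - 1}"}))
    if p < LAST:
        out.append(frozenset(rest | {f"at{p + 1}"}))
    if p == LAST and "holding" not in state:
        out.append(frozenset(state | {"holding"}))  # pick up the key
    return out

def score(state, nxt):
    # Stand-in for the learned policy: prefer holding the key, then low cells.
    return 10 * ("holding" in nxt) - pos(nxt)

start = frozenset({"at0"})
goal = frozenset({"at0", "holding"})
path = solve(start, successors, 1, score, lambda s: s == goal)
```

In this toy problem the goal is not in $N_1(s)$ for the initial state, because every state on the way back adds no new single atom and is pruned. The greedy chain still solves it with two IW$(1)$ searches: first to the state holding the key in cell 4, then to the goal.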

Learning Sketch Decompositions in Planning via Deep Reinforcement Learning

Expressing and Exploiting Subgoal Structure in Classical Planning Using Sketches

SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning

Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning

Planning-Augmented Hierarchical Reinforcement Learning

Classical Planning in Deep Latent Space: Bridging the Subsymbolic-Symbolic Boundary

Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing

Sparse Graphical Memory for Robust Planning

Planning with a Learned Policy Basis to Optimally Solve Complex Tasks

Plan-Space State Embeddings for Improved Reinforcement Learning

Deep Learning for Generalised Planning with Background Knowledge

Deceptive Path Planning via Reinforcement Learning with Graph Neural Networks

What Should I Do Now? Marrying Reinforcement Learning and Symbolic Planning

Learning Planning Abstractions from Language

Generalized Planning With Deep Reinforcement Learning

RLgraph: Modular Computation Graphs for Deep Reinforcement Learning

Learning First-Order Symbolic Planning Representations That Are Grounded

On the role of planning in model-based deep reinforcement learning

Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition

Solving Challenging Control Problems Using Two-Staged Deep Reinforcement Learning

Hierarchical Decomposition and Analysis for Generalized Planning