Learning Sketch Decompositions in Planning via Deep Reinforcement Learning

Michael Aichmüller, Hector Geffner
2024-12-12
Abstract: In planning and reinforcement learning, the identification of common subgoal structures across problems is important when goals are to be achieved over long horizons. Recently, it has been shown that such structures can be expressed as feature-based rules, called sketches, over a number of classical planning domains. These sketches split problems into subproblems which then become solvable in low polynomial time by a greedy sequence of IW$(k)$ searches. Methods for learning sketches using feature pools and min-SAT solvers have been developed, yet they face two key limitations: scalability and expressivity. In this work, we address these limitations by formulating the problem of learning sketch decompositions as a deep reinforcement learning (DRL) task, where general policies are sought in a modified planning problem where the successor states of a state $s$ are defined as those reachable from $s$ through an IW$(k)$ search. The sketch decompositions obtained through this method are experimentally evaluated across various domains, and problems are regarded as solved by the decomposition when the goal is reached through a greedy sequence of IW$(k)$ searches. While our DRL approach for learning sketch decompositions does not yield interpretable sketches in the form of rules, we demonstrate that the resulting decompositions can often be understood in a crisp manner.
Artificial Intelligence
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses how to identify subgoal structures that are common across problems in planning and reinforcement learning, when goals must be achieved over long horizons. Specifically, it targets two key limitations of existing sketch-learning methods: scalability and expressivity.

#### Background and challenges

1. **Importance of subgoal structures**: In planning and reinforcement learning, identifying common subgoal structures is important for achieving long-horizon goals.
2. **Limitations of existing methods**:
   - **Scalability**: Methods based on feature pools and min-SAT solvers have difficulty handling large problems.
   - **Expressivity**: Larger feature pools increase expressivity, but they yield theories that are too complex for combinatorial solvers to handle.

#### Proposed solutions

To overcome these limitations, the authors formulate the problem of learning sketch decompositions as a deep reinforcement learning (DRL) task. Specifically:

- **Problem definition**: Learning a sketch decomposition is cast as learning a general policy in a modified planning problem, where the successor states of a state \( s \) are the states reachable from \( s \) via an IW(k) search.
- **Advantages of the method**:
  - No explicit feature pool is required.
  - No combinatorial solver is required.
  - A neural network classifier is used instead of rule-based sketches.

#### Experimental verification

The learned decompositions are evaluated experimentally across various planning domains. A problem counts as solved by the decomposition when the goal is reached through a greedy sequence of IW(k) searches. Although the decompositions are not represented as rules, the authors show that they can often be understood clearly.

#### Main contributions

1. **Improved scalability and expressivity**: The DRL formulation addresses the scalability and expressivity limitations of existing methods.
2. **Effectiveness of the new method**: Experiments show that the method yields decompositions into width-bounded subproblems that solve problems in multiple domains.
3. **Understanding without rules**: Although the final decomposition is not rule-based, it can often be understood by analyzing the behavior of the learned network.

### Formulas

- **State transition function**: \( f(s, a) = s' \)
- **IW(k) successor set**: \( N_k(s) := \{ s' \mid s' \text{ is reachable from } s \text{ via IW}(k) \} \)
- **Policy selection**:
  - **Greedy method**: \( G^\pi_k(s) := \{ s' \}, \quad s' = \arg\max_{s'' \in N_k(s)} \pi(s'' \mid s) \)
  - **Random method**: \( G^\pi_k(s) := \{ s' \}, \quad s' \sim \pi(\cdot \mid s), \quad s' \in N_k(s) \)

In this way, the authors provide a method for solving long-horizon planning problems that avoids the scalability and expressivity limitations of earlier sketch-learning approaches.
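The successor set \( N_k(s) \) and the greedy and random selection rules can be illustrated with a small, self-contained sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy corridor domain (a key at cell 2, a door at cell 4), the helper names, and above all the hand-written `score` function, which merely stands in for a learned policy \( \pi(s' \mid s) \). The IW(k) search is a plain breadth-first search that prunes any generated state that does not make some tuple of at most `k` atoms true for the first time.

```python
import math
import random
from collections import deque
from itertools import combinations

KEY_POS, DOOR_POS, N_CELLS = 2, 4, 5  # hypothetical toy corridor, cells 0..4

def make_state(pos, has_key):
    """A state is a frozenset of atoms: ("at", pos) and optionally ("has_key",)."""
    atoms = {("at", pos)}
    if has_key:
        atoms.add(("has_key",))
    return frozenset(atoms)

def position(state):
    return next(a[1] for a in state if a[0] == "at")

def successors(state):
    """Ordinary one-step transitions: move left/right, or pick up the key."""
    pos, has_key = position(state), ("has_key",) in state
    result = [make_state(p, has_key) for p in (pos - 1, pos + 1) if 0 <= p < N_CELLS]
    if pos == KEY_POS and not has_key:
        result.append(make_state(pos, True))
    return result

def is_goal(state):
    return state == make_state(DOOR_POS, True)

def iw_reachable(s, k):
    """N_k(s): states generated and kept by a breadth-first IW(k) search from s."""
    seen = set()

    def is_novel(state):
        # Novel = makes true at least one tuple of <= k atoms not seen before.
        tuples = {t for size in range(1, k + 1)
                  for t in combinations(sorted(state), size)}
        new = tuples - seen
        seen.update(new)
        return bool(new)

    is_novel(s)  # register the tuples of the root state
    queue, reached = deque([s]), []
    while queue:
        for nxt in successors(queue.popleft()):
            if is_novel(nxt):  # non-novel states are pruned
                reached.append(nxt)
                queue.append(nxt)
    return reached

def score(s, candidate):
    """Hand-written stand-in for a learned pi(s' | s); NOT the paper's network."""
    bonus = 10.0 if ("has_key",) in candidate else 0.0
    return bonus - abs(DOOR_POS - position(candidate))

def greedy_step(s, k):
    """Greedy rule: the highest-scoring state in N_k(s)."""
    candidates = iw_reachable(s, k)
    return max(candidates, key=lambda c: score(s, c)) if candidates else None

def sample_step(s, k, rng=random):
    """Random rule: sample from N_k(s) with softmax weights over the scores."""
    candidates = iw_reachable(s, k)
    if not candidates:
        return None
    weights = [math.exp(score(s, c)) for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def greedy_rollout(s, k, max_steps=10):
    """Chain greedy subgoal selections until the goal is reached or steps run out."""
    trace = [s]
    while not is_goal(s) and len(trace) <= max_steps:
        s = greedy_step(s, k)
        if s is None:
            return None
        trace.append(s)
    return trace if is_goal(s) else None

start, goal = make_state(0, False), make_state(DOOR_POS, True)
trace = greedy_rollout(start, k=1)
print(len(trace) - 1, "subgoal steps")
```

In this toy setting the goal is not in \( N_1(s) \) for the start state, because holding the key at the door only adds a new pair of atoms, not a new single atom. The stand-in policy therefore first selects the state where the key is held and then the goal, so each subproblem is solved by one IW(1) search. With `k=2` the goal is already in the successor set of the start state.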