Abstract: We present an online planning framework for solving multi-object rearrangement problems in partially observable, multi-room environments. Current object rearrangement solutions, primarily based on Reinforcement Learning or hand-coded planning methods, often lack adaptability to diverse challenges. To address this limitation, we introduce a novel Hierarchical Object-Oriented Partially Observed Markov Decision Process (HOO-POMDP) planning approach. This approach comprises (a) an object-oriented POMDP planner generating sub-goals, (b) a set of low-level policies for sub-goal achievement, and (c) an abstraction system converting the continuous low-level world into a representation suitable for abstract planning. We evaluate our system on varying numbers of objects, rooms, and problem types in AI2-THOR simulated environments with promising results.
What problem does this paper attempt to address?
This paper addresses the multi-object rearrangement problem, particularly complex tasks in partially observable, multi-room environments. The authors point out that current object rearrangement solutions, mainly based on reinforcement learning or hand-coded planning methods, generally lack the adaptability to handle diverse challenges. To address this limitation, they introduce a novel Hierarchical Object-Oriented Partially Observed Markov Decision Process (HOO-POMDP) planning method.
### Problem Background
In real-life home environments, multi-object rearrangement is a fundamental challenge that combines perception, planning, navigation, and manipulation. In multi-room settings the problem becomes harder, because most of the environment is out of view at any given time. Such scenarios, like tidying up a home or putting away groceries, are common in daily life and are therefore central to the development of next-generation home-assistive robots.
### Limitations of Existing Methods
Existing multi-object rearrangement methods fall into two main categories:
1. **Reinforcement Learning (RL) Methods**: As problems become more complex and longer-horizon, RL methods often struggle to scale.
2. **Hand-Coded Planning Systems**: These methods usually fix the order in which skills are applied or rely on greedy planners, which limits their ability to find the optimal interaction sequence and to handle new problems such as blocked paths or occluded target locations.
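The cost of committing to a greedy order can be seen in a toy 1-D pick-and-place problem. This is a hypothetical illustration, not taken from the paper: the robot carries one object at a time, `greedy_order` always fetches the object whose pickup spot is nearest, and `best_order` enumerates every order. Brute-force enumeration is used only for clarity; it grows factorially with the number of objects.

```python
from itertools import permutations

# Hypothetical toy problem: object name -> (pickup position, goal position) on a line.
OBJECTS = {"A": (1.0, -4.0), "B": (2.0, 3.0)}


def plan_cost(order, objects, start=0.0):
    """Total travel distance when objects are moved one at a time in `order`."""
    pos, cost = start, 0.0
    for name in order:
        pickup, goal = objects[name]
        cost += abs(pos - pickup) + abs(pickup - goal)  # walk to object, carry it
        pos = goal
    return cost


def greedy_order(objects, start=0.0):
    """Greedy planner: always fetch the object whose pickup spot is closest now."""
    pos, remaining, order = start, list(objects), []
    while remaining:
        name = min(remaining, key=lambda n: abs(pos - objects[n][0]))
        remaining.remove(name)
        order.append(name)
        pos = objects[name][1]  # robot ends where it dropped the object
    return order


def best_order(objects, start=0.0):
    """Exhaustive search over all orders (illustration only)."""
    return min(permutations(objects), key=lambda o: plan_cost(o, objects, start))


g = greedy_order(OBJECTS)
b = best_order(OBJECTS)
print(g, plan_cost(g, OBJECTS))        # ['A', 'B'] 13.0
print(list(b), plan_cost(b, OBJECTS))  # ['B', 'A'] 10.0
```

Grabbing the nearest object first (A) sends the robot far to the left, so the remaining trip is longer than if B had been handled first.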
### The Method Proposed in the Paper
To address these limitations, the authors propose the HOO-POMDP planning framework, which consists of three components:
- **Object-Oriented POMDP Planner**: Generates sub-goals.
- **Set of Low-Level Policies**: Achieve the sub-goals.
- **Abstraction System**: Transforms the continuous low-level world into a representation suitable for abstract planning.
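A minimal sketch of how the three components could interact in a sense-plan-act loop. Everything here is hypothetical: the room layout `ROOMS`, the 1-D coordinates, and the functions `abstract_state`, `plan_subgoal`, and `move_policy` are illustrative names. The "planner" is a trivial rule standing in for the object-oriented POMDP planner, and the world is fully observed, so no belief is maintained.

```python
# Hypothetical room layout: room name -> [x_min, x_max) along one axis.
ROOMS = {"kitchen": (0.0, 5.0), "living_room": (5.0, 10.0), "bedroom": (10.0, 15.0)}


def abstract_state(positions):
    """Abstraction system: map continuous object coordinates to room labels."""
    return {
        obj: next(room for room, (lo, hi) in ROOMS.items() if lo <= x < hi)
        for obj, x in positions.items()
    }


def plan_subgoal(abstract, goals):
    """Stand-in planner: return the first misplaced object as a sub-goal.
    A POMDP planner would instead search over sequences under uncertainty."""
    for obj, room in goals.items():
        if abstract[obj] != room:
            return ("move", obj, room)
    return None  # every object is in its goal room


def move_policy(positions, obj, room):
    """Low-level policy stub: place the object at the centre of the target room."""
    lo, hi = ROOMS[room]
    positions[obj] = (lo + hi) / 2
    return positions


LOW_LEVEL_POLICIES = {"move": move_policy}


def run(positions, goals, max_steps=20):
    """Alternate abstraction -> planning -> low-level execution until done."""
    trace = []
    for _ in range(max_steps):
        subgoal = plan_subgoal(abstract_state(positions), goals)
        if subgoal is None:
            break
        kind, obj, room = subgoal
        positions = LOW_LEVEL_POLICIES[kind](positions, obj, room)
        trace.append(subgoal)
    return positions, trace


start = {"mug": 12.3, "book": 1.7, "apple": 6.0}
goals = {"mug": "kitchen", "book": "living_room", "apple": "kitchen"}
final, trace = run(dict(start), goals)
print(trace)
# [('move', 'mug', 'kitchen'), ('move', 'book', 'living_room'), ('move', 'apple', 'kitchen')]
```

The planner only ever sees room labels, never raw coordinates, which is the point of the abstraction layer: planning happens over a small discrete space while the policies deal with the continuous world.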
### Main Contributions
The main contributions of the paper include:
1. **Modular Planning System**: An object-oriented planner combined with a state-abstraction module, designed for object rearrangement in multi-room environments.
2. **New Dataset**: Includes blocked-path problems and extended room configurations alongside existing rearrangement challenges.
3. **Empirical Evaluation**: Evaluates the system under different conditions in the AI2-THOR simulation environment.
According to the authors, this design lets the system handle complex rearrangement tasks in partially observable multi-room environments and adapt to new problems such as blocked paths or occluded target locations.
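To illustrate why searching over sub-goals helps with blocked paths, here is a toy deterministic sketch. Breadth-first search is swapped in as a simple stand-in for the paper's POMDP planner (which additionally reasons over beliefs), and the sub-goal names, preconditions, and state keys are hypothetical. A fixed script that always tries `carry_box_to_kitchen` first would stall, whereas search discovers that the doorway must be cleared first.

```python
from collections import deque

# Hypothetical abstract sub-goals: name -> (precondition, effect) on a dict state.
SUBGOALS = {
    "clear_doorway": (
        lambda s: s["door_blocked"],
        lambda s: {**s, "door_blocked": False},
    ),
    "carry_box_to_kitchen": (
        lambda s: not s["door_blocked"] and s["box"] != "kitchen",
        lambda s: {**s, "box": "kitchen"},
    ),
}


def plan(start, goal_test):
    """Breadth-first search for the shortest sub-goal sequence reaching the goal."""
    key = lambda s: tuple(sorted(s.items()))  # hashable form of a state
    frontier = deque([(start, [])])
    seen = {key(start)}
    while frontier:
        state, path = frontier.popleft()
        if goal_test(state):
            return path
        for name, (pre, eff) in SUBGOALS.items():
            if pre(state):
                nxt = eff(state)
                if key(nxt) not in seen:
                    seen.add(key(nxt))
                    frontier.append((nxt, path + [name]))
    return None  # goal unreachable with the available sub-goals


start = {"door_blocked": True, "box": "bedroom"}
print(plan(start, lambda s: s["box"] == "kitchen"))
# ['clear_doorway', 'carry_box_to_kitchen']
```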
### Formulation
The paper builds on the standard POMDP formalism:
- POMDP Definition:
\[
\text{POMDP}=(S, A, T, R, \gamma, O, O_{\text{model}})
\]
- State Space \(S\)
- Action Space \(A\)
- Transition Function \(T(s, a, s') = p(s'|s, a)\)
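- Observation Space \(O\)
- Observation Model \(O_{\text{model}}(s', a, z)=p(z|s', a)\), the probability of observing \(z\) after taking action \(a\) and landing in state \(s'\)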
- Reward Function \(R(s, a)\)
- Discount Factor \(\gamma\)
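- Belief update after taking action \(a\) and receiving observation \(z\), where \(\eta\) is a normalizing constant that makes \(b'\) sum to 1:
\[
b'(s')=\eta\, O_{\text{model}}(s', a, z)\sum_{s\in S}T(s, a, s')\,b(s)
\]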
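The tuple and the belief update can be made concrete with a tiny tabular POMDP. This is a hypothetical toy, not from the paper: the state is which room a mug is in, the actions look into one room, and the detector (an assumption) sees the mug 80% of the time when it is in the inspected room and never produces false positives. The observation model is keyed by the post-transition state \(s'\), the usual convention. `belief_update` is a direct discrete Bayes filter implementing the formula term by term.

```python
# Toy two-state POMDP (hypothetical): is the mug in the kitchen or the bedroom?
STATES = ["mug_in_kitchen", "mug_in_bedroom"]
ACTIONS = ["look_kitchen", "look_bedroom"]
OBSERVATIONS = ["seen", "nothing"]
LOOKS_AT = {"look_kitchen": "mug_in_kitchen", "look_bedroom": "mug_in_bedroom"}

# T[(s, a)][s'] = p(s' | s, a): looking never moves the mug.
T = {(s, a): {s2: 1.0 if s2 == s else 0.0 for s2 in STATES}
     for s in STATES for a in ACTIONS}

# O_MODEL[(s', a)][z] = p(z | s', a): assumed noisy detector, no false positives.
O_MODEL = {(s2, a): ({"seen": 0.8, "nothing": 0.2} if LOOKS_AT[a] == s2
                     else {"seen": 0.0, "nothing": 1.0})
           for s2 in STATES for a in ACTIONS}

# R and gamma complete the tuple; the belief update itself does not use them.
R = {(s, a): -1.0 for s in STATES for a in ACTIONS}  # constant step cost
GAMMA = 0.95


def belief_update(b, a, z):
    """Discrete Bayes filter: b'(s') = eta * O_model(s', a, z) * sum_s T(s, a, s') b(s)."""
    unnormalized = {
        s2: O_MODEL[(s2, a)][z] * sum(T[(s, a)][s2] * b[s] for s in STATES)
        for s2 in STATES
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation z has zero probability under belief b")
    eta = 1.0 / total  # normalizing constant
    return {s2: eta * p for s2, p in unnormalized.items()}


prior = {s: 0.5 for s in STATES}
posterior = belief_update(prior, "look_kitchen", "nothing")
print({s: round(p, 3) for s, p in posterior.items()})
# {'mug_in_kitchen': 0.167, 'mug_in_bedroom': 0.833}
```

Failing to see the mug in the kitchen does not rule the kitchen out, because the detector can miss; it only shifts probability toward the bedroom. This kind of graded belief is what lets a POMDP planner decide whether to look again or search elsewhere.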
Building on this formalism, the HOO-POMDP framework is designed to handle multi-object rearrangement in partially observable multi-room environments, where the authors report promising results.