Contingency Planning Using Bi-level Markov Decision Processes for Space Missions

Somrita Banerjee, Edward Balaban, Mark Shirley, Kevin Bradner, Marco Pavone
2024-02-26
Abstract: This work focuses on autonomous contingency planning for scientific missions by enabling rapid policy computation from any off-nominal point in the state space in the event of a delay or deviation from the nominal mission plan. Successful contingency planning involves managing risks and rewards, often probabilistically associated with actions, in stochastic scenarios. Markov Decision Processes (MDPs) are used to mathematically model decision-making in such scenarios. However, in the specific case of planetary rover traverse planning, the vast action space and long planning time horizon pose computational challenges. A bi-level MDP framework is proposed to improve computational tractability, while also aligning with existing mission planning practices and enhancing explainability and trustworthiness of AI-driven solutions. We discuss the conversion of a mission planning MDP into a bi-level MDP, and test the framework on RoverGridWorld, a modified GridWorld environment for rover mission planning. We demonstrate the computational tractability and near-optimal policies achievable with the bi-level MDP approach, highlighting the trade-offs between compute time and policy optimality as the problem's complexity grows. This work facilitates more efficient and flexible contingency planning in the context of scientific missions.
Artificial Intelligence, Robotics
What problem does this paper attempt to address?
The paper addresses autonomous contingency planning for scientific missions: when a planetary rover is delayed or deviates from the nominal mission plan, it should be able to compute a new policy quickly from any off-nominal point in the state space. Successful contingency planning requires managing risks and rewards that are often only probabilistically associated with actions in stochastic scenarios. Markov Decision Processes (MDPs) are used to model this decision-making mathematically. However, in planetary rover traverse planning, the vast action space and long planning time horizon pose computational challenges. The paper therefore proposes a bi-level MDP framework that aims to improve computational tractability, align with existing mission planning practices, and enhance the explainability and trustworthiness of AI-driven solutions. Specifically, the paper makes the following points:

1. **Introducing the bi-level MDP framework**: To deal with the computational cost of a single-level MDP over a large state space and a long planning horizon, the paper proposes a bi-level MDP framework. The framework divides decision-making into a strategic level and a tactical level. The strategic level determines the next goal, while the tactical level handles the specific path planning and action execution needed to reach it.
2. **Improving computational efficiency**: Decomposing the MDP into two levels significantly reduces the state space and action space that each level has to handle, which improves computational efficiency. The paper shows that this approach can produce near-optimal policies in a relatively short time while retaining a high reward.
3. **Enhancing interpretability and credibility**: The bi-level structure allows human preferences to be introduced and makes the decision-making process more transparent and traceable, which increases the acceptance and credibility of AI-driven solutions.

The paper evaluates the bi-level MDP framework on the RoverGridWorld environment, examining both computational efficiency and policy quality. It highlights the trade-off between compute time and policy optimality as the complexity of the problem grows.
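The strategic/tactical split described above can be illustrated with a minimal Python sketch. This is not the paper's implementation or its RoverGridWorld environment: the grid size, the slip probability, the goal rewards, the time budget, and the function names `tactical_expected_steps` and `solve_strategic` are all assumptions made for illustration. The sketch also simplifies the strategic level by deducting the expected tactical travel time deterministically from the remaining budget.

```python
from functools import lru_cache

SIZE = 5      # assumed grid side length (hypothetical, not from the paper)
SLIP = 0.2    # assumed probability that a move fails and the rover stays put
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def tactical_expected_steps(goal, size=SIZE, slip=SLIP, tol=1e-9):
    """Tactical level: value iteration for the expected number of steps
    needed to reach `goal` from every cell of the grid."""
    cells = [(r, c) for r in range(size) for c in range(size)]
    V = {s: 0.0 for s in cells}
    while True:
        delta = 0.0
        for s in cells:
            if s == goal:
                continue  # the goal is absorbing with zero remaining cost
            best = float("inf")
            for dr, dc in MOVES:
                nxt = (s[0] + dr, s[1] + dc)
                if nxt not in V:
                    continue  # off-grid moves are not available
                # One step of cost; the move succeeds with probability 1 - slip.
                q = 1.0 + (1.0 - slip) * V[nxt] + slip * V[s]
                best = min(best, q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def solve_strategic(start, goals, budget):
    """Strategic level: choose which goal to pursue next.

    `goals` maps a goal cell to its (assumed) science reward. Returns
    (total reward, ordered tuple of goals). Simplification: the expected
    tactical cost is subtracted from the time budget as if deterministic.
    """
    cost = {g: tactical_expected_steps(g) for g in goals}

    @lru_cache(maxsize=None)
    def best(loc, visited, remaining):
        top = (0.0, ())
        for g, reward in goals.items():
            if g in visited:
                continue
            t = cost[g][loc]
            if t > remaining:
                continue  # not enough time left to reach this goal
            value, plan = best(g, visited | frozenset([g]), remaining - t)
            if reward + value > top[0]:
                top = (reward + value, (g,) + plan)
        return top

    return best(start, frozenset(), budget)


GOALS = {(0, 4): 5.0, (4, 0): 3.0, (4, 4): 10.0}

# Nominal plan from the start cell with the full (assumed) time budget.
nominal = solve_strategic((0, 0), GOALS, budget=12.0)
# Contingency: re-plan from an off-nominal cell with a reduced budget.
contingency = solve_strategic((2, 2), GOALS, budget=6.0)
print(nominal)
print(contingency)
```

Because each tactical problem covers only grid cells for a single goal, and the strategic problem covers only goal choices, neither level has to enumerate the full joint space. Re-planning after a delay amounts to calling `solve_strategic` again from the rover's current cell with the remaining budget.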