On solving optimal policies for event-based dynamic programming

Jia Qing-Shan
2010-01-01
Abstract: Markov decision processes (MDPs) provide a general framework for many control, decision-making, and optimization problems. However, solving for the optimal policies of many such problems is computationally prohibitive because of the large state and action spaces. Event-based dynamic programming (EDP) has been developed to formulate event-based decision-making processes. Since the number of events may increase only linearly with respect to (w.r.t.) the problem scale, EDP provides a computationally feasible way to solve many problems that are time-consuming to solve in the MDP framework. However, because the event sequence is not Markov, the optimal event-based policy could depend on the entire history, which makes it impractical to implement. In this paper, for EDP with a discrete and finite state space, we construct a completely observable MDP whose state consists of both the belief distribution over the internal system state and the current observable event. We then show that solving the original EDP is equivalent to solving this belief-event dynamic programming (BEDP) problem, whose optimal policies can be found among Markov policies, which can be implemented in practice. Potential-based policy iteration algorithms for completely observable MDPs can then be applied. We also discuss extensions to finite-stage EDP.
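As a rough illustration of the belief-event state described in the abstract, the sketch below updates a belief distribution over internal states after an event is observed. It is not taken from the paper: it assumes that an event is a set of state transitions (i, j), that the transition matrix under the chosen action is known, and that a plain Bayes update applies. The names `belief_update`, `P`, and `event_mask` are hypothetical.

```python
import numpy as np

def belief_update(belief, P, event_mask):
    """Bayes update of a belief over internal states after observing an event.

    Assumptions (not from the paper):
      belief     : shape (n,), current distribution over internal states
      P          : shape (n, n), P[i, j] = transition probability under the chosen action
      event_mask : shape (n, n) boolean, True where transition (i, j) belongs to the event
    Returns the posterior belief over the next state and the event probability.
    """
    # Joint probability of starting in i, moving to j, and the move lying in the event.
    joint = belief[:, None] * P * event_mask
    event_prob = joint.sum()
    if event_prob == 0.0:
        raise ValueError("observed event has zero probability under the current belief")
    # Marginalize over the starting state and normalize.
    return joint.sum(axis=0) / event_prob, event_prob

# Two internal states; the hypothetical event is "the state changed".
belief = np.array([0.5, 0.5])
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
changed = np.array([[False, True],
                    [True, False]])
new_belief, event_prob = belief_update(belief, P, changed)
```

Under these assumptions, the pair (`new_belief`, observed event) would play the role of the state of the completely observable MDP, so a policy only needs this pair rather than the entire event history.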