State Aggregation In Markov Decision Processes

Zhiyuan Ren, Bruce H. Krogh
DOI: https://doi.org/10.1109/CDC.2002.1184960
2002-01-01
Abstract: We study state aggregation for Markov decision processes (MDPs) under the long-run average-cost optimality criterion. The aggregation is based on the definition of an $(\epsilon_p, \epsilon_f)$-lumpable partition of the state space. In such a partition, for any control action and any two states in the same subset, the difference between the action's effects on the two states is bounded by $\epsilon_p$ for the state-transition effect and by $\epsilon_f$ for the cost effect. The states in each partition subset are merged into one meta-state to obtain an aggregated Markov chain, on which we construct an aggregated MDP with average cost. We develop an algorithm to solve this aggregated MDP and show that its performance lies within an $O(\epsilon_p, \epsilon_f)$ neighborhood of the optimal solution to the original MDP. In practice, an $(\epsilon_p, \epsilon_f)$-lumpable partition is usually obtained empirically. Beyond this, we study the problem of finding the coarsest $(\epsilon_p, \epsilon_f)$-lumpable partition for given $\epsilon_p$ and $\epsilon_f$, i.e., the partition with the minimum number of subsets. We prove that this partitioning problem is P-hard, so it is no easier than the original MDP problem, which is P-complete.
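The abstract does not spell out the exact lumpability test, the aggregation weights, or the solution algorithm. The following is therefore only a minimal sketch under stated assumptions. It reads "state-transition effect" as the probability of jumping into each partition block, and "cost effect" as the one-step cost. It builds meta-state tables by uniform averaging within each block. It then solves the aggregated average-cost MDP with standard relative value iteration, which is a stand-in and not necessarily the authors' algorithm. The function names (`block_mass`, `is_lumpable`, `aggregate`, `relative_value_iteration`) and the toy 4-state model are hypothetical illustrations.

```python
from itertools import combinations

# Assumed encoding (hypothetical, not from the paper):
#   P[a][i][k] = probability of moving from state i to state k under action a
#   c[a][i]    = one-step cost of taking action a in state i
#   partition  = list of blocks, each a list of original state indices

def block_mass(P, a, i, block):
    """Probability of jumping from state i into `block` under action a."""
    return sum(P[a][i][k] for k in block)

def is_lumpable(P, c, partition, eps_p, eps_f):
    """Assumed reading of (eps_p, eps_f)-lumpability: for every action and every
    pair of states in the same block, block-level transition probabilities differ
    by at most eps_p and one-step costs differ by at most eps_f."""
    for a in range(len(P)):
        for block in partition:
            for i, j in combinations(block, 2):
                if abs(c[a][i] - c[a][j]) > eps_f:
                    return False
                for target in partition:
                    if abs(block_mass(P, a, i, target) - block_mass(P, a, j, target)) > eps_p:
                        return False
    return True

def aggregate(P, c, partition):
    """Build meta-state transition and cost tables by uniform averaging over
    the states of each block (an assumed weighting; others are possible)."""
    P_bar = [[[sum(block_mass(P, a, i, target) for i in block) / len(block)
               for target in partition] for block in partition]
             for a in range(len(P))]
    c_bar = [[sum(c[a][i] for i in block) / len(block) for block in partition]
             for a in range(len(P))]
    return P_bar, c_bar

def relative_value_iteration(P_bar, c_bar, iters=10_000, tol=1e-10):
    """Standard relative value iteration for an average-cost MDP (a stand-in,
    not necessarily the paper's algorithm). Assumes a unichain, aperiodic model."""
    n_actions, n = len(P_bar), len(c_bar[0])
    h = [0.0] * n
    for _ in range(iters):
        # Q-values against the current relative value function h
        q = [[c_bar[a][s] + sum(P_bar[a][s][t] * h[t] for t in range(n))
              for a in range(n_actions)] for s in range(n)]
        th = [min(row) for row in q]
        g = th[0]                      # gain estimate; meta-state 0 is the reference
        new_h = [v - g for v in th]
        done = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if done:
            break
    policy = [min(range(n_actions), key=lambda a: q[s][a]) for s in range(n)]
    return g, h, policy

# Toy 4-state, 2-action model (invented for illustration):
# states {0, 1} and {2, 3} behave almost alike under both actions.
P = [
    [[0.5, 0.3, 0.1, 0.1], [0.25, 0.5, 0.15, 0.1],
     [0.1, 0.1, 0.4, 0.4], [0.0, 0.2, 0.3, 0.5]],
    [[0.1, 0.1, 0.4, 0.4], [0.15, 0.1, 0.5, 0.25],
     [0.3, 0.3, 0.2, 0.2], [0.35, 0.3, 0.2, 0.15]],
]
c = [[1.0, 1.05, 3.0, 3.0], [2.0, 2.0, 0.5, 0.55]]
partition = [[0, 1], [2, 3]]

print(is_lumpable(P, c, partition, eps_p=0.1, eps_f=0.1))   # True
print(is_lumpable(P, c, partition, eps_p=0.01, eps_f=0.1))  # False
P_bar, c_bar = aggregate(P, c, partition)
g, h, policy = relative_value_iteration(P_bar, c_bar)
print(round(g, 4), policy)  # 0.8926 [0, 1]
```

In this toy model, states within a block differ by at most 0.05 in both block-level transition probabilities and costs. The partition is therefore lumpable for $\epsilon_p = \epsilon_f = 0.1$ but not for $\epsilon_p = 0.01$. Checking every pair of states in every block for every action costs time polynomial in the model size. Finding the *coarsest* such partition is a different and much harder problem, as the abstract notes.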