State Entropy Optimization in Markov Decision Processes

Shuai Ma, Li Xia, Qianchuan Zhao
DOI: https://doi.org/10.1109/case59546.2024.10711499
2024-01-01
Abstract: We examine state entropy optimization in both discounted and average Markov decision processes (MDPs). We propose a total entropy optimization for the discounted setting, and we solve both the entropy rate optimization and the total discounted entropy optimization with iterative algorithms. An optimal solution to entropy maximization ensures that the system remains as unpredictable as possible. Previous work applies nonlinear programming methods to either the total entropy optimization or the entropy rate optimization. We present both value iteration and policy iteration for synthesizing entropy-optimizing policies in ergodic MDPs. In entropy maximization problems, the action distribution at each state is optimized by convex optimization in every iteration. We illustrate the validity of the proposed algorithms with a numerical experiment.
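The per-state convex step inside value iteration can be illustrated with a small sketch for the total discounted entropy criterion. It rests on assumptions the abstract does not state: the entropy collected at state s is taken to be the Shannon entropy of the next-state distribution q = Σ_a π(a|s) P(·|s,a) induced by the randomized action choice, and the concave per-state problem, maximizing H(q) + γ Σ_j q_j V(j) over π(·|s), is solved with exponentiated-gradient (mirror ascent) steps instead of a general-purpose convex solver. All function names (`mix`, `optimize_action_dist`, `entropy_value_iteration`, and so on) are hypothetical; this is not the authors' implementation.

```python
import math


def mix(pi_s, P_s):
    """Next-state distribution q(j) = sum_a pi(a|s) * P(j|s,a)."""
    n = len(P_s[0])
    return [sum(p_a * row[j] for p_a, row in zip(pi_s, P_s)) for j in range(n)]


def local_entropy(q):
    """Shannon entropy (in nats) of a next-state distribution (assumed reward)."""
    return -sum(qj * math.log(qj) for qj in q if qj > 0.0)


def objective(pi_s, P_s, V, gamma):
    """Per-state Bellman objective H(q) + gamma * <q, V>, concave in pi_s."""
    q = mix(pi_s, P_s)
    return local_entropy(q) + gamma * sum(qj * vj for qj, vj in zip(q, V))


def optimize_action_dist(P_s, V, gamma, steps=200):
    """Maximize the per-state objective over the action simplex using
    exponentiated-gradient (mirror ascent) steps of size 1 (an assumed solver)."""
    m = len(P_s)
    pi_s = [1.0 / m] * m  # start from the uniform action distribution
    for _ in range(steps):
        q = mix(pi_s, P_s)
        # d objective / d pi(a) = sum_j P(j|s,a) * (-log q_j - 1 + gamma * V_j);
        # q_j is floored only to guard against underflow of pi(a).
        grad = [
            sum(p * (-math.log(max(q[j], 1e-300)) - 1.0 + gamma * V[j])
                for j, p in enumerate(row) if p > 0.0)
            for row in P_s
        ]
        g_max = max(grad)  # shift exponents for numerical stability
        weights = [p_a * math.exp(g - g_max) for p_a, g in zip(pi_s, grad)]
        total = sum(weights)
        pi_s = [w / total for w in weights]
    return pi_s


def entropy_value_iteration(P, gamma, tol=1e-8, max_iter=10_000):
    """Value iteration for the (assumed) total discounted state-entropy objective.

    P[s][a][j] is the probability of moving from state s to state j under action a.
    Returns value estimates V[s] and a randomized policy pi[s][a].
    """
    V = [0.0] * len(P)
    policy = []
    for _ in range(max_iter):
        # Convex optimization of the action distribution at every state.
        policy = [optimize_action_dist(P_s, V, gamma) for P_s in P]
        new_V = [objective(pi_s, P_s, V, gamma) for pi_s, P_s in zip(policy, P)]
        done = max(abs(a - b) for a, b in zip(new_V, V)) < tol
        V = new_V
        if done:
            break
    return V, policy
```

For example, with `P = [[[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]]` (every state can move deterministically to either state, and state 0 has a duplicated action) and `gamma=0.9`, the sketch should return values close to log 2 / (1 − γ) and a state-0 policy near `[0.25, 0.5, 0.25]`, i.e. an even split over next states. The unit step size is a deliberate choice: because the KL divergence between induced next-state distributions never exceeds the KL divergence between the action distributions (data-processing inequality), the entropy term is smooth relative to the negative entropy with constant 1, which is the setting in which mirror ascent with step 1 is known to converge.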