On Value Iteration Convergence in Connected MDPs

Arsenii Mustafin, Alex Olshevsky, Ioannis Ch. Paschalidis
2024-06-14
Abstract: This paper establishes that an MDP with a unique optimal policy and ergodic associated transition matrix ensures the convergence of various versions of the Value Iteration algorithm at a geometric rate that exceeds the discount factor γ for both discounted and average-reward criteria.
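For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal sketch of textbook discounted Value Iteration, not the paper's own variants or its proof technique. The function name `value_iteration`, the array layout `P[a][s][t]` and `R[a][s]`, and the toy two-state MDP are all assumptions made for illustration. The function records the sup-norm change between successive iterates so the per-step contraction ratio can be compared against γ; the standard contraction argument only guarantees that this ratio is at most γ.

```python
def value_iteration(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Textbook value iteration for a finite discounted MDP.

    P[a][s][t] is the probability of moving from state s to state t under
    action a, and R[a][s] is the expected reward for taking a in s
    (a hypothetical layout chosen for this sketch).
    Returns the final value estimate and the list of sup-norm changes
    between successive iterates.
    """
    n_actions = len(R)
    n_states = len(R[0])
    V = [0.0] * n_states
    diffs = []
    for _ in range(max_iters):
        # Bellman optimality update: best one-step lookahead over actions.
        V_new = [
            max(
                R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        diff = max(abs(x - y) for x, y in zip(V_new, V))
        diffs.append(diff)
        V = V_new
        if diff < tol:
            break
    return V, diffs


if __name__ == "__main__":
    # Hypothetical toy MDP: 2 states, 2 actions, all transitions positive.
    P = [
        [[0.9, 0.1], [0.2, 0.8]],  # action 0
        [[0.5, 0.5], [0.6, 0.4]],  # action 1
    ]
    R = [
        [1.0, 0.0],  # action 0
        [0.0, 2.0],  # action 1
    ]
    gamma = 0.9
    V, diffs = value_iteration(P, R, gamma)
    # Empirical per-step contraction ratios, to compare against gamma.
    ratios = [b / a for a, b in zip(diffs, diffs[1:]) if a > 0]
    print("V* estimate:", V)
    print("last contraction ratio:", ratios[-1], "gamma:", gamma)
```

Printing the ratios on a small example like this is only a way to observe the empirical rate; whether it falls strictly below γ depends on the MDP, and the abstract's claim applies under its stated assumptions (a unique optimal policy and an ergodic associated transition matrix).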
Machine Learning
What problem does this paper attempt to address?