Abstract: We consider synchronizing properties of Markov decision processes (MDPs), viewed as generators of sequences of probability distributions over states. A probability distribution is p-synchronizing if the probability mass is at least p in some state, and a sequence of probability distributions is weakly p-synchronizing (respectively, strongly p-synchronizing) if infinitely many (respectively, all but finitely many) distributions in the sequence are p-synchronizing. For each synchronizing mode, an MDP can be (i) sure winning if there is a strategy that produces a 1-synchronizing sequence; (ii) almost-sure winning if there is a strategy that produces a sequence that is (1-ε)-synchronizing for all ε > 0; (iii) limit-sure winning if for all ε > 0, there is a strategy that produces a (1-ε)-synchronizing sequence. For each synchronizing and winning mode, we consider the problem of deciding whether an MDP is winning, and we establish matching upper and lower complexity bounds for these problems, as well as the optimal memory requirement for winning strategies: (a) for all winning modes, we show that the problems are PSPACE-complete for weakly synchronizing, and PTIME-complete for strongly synchronizing; (b) we show that for weakly synchronizing, exponential memory is sufficient and may be necessary for sure winning, and infinite memory is necessary for almost-sure winning; for strongly synchronizing, linear-size memory is sufficient and may be necessary in all modes; (c) we show a robustness result: the almost-sure and limit-sure winning modes coincide for both weakly and strongly synchronizing.
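
To make the definitions concrete, the following is a minimal Python sketch, not taken from the paper. It assumes a hypothetical two-state MDP (`TOY_MDP`) and a memoryless strategy (`TOY_STRATEGY`), and all function names are illustrative. It computes a finite prefix of the sequence of distributions generated by the strategy and checks which distributions are p-synchronizing. A finite prefix can only suggest, not decide, whether the infinite sequence is weakly or strongly p-synchronizing.

```python
from typing import Dict, List

# A distribution maps each state to its probability mass.
Distribution = Dict[str, float]
# transitions[state][action] is a distribution over successor states.
Transitions = Dict[str, Dict[str, Distribution]]


def is_p_synchronizing(dist: Distribution, p: float) -> bool:
    """True if some state carries probability mass at least p."""
    return max(dist.values()) >= p


def step(dist: Distribution, transitions: Transitions,
         strategy: Dict[str, str]) -> Distribution:
    """Push a distribution one step forward under a memoryless strategy."""
    nxt: Distribution = {}
    for state, mass in dist.items():
        for succ, prob in transitions[state][strategy[state]].items():
            nxt[succ] = nxt.get(succ, 0.0) + mass * prob
    return nxt


def synchronizing_prefix(initial: Distribution, transitions: Transitions,
                         strategy: Dict[str, str], p: float,
                         length: int) -> List[bool]:
    """For the first `length` distributions, report which are p-synchronizing."""
    flags: List[bool] = []
    dist = dict(initial)
    for _ in range(length):
        flags.append(is_p_synchronizing(dist, p))
        dist = step(dist, transitions, strategy)
    return flags


# Hypothetical toy MDP (an assumption for illustration): from q0, action "a"
# stays in q0 or moves to q1 with probability 1/2 each; q1 is absorbing.
TOY_MDP: Transitions = {
    "q0": {"a": {"q0": 0.5, "q1": 0.5}},
    "q1": {"a": {"q1": 1.0}},
}
TOY_STRATEGY = {"q0": "a", "q1": "a"}

if __name__ == "__main__":
    print(synchronizing_prefix({"q0": 1.0}, TOY_MDP, TOY_STRATEGY, 0.9, 6))
    # prints [True, False, False, False, True, True]
```

In this toy example the mass in q1 after n steps is 1 - (1/2)^n, so for every ε > 0 the prefix eventually stays (1-ε)-synchronizing, while no distribution after the initial one is 1-synchronizing under this particular strategy.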

A PSPACE Algorithm for Almost-Sure Rabin Objectives in Multi-Environment MDPs

Robust Almost-Sure Reachability in Multi-Environment MDPs

Multiple-Environment Markov Decision Processes

Markov Decision Processes with Sure Parity and Multiple Reachability Objectives

Fast Online Exact Solutions for Deterministic MDPs with Sparse Rewards

Finite-memory Strategies for Almost-sure Energy-MeanPayoff Objectives in MDPs

Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives

Hindsight is Only 50/50: Unsuitability of MDP based Approximate POMDP Solvers for Multi-resolution Information Gathering

Limit-sure reachability for small memory policies in POMDPs is NP-complete

Bi-Objective Lexicographic Optimization in Markov Decision Processes with Related Objectives

Bounded Policy Synthesis for POMDPs with Safe-Reachability Objectives

Robust Synchronization in Markov Decision Processes

Sparse tree search optimality guarantees in POMDPs with continuous observation spaces

Imprecise Probabilities Meet Partial Observability: Game Semantics for Robust POMDPs

Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs

Robust Multiobjective Reinforcement Learning Considering Environmental Uncertainties

Recursively-Constrained Partially Observable Markov Decision Processes

Markov decision processes with maximum entropy rate for Surveillance Tasks

Measurement Simplification in ρ-POMDP with Performance Guarantees

Parity Objectives in Countable MDPs

Robust Markov Decision Processes: A Place Where AI and Formal Methods Meet