Abstract: We consider two-player stochastic games played on finite graphs with reachability objectives where the first player tries to ensure a target state to be visited almost-surely (i.e., with probability 1), or positively (i.e., with positive probability), no matter the strategy of the second player. We classify such games according to the information and the power of randomization available to the players. On the basis of information, the game can be one-sided with either (a) player 1, or (b) player 2 having partial observation (and the other player has perfect observation), or two-sided with (c) both players having partial observation. On the basis of randomization, the players (a) may not be allowed to use randomization (pure strategies), or (b) may choose a probability distribution over actions but the actual random choice is external and not visible to the player (actions invisible), or (c) may use full randomization. Our main results for pure strategies are as follows. (1) For one-sided games with player 1 having partial observation we show that (in contrast to full randomized strategies) belief-based (subset-construction based) strategies are not sufficient, and we present an exponential upper bound on memory both for almost-sure and positive winning strategies; we show that the problem of deciding the existence of almost-sure and positive winning strategies for player 1 is EXPTIME-complete. (2) For one-sided games with player 2 having partial observation we show that non-elementary memory is both necessary and sufficient for both almost-sure and positive winning strategies. (3) We show that for the general (two-sided) case finite-memory strategies are sufficient for both positive and almost-sure winning, and at least non-elementary memory is required. We establish the equivalence of the almost-sure winning problems for pure strategies and for randomized strategies with actions invisible.
Our equivalence result exhibits serious flaws in previous results of the literature: we show a non-elementary memory lower bound for almost-sure winning whereas an exponential upper bound was previously claimed.
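
To make the term "belief-based (subset-construction based)" concrete, here is a minimal Python sketch of a belief update for player 1. It is an illustration under stated assumptions, not the paper's construction: the names `belief_update`, `DELTA` and `OBS_OF` and the toy game are hypothetical, and the transition relation keeps only the support of each move, so probabilities and player 2's choices are collapsed into a set of possible successors. The belief is the set of states player 1 considers possible after playing an action and receiving an observation.

```python
from typing import Dict, FrozenSet, Set, Tuple

State = str
Action = str
Obs = str

# Hypothetical toy game: support of the transition relation only.
# (state, action) -> set of states reachable with positive probability.
DELTA: Dict[Tuple[State, Action], Set[State]] = {
    ("s0", "a"): {"s1", "s2"},
    ("s1", "a"): {"goal"},
    ("s1", "b"): {"s1"},
    ("s2", "a"): {"s2"},
    ("s2", "b"): {"goal"},
}

# Hypothetical observation function: s1 and s2 look the same to player 1.
OBS_OF: Dict[State, Obs] = {"s0": "o0", "s1": "o1", "s2": "o1", "goal": "og"}


def belief_update(
    belief: FrozenSet[State],
    action: Action,
    obs: Obs,
    delta: Dict[Tuple[State, Action], Set[State]],
    obs_of: Dict[State, Obs],
) -> FrozenSet[State]:
    """One step of the subset construction.

    Collect every possible successor of a state in the current belief,
    then keep only the successors consistent with the new observation.
    """
    successors: Set[State] = set()
    for s in belief:
        successors |= delta.get((s, action), set())
    return frozenset(t for t in successors if obs_of[t] == obs)


b0 = frozenset({"s0"})
b1 = belief_update(b0, "a", "o1", DELTA, OBS_OF)  # player 1 cannot tell s1 from s2
```

A belief-based strategy picks its action as a function of this set alone. The abstract's result (1) says that, for pure strategies, such strategies are not sufficient in general, so a winning strategy may need memory beyond the current belief.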

Heuristic Search Value Iteration for One-Sided Partially Observable Stochastic Games

nso-HSVI: A Not-So-Optimistic Heuristic Search Value Iteration Algorithm for POMDPs

HSVI-based Online Minimax Strategies for Partially Observable Stochastic Games with Neural Perception Mechanisms

Bridging the Gap between Partially Observable Stochastic Games and Sparse POMDP Methods

Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives

$ε$-Optimally Solving Zero-Sum POSGs

Search Games with Predictions

Partial-Observation Stochastic Games: How to Win When Belief Fails

A Probabilistic Greedy Search Value Iteration Algorithm For Pomdp

Heuristic Search for Linear Positive Systems

Heuristics for Partially Observable Stochastic Contingent Planning

Stopping Criteria for Value Iteration on Stochastic Games with Quantitative Objectives

Popvi: A Probability-Based Optimal Policy Value Iteration Algorithm

Stochastic Dynamic Games in Belief Space

Universal Complexity Bounds Based on Value Iteration for Stochastic Mean Payoff Games and Entropy Games

A Hybrid Heuristic Value Iteration Algorithm for Pomdp

Exploration Analysis in Finite-Horizon Turn-based Stochastic Games.

PLEASE: Palm Leaf Search for POMDPs with Large Observation Spaces.

A Probabilistic Forward Search Value Iteration Algorithm for POMDP

Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets

Continual Depth-limited Responses for Computing Counter-strategies in Sequential Games