Abstract: This paper investigates anticipatory decision-making for unknown distributed systems under robustness requirements. Anticipatory decision-making selects actions over future time steps before the corresponding observations become available. First, the anticipatory decision is formulated as sequential feedback with min-max performance guarantees, with causality derived from time-series analysis. Next, distribution, robustness, and time consistency partition the optimization into spatial and temporal sub-games: the spatial sub-games resolve conflicts between distribution and robustness, while the temporal sub-games ensure stability and performance through time consistency. Finally, we propose a multi-step reinforcement learning algorithm within this causality-analysis and game-theoretic framework. Numerical results demonstrate the effectiveness of the approach, and practical experiments illustrate its potential for real-world applications.

Note to Practitioners: This framework addresses anticipatory decision-making for distributed systems that suffer from distributed communication, unknown dynamics, environmental disturbances, and loss of state observations. Its application scenarios include internal surgical robots, low-light autonomous driving, and navigation without GPS, all of which involve dynamic environments and weak signal feedback. For example, decision-making in autonomous driving must not only react to current environmental conditions but also anticipate future scenarios and the uncertainties caused by poor visibility. Most existing results handle these issues with model-driven approaches, which become inapplicable when the dynamics are unknown.
For implementation, we propose a multi-step reinforcement learning algorithm for the anticipatory decision framework with stability and robustness guarantees. The implementation consists of three parts: 1) data are collected during an offline phase and organized into samples, each consisting of a current-next observation pair together with the multi-step decision and its accumulated reward; 2) strategies and value functions are approximated with neural networks via Monte Carlo methods; 3) the learned strategy is deployed as sequential feedback in the practical system, where it predicts multi-step decisions from a single-step state observation. Finally, we select robot consensus with optical sensors as the implementation demonstration.
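The three implementation parts described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not from the paper: observations are discretized into hashable tuples, a per-(observation, plan) average of accumulated rewards stands in for the neural-network strategy and value approximators, and the names `Sample`, `build_samples`, `monte_carlo_estimate`, `deploy`, the horizon `k`, and the discount `gamma` are hypothetical. It is a sketch of the data flow only, not the authors' algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch: tabular averages replace the paper's neural approximators.
Obs = Tuple[float, ...]
Plan = Tuple[int, ...]


@dataclass(frozen=True)
class Sample:
    """One offline sample: current-next observation pair plus a k-step decision."""
    obs: Obs        # observation at which the k-step decision is made
    plan: Plan      # the k actions applied starting from obs
    ret: float      # discounted reward accumulated over those k steps
    next_obs: Obs   # observation reached k steps later


def build_samples(observations: Sequence[Obs], actions: Sequence[int],
                  rewards: Sequence[float], k: int, gamma: float = 0.99) -> List[Sample]:
    """Part 1: slice one offline trajectory into k-step samples.

    Expects len(observations) == len(actions) + 1 == len(rewards) + 1.
    """
    if len(observations) != len(actions) + 1 or len(actions) != len(rewards):
        raise ValueError("expected T+1 observations and T actions/rewards")
    samples = []
    for t in range(len(actions) - k + 1):
        ret = sum(gamma ** i * rewards[t + i] for i in range(k))
        samples.append(Sample(tuple(observations[t]), tuple(actions[t:t + k]),
                              ret, tuple(observations[t + k])))
    return samples


def monte_carlo_estimate(samples: Sequence[Sample]) -> Dict[Obs, Dict[Plan, float]]:
    """Part 2 (stand-in): average the accumulated reward of each (obs, plan) pair."""
    sums: Dict[Tuple[Obs, Plan], float] = defaultdict(float)
    counts: Dict[Tuple[Obs, Plan], int] = defaultdict(int)
    for s in samples:
        sums[(s.obs, s.plan)] += s.ret
        counts[(s.obs, s.plan)] += 1
    table: Dict[Obs, Dict[Plan, float]] = defaultdict(dict)
    for (obs, plan), total in sums.items():
        table[obs][plan] = total / counts[(obs, plan)]
    return dict(table)


def deploy(table: Dict[Obs, Dict[Plan, float]], obs: Obs) -> Plan:
    """Part 3: map a single-step observation to the best estimated k-step plan."""
    candidates = table[tuple(obs)]
    return max(candidates, key=candidates.get)
```

For instance, two offline trajectories that start from the same observation `(0,)` but apply different two-step plans produce one sample each, and `deploy` returns the plan whose averaged return is higher. Averaging k-step returns is the simplest Monte Carlo estimator and ignores everything after `next_obs`; a version closer to the described method would fit neural networks to these targets instead of a lookup table.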

Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes

Decision Making in Non-Stationary Environments with Policy-Augmented Search

Decision Making in Non-Stationary Environments with Policy-Augmented Monte Carlo Tree Search

Learning Anticipatory Decision for Distributed Systems with Robustness Guarantees

Learning Non-Markovian Decision-Making from State-only Sequences

A Dynamically Adaptive Approach to Reducing Strategic Interference for Multi-agent Systems

Optimized Monte Carlo Tree Search for Enhanced Decision Making in the FrozenLake Environment

Cognitive mechanisms of learning in sequential decision-making under uncertainty: an experimental and theoretical approach

Opportunistic Learning for Markov Decision Systems with Application to Smart Robots

Solving nonstationary Markov decision processes via contextual decomposition: A military air battle management application

Playing against Nature: causal discovery for decision making under uncertainty

Acting in Delayed Environments with Non-Stationary Markov Policies

Efficient Monte Carlo Tree Search via On-the-Fly State-Conditioned Action Abstraction

SPO: Sequential Monte Carlo Policy Optimisation

Adaptive network approach to exploration-exploitation trade-off in reinforcement learning

Non-Deterministic Policies in Markovian Decision Processes

Maneuver Decision-Making Through Proximal Policy Optimization And Monte Carlo Tree Search

A Behavior-Aware Approach for Deep Reinforcement Learning in Non-stationary Environments without Known Change Points

An Auxiliary Decision-Making Method for Autonomous Driving via Monte Carlo Tree Search

Robust Anytime Learning of Markov Decision Processes

An Iterative Decision-Making Scheme for Markov Decision Processes and Its Application to Self-adaptive Systems