Abstract: Event-based optimization (EBO) provides a unified framework for problems in which decisions can be made only when certain events occur. Because the event sequence is usually not Markovian, the optimal policy may depend on the entire event history, which is hard to implement in practice. Most existing studies therefore focus on memoryless policies, which make decisions based only on the currently observable event. How to find optimal memoryless policies in general remains open, let alone how to solve the EBO optimally. In this technical note, we address these two questions for infinite-stage EBOs with finite state and action spaces and make three major contributions. First, we extend our previous studies on finite-stage EBOs and convert infinite-stage EBOs to partially observable Markov decision processes (POMDPs). The belief process of this POMDP is called a belief-event decision process (BEDP). Under certain well-known conditions, optimality for BEDPs can be attained within the class of stationary Markov deterministic policies. Second, assuming that optimal stationary policies exist, we develop the performance difference and derivative formulas. Potentials of memoryless event-based policies are shown to be piecewise linear functions and can thus be estimated efficiently from sample paths. Third, we develop a potential-based approximate policy iteration algorithm to obtain near-optimal memoryless policies, and we analyze its convergence and performance loss bound.
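For orientation, the role that potentials and the performance difference formula play can be illustrated in the simpler, fully observable average-reward setting. The sketch below is not the event-based formulation of this note: it assumes an ordinary ergodic Markov chain, and the two-state transition matrices and reward vectors are hypothetical values chosen only for illustration. It computes the potentials g from the Poisson equation (I - P) g + eta e = f and checks the standard identity eta' - eta = pi' [(P' - P) g + (f' - f)] numerically.

```python
# Illustrative sketch only: a fully observable, ergodic Markov chain under the
# average-reward criterion. This is NOT the event-based (EBO/BEDP) model of the
# note, and all numbers below are hypothetical.

def mat_vec(P, v):
    """Matrix-vector product P v for a row-stochastic matrix given as nested lists."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def stationary(P, iters=2000):
    """Stationary distribution pi = pi P by power iteration (ergodic, aperiodic P assumed)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def average_reward(P, f):
    """Long-run average reward eta = pi f."""
    return sum(p * r for p, r in zip(stationary(P), f))

def potentials(P, f, iters=2000):
    """Potentials g = sum_{n>=0} P^n (f - eta e), a solution of (I - P) g + eta e = f."""
    eta = average_reward(P, f)
    term = [r - eta for r in f]
    g = [0.0] * len(f)
    for _ in range(iters):
        g = [a + b for a, b in zip(g, term)]
        term = mat_vec(P, term)
    return g

def performance_difference(P, f, P_new, f_new):
    """Evaluate eta' - eta = pi' [(P' - P) g + (f' - f)] using potentials g of the old policy."""
    g = potentials(P, f)
    pi_new = stationary(P_new)
    Pg, Png = mat_vec(P, g), mat_vec(P_new, g)
    return sum(pi_new[i] * (Png[i] - Pg[i] + f_new[i] - f[i]) for i in range(len(f)))

# Hypothetical two-state chains induced by an "old" and a "new" policy.
P_old = [[0.9, 0.1], [0.5, 0.5]]
f_old = [1.0, 0.0]
P_new = [[0.6, 0.4], [0.2, 0.8]]
f_new = [0.5, 2.0]

direct = average_reward(P_new, f_new) - average_reward(P_old, f_old)
via_formula = performance_difference(P_old, f_old, P_new, f_new)
print(direct, via_formula)
```

The two printed values agree up to floating-point error. In this simplified setting, the sign of (P' - P) g + (f' - f) in each state indicates whether a policy change improves the average reward, which is the mechanism that potential-based policy iteration exploits.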