Abstract: Reinforcement Learning (RL) serves as a versatile framework for sequential decision-making, with applications across diverse domains such as robotics, autonomous driving, recommendation systems, supply chain optimization, biology, mechanics, and finance. The primary objective in these applications is to maximize the average reward. Real-world scenarios often require adherence to specific constraints during the learning process. This monograph explores model-based and model-free approaches for Constrained RL in the setting of average reward Markov Decision Processes (MDPs). It begins with model-based strategies, examining two foundational methods: optimism in the face of uncertainty and posterior sampling. The discussion then turns to parametrized model-free approaches, where a primal-dual policy gradient-based algorithm is explored as a solution for constrained MDPs. The monograph provides regret guarantees and analyzes constraint violation for each of the discussed setups. These results assume the underlying MDP is ergodic. The monograph then extends the discussion to results tailored for weakly communicating MDPs, broadening the scope of its findings and their relevance to a wider range of practical scenarios.
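
For orientation, the quantities named in the abstract (average reward, constraints, regret, constraint violation, and the primal-dual view) can be written out in one common form. This is only a sketch under assumed notation: the symbols $r$, $c$, $\tau$, $\lambda$, $J_r$, $J_c$, $R(T)$, and $V(T)$, the single cost constraint, and the sign convention $J_c(\pi) \le \tau$ are illustrative choices and may differ from those used in the monograph.

```latex
% Assumed notation (not taken from the monograph):
%   r(s,a): reward, c(s,a): cost, \tau: cost threshold, \lambda: dual variable.
\begin{align*}
  % Long-run average reward and average cost of a policy \pi
  J_r(\pi) &= \liminf_{T \to \infty} \frac{1}{T}\,
              \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T-1} r(s_t, a_t)\right],
  &
  J_c(\pi) &= \limsup_{T \to \infty} \frac{1}{T}\,
              \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T-1} c(s_t, a_t)\right], \\
  % Constrained average-reward problem
  \pi^{*} &\in \arg\max_{\pi} \; J_r(\pi)
  \quad \text{subject to} \quad J_c(\pi) \le \tau, \\
  % Regret and cumulative constraint violation over T steps
  R(T) &= \sum_{t=0}^{T-1} \bigl( J_r(\pi^{*}) - r(s_t, a_t) \bigr),
  &
  V(T) &= \left[ \sum_{t=0}^{T-1} \bigl( c(s_t, a_t) - \tau \bigr) \right]_{+}, \\
  % Lagrangian saddle-point form underlying primal-dual methods
  \max_{\pi} \, \min_{\lambda \ge 0} \; L(\pi, \lambda)
  &= \max_{\pi} \, \min_{\lambda \ge 0} \;
     \bigl[ J_r(\pi) - \lambda \bigl( J_c(\pi) - \tau \bigr) \bigr].
\end{align*}
```

In this form, a primal-dual policy gradient method would alternate an ascent step on the policy parameters with a projected descent step on $\lambda$, while the regret $R(T)$ and violation $V(T)$ are the two quantities for which guarantees are stated.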

Average Optimality for Unbounded Rewards

Average Optimality for Pathwise Rewards

Average Optimality for Finite Models

Discount Optimality for Unbounded Rewards

Average Optimality in Markov Decision Processes with Unbounded Rewards

Finding Optimal Observation-Based Policies for Constrained POMDPs under the Expected Average Reward Criterion

Finding Optimal Memoryless Policies of POMDPs under the Expected Average Reward Criterion

Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms

Nonhomogeneous Markov Decision Processes with Borel State Space-The Average Criterion with Nonuniformly Bounded Rewards

On-Policy Deep Reinforcement Learning for the Average-Reward Criterion

Average Optimality for Nonnegative Costs

On Average Optimality for Non-Stationary Markov Decision Processes in Borel Spaces

Optimal Control of Ergodic Continuous-Time Markov Chains with Average Sample-Path Rewards

Beyond Average Return in Markov Decision Processes

Optimal Sample Complexity for Average Reward Markov Decision Processes

Average-Reward Reinforcement Learning with Trust Region Methods

New Average Optimality Conditions for Semi-Markov Decision Processes in Borel Spaces

Risk-averse Total-reward MDPs with ERM and EVaR

Constrained Optimality for Average Criteria

Average Optimality for Markov Decision Processes in Borel Spaces: a New Condition and Approach

Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption