Abstract: Recent progress in randomized motion planners has led to the development of a new class of sampling-based algorithms that provide asymptotic optimality guarantees, notably the RRT* and the PRM* algorithms. Careful analysis reveals that the so-called "rewiring" step in these algorithms can be interpreted as a local policy iteration (PI) step (i.e., a local policy evaluation step followed by a local policy improvement step) so that asymptotically, as the number of samples tends to infinity, both algorithms converge to the optimal path almost surely (with probability 1). Policy iteration and value iteration (VI) are common methods for solving dynamic programming (DP) problems. Based on this observation, the RRT$^{\#}$ algorithm has recently been proposed, which performs, during each iteration, Bellman updates (also known as "backups") on those vertices of the graph that have the potential of being part of the optimal path (i.e., the "promising" vertices). The RRT$^{\#}$ algorithm thus utilizes dynamic programming ideas and implements them incrementally on randomly generated graphs to obtain high-quality solutions. In this work, building on this key insight, we explore a different class of dynamic programming algorithms for solving shortest-path problems on random graphs generated by iterative sampling methods. This class of algorithms utilizes policy iteration instead of value iteration, and is thus better suited for massive parallelization. In contrast to the RRT* algorithm, the policy improvement during the rewiring step is not performed only locally but rather on the set of vertices that are classified as "promising" during the current iteration. This tends to speed up the overall process. The resulting algorithm, aptly named Policy Iteration-RRT$^{\#}$ (PI-RRT$^{\#}$), is the first of a new class of DP-inspired algorithms for randomized motion planning that utilize PI methods.
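To make the policy evaluation and policy improvement steps mentioned in the abstract concrete, the following is a minimal sketch of policy iteration for a shortest-path problem on a small, fixed graph. It is not the PI-RRT$^{\#}$ algorithm: there is no sampling, no incremental graph growth, and no notion of "promising" vertices. The function names, the dictionary-based graph format, and the example graph are all assumptions made for illustration, and edge costs are assumed to be strictly positive.

```python
import math

def evaluate_policy(policy, edges, goal):
    """Policy evaluation: cost-to-go of every vertex under the current policy.

    A vertex whose policy path hits a dead end or a cycle gets infinite cost.
    """
    cost = {goal: 0.0}
    for start in policy:
        path, seen, v = [], set(), start
        # Follow the policy until we reach a vertex with known cost,
        # a dead end (None), or a vertex already on this path (cycle).
        while v is not None and v not in cost and v not in seen:
            seen.add(v)
            path.append(v)
            v = policy[v]
        running = cost.get(v, math.inf)  # inf for dead ends and cycles
        for u in reversed(path):
            nxt = policy[u]
            running = math.inf if nxt is None else edges[u][nxt] + running
            cost[u] = running
    return cost

def improve_policy(policy, cost, edges):
    """Policy improvement: switch to a strictly better successor if one exists."""
    changed = False
    for v in policy:
        best_u, best = policy[v], cost[v]
        for u, c in edges.get(v, {}).items():
            if c + cost[u] < best:
                best_u, best = u, c + cost[u]
        if best_u != policy[v]:
            policy[v] = best_u
            changed = True
    return changed

def policy_iteration(edges, goal):
    """Alternate evaluation and improvement until the policy stops changing."""
    vertices = set(edges) | {u for nbrs in edges.values() for u in nbrs}
    policy = {v: None for v in vertices if v != goal}  # no successor chosen yet
    while True:
        cost = evaluate_policy(policy, edges, goal)
        if not improve_policy(policy, cost, edges):
            return policy, cost

# Hypothetical example graph: vertex -> {successor: edge cost}.
edges = {
    "a": {"b": 1.0, "goal": 5.0},
    "b": {"c": 1.0, "a": 1.0},
    "c": {"goal": 1.0},
    "goal": {},
}
policy, cost = policy_iteration(edges, "goal")
print(policy["a"], cost["a"])
```

In this sketch the improvement step reads only the costs from the preceding evaluation, so the per-vertex updates are independent of one another; this independence is the property the abstract associates with parallelization.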

Lifted-Rollout for Approximate Policy Iteration of Markov Decision Process

Policy Optimization with Stochastic Mirror Descent

Approximate Policy Iteration for Robust Stochastic Control of Multi-agent Markov Decision Processes

Approximate Policy Iteration Schemes: A Comparison

Representation Policy Iteration

Inexact Policy Iteration Methods for Large-Scale Markov Decision Processes

A rollout method for finite-stage event-based decision processes

From Optimization to Control: Quasi Policy Iteration

Approximate Modified Policy Iteration

Policy Search for the Optimal Control of Markov Decision Processes: A Novel Particle-Based Iterative Scheme

Biased Aggregation, Rollout, and Enhanced Policy Improvement for Reinforcement Learning

Truncated Variance Reduced Value Iteration

Incremental Sampling-based Motion Planners Using Policy Iteration Methods

Approximate Linear Programming for Decentralized Policy Iteration in Cooperative Multi-agent Markov Decision Processes

A Policy Gradient Method with Variance Reduction for Uplift Modeling

Policy iteration for parameterized Markov decision processes and its application

Online Markov decision processes with policy iteration

Nonparametric approximation generalized policy iteration reinforcement learning algorithm based on states clustering

Corrected: On Confident Policy Evaluation for Factored Markov Decision Processes with Node Dropouts

RL-Driven MPPI: Accelerating Online Control Laws Calculation with Offline Policy

A Rollout Algorithm For Multichain Markov Decision Processes With Average Cost