Abstract: In this paper, we address two challenges faced by Value Iteration Networks (VIN): handling larger input maps and mitigating the errors that accumulate as the number of iterations grows. We propose Value Iteration Networks with Gated Summarization Module (GS-VIN), which incorporates two main improvements: (1) an Adaptive Iteration Strategy in the Value Iteration (VI) module to reduce the number of iterations, and (2) a Gated Summarization module to summarize the iterative process. The adaptive iteration strategy uses larger convolution kernels with fewer iterations, reducing network depth and increasing training stability while maintaining the accuracy of the planning process. The gated summarization module temporally and spatially resamples the entire planning process within the VI module, so that the network draws on the whole planning process rather than relying solely on the final global planning outcome. We conduct experiments on 2D grid world path-finding problems and the Atari Mr. Pac-man environment, demonstrating that GS-VIN outperforms the baseline in terms of single-step accuracy, planning success rate, and overall performance across different map sizes. Additionally, we provide an analysis of the relationship between input size, kernel size, and the number of iterations in VI-based models, which applies to a majority of VI-based models and offers insights for researchers and industrial deployment.
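The trade-off between kernel size and iteration count mentioned in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not the paper's analysis; it assumes that one VI iteration with an odd k x k kernel propagates value information by at most (k - 1) / 2 cells, and that the map is square and obstacle-free, so the result is only a lower bound. The function name `min_iterations` and its parameters are hypothetical and introduced purely for illustration.

```python
import math


def min_iterations(map_size: int, kernel_size: int) -> int:
    """Lower bound on VI iterations needed for value information to
    cross a square map of side `map_size`.

    Assumption (not from the paper): each iteration with an odd
    k x k kernel spreads information by (k - 1) // 2 cells, and the
    map has no obstacles. Obstacles lengthen paths, so real maps may
    need more iterations than this bound.
    """
    if map_size < 1:
        raise ValueError("map_size must be at least 1")
    if kernel_size < 3 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be an odd integer >= 3")
    reach_per_iteration = (kernel_size - 1) // 2
    # The farthest cell is (map_size - 1) steps away along one axis.
    return math.ceil((map_size - 1) / reach_per_iteration)


if __name__ == "__main__":
    # Compare a few kernel sizes on a hypothetical 28 x 28 map.
    for k in (3, 5, 7):
        print(f"kernel {k}x{k}: at least {min_iterations(28, k)} iterations")
```

Under these assumptions, widening the kernel shrinks the required depth roughly in proportion to the kernel radius, which is the intuition behind using larger kernels with fewer iterations.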

Sketched Newton Value Iteration for Large-Scale Markov Decision Processes

Deflated Dynamics Value Iteration

An Accelerated Fitted Value Iteration Algorithm for MDPs with Finite and Vector-Valued Action Space

Universal Value Iteration Networks: When Spatially-Invariant Is Not Universal

A Neighborhood-Based Value Iteration Algorithm for POMDP Problems

Value Iteration Networks with Gated Summarization Module

Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning

Asynchronous Value Iteration for Markov Decision Processes with Continuous State Spaces

A Probabilistic Greedy Search Value Iteration Algorithm for POMDP

Value Iteration Networks with Double Estimator for Planetary Rover Path Planning

Incremental Value Iteration for Time-Aggregated Markov-Decision Processes

A Multi-Criteria Value Iteration Algorithm for POMDP Problems

Generalized Value Iteration Networks: Life Beyond Lattices

A Probabilistic Forward Search Value Iteration Algorithm for POMDP

POPVI: A Probability-Based Optimal Policy Value Iteration Algorithm

nso-HSVI: A Not-So-Optimistic Heuristic Search Value Iteration Algorithm for POMDPs

Stopping Criteria for Value Iteration on Stochastic Games with Quantitative Objectives

A Probability-Based Value Iteration on Optimal Policy Algorithm for POMDP

Explicit Planning for Efficient Exploration in Reinforcement Learning

Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning

Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning