Highway Graph to Accelerate Reinforcement Learning

Zidu Yin,Zhen Zhang,Dong Gong,Stefano V. Albrecht,Javen Q. Shi

2024-05-20

Abstract: Reinforcement Learning (RL) algorithms often suffer from low training efficiency. A strategy to mitigate this issue is to incorporate a model-based planning algorithm, such as Monte Carlo Tree Search (MCTS) or Value Iteration (VI), into the environmental model. The major limitation of VI is the need to iterate over a large tensor, so these approaches still lead to intensive computation. We focus on improving the training efficiency of RL algorithms by improving the efficiency of the value learning process. In deterministic environments with discrete state and action spaces, a non-branching sequence of transitions moves the agent without deviating from intermediate states; we call such a sequence a highway. On such non-branching highways, the value-updating process can be merged into a one-step process instead of iterating the value step by step. Based on this observation, we propose a novel graph structure, named the highway graph, to model the state transitions. Our highway graph compresses the transition model into a concise graph, where an edge can represent multiple state transitions to support value propagation across multiple time steps in each iteration. We can thus obtain a more efficient value learning approach by applying the VI algorithm on highway graphs. By integrating the highway graph into RL (as a model-based off-policy RL method), RL training can be remarkably accelerated in the early stages (within 1 million frames). Comparison against various baselines on four categories of environments reveals that our method outperforms both representative and novel model-free and model-based RL algorithms, demonstrating 10 to more than 150 times higher efficiency while maintaining an equal or superior expected return, as confirmed by carefully conducted analyses. Moreover, a deep neural network-based agent is trained using the highway graph, resulting in better generalization and lower storage costs.
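
The one-step merge mentioned in the abstract can be written out explicitly. The following is a sketch under assumed notation that does not appear in the abstract: a highway s_0, s_1, ..., s_n entered from s_0 by action a_0, with reward r_{k+1} on the transition from s_k to s_{k+1}, discount factor \gamma, and state value V. Because each intermediate state offers only one transition, the maximization in the Bellman backup is trivial there, and the step-by-step updates collapse into a single one:

```latex
\begin{align}
V(s_k) &= r_{k+1} + \gamma V(s_{k+1}), \qquad k = 1, \dots, n-1, \\
Q(s_0, a_0) &= r_1 + \gamma V(s_1)
             = \sum_{k=0}^{n-1} \gamma^{k} r_{k+1} + \gamma^{n} V(s_n).
\end{align}
```

Under this reading, a highway edge only needs to carry the discounted reward sum and the factor \gamma^{n}, so the value at s_n reaches s_0 in one update rather than n.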

Machine Learning

What problem does this paper attempt to address?

This paper addresses the low efficiency of Reinforcement Learning (RL) training. Traditional RL algorithms update values step by step on the state transition graph, which is computationally expensive and time-consuming in environments with a large number of states. The paper introduces a new graph structure, the highway graph, to accelerate the value learning process. In deterministic environments with discrete state and action spaces, a non-branching sequence of transitions (a highway) carries the agent through its intermediate states with no opportunity to deviate, so the value updates along it can be consolidated into a single step. The highway graph compresses the transition model by letting one edge represent multiple state transitions, which supports value propagation across multiple time steps in each iteration and improves the efficiency of value learning.

The main contributions of the paper include:
1. Introducing the highway graph for more efficient value learning, allowing long-range value propagation.
2. Adapting the classical value iteration algorithm to the highway graph and theoretically proving its convergence there.
3. Re-parameterizing the highway graph as a neural network agent, improving generalization and reducing storage costs.
4. Showing experimentally that, compared to various model-based and model-free RL algorithms, the highway graph method significantly reduces training time across different tasks while maintaining or improving expected return.

The experiments indicate that the highway graph improves the efficiency of RL training, especially in the early stages: compared to baseline methods, training efficiency is 10 to more than 150 times higher. In addition, neural network agents trained using the highway graph have advantages in generalization and storage costs.
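
The compression idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy environment, and the rule used to decide which states are intermediate (exactly one incoming and one outgoing transition) are all assumptions made for illustration. Each compressed edge stores the discounted reward sum and the accumulated discount, so value iteration on the small graph is expected to give the same values as step-by-step value iteration on the full graph.

```python
def build_highway_graph(transitions, gamma):
    """Compress non-branching chains of a deterministic transition model.

    transitions: {state: {action: (next_state, reward)}}
    Returns {state: {action: (end_state, discounted_reward_sum, discount)}}.
    Assumption: a state is "intermediate" if it has exactly one incoming
    and exactly one outgoing transition; all other states are kept as nodes.
    """
    in_degree = {}
    for actions in transitions.values():
        for next_state, _ in actions.values():
            in_degree[next_state] = in_degree.get(next_state, 0) + 1

    def is_intermediate(state):
        return len(transitions.get(state, {})) == 1 and in_degree.get(state, 0) == 1

    highway = {}
    for state in set(transitions) | set(in_degree):
        if is_intermediate(state):
            continue  # absorbed into a highway edge
        edges = {}
        for action, (nxt, reward) in transitions.get(state, {}).items():
            total, discount = reward, gamma
            # Follow the chain until the next kept node, merging rewards.
            while is_intermediate(nxt):
                nxt, reward = next(iter(transitions[nxt].values()))
                total += discount * reward
                discount *= gamma
            edges[action] = (nxt, total, discount)
        highway[state] = edges
    return highway


def as_graph(transitions, gamma):
    """Uncompressed graph in the same edge format (every edge is one step)."""
    graph = {s: {a: (n, r, gamma) for a, (n, r) in acts.items()}
             for s, acts in transitions.items()}
    for acts in transitions.values():
        for n, _ in acts.values():
            graph.setdefault(n, {})  # terminal states have no edges
    return graph


def value_iteration(graph, tol=1e-10, max_sweeps=1000):
    """In-place value iteration; terminal states keep value 0."""
    values = {s: 0.0 for s in graph}
    for _ in range(max_sweeps):
        delta = 0.0
        for state, edges in graph.items():
            if not edges:
                continue
            new = max(r + d * values[n] for n, r, d in edges.values())
            delta = max(delta, abs(new - values[state]))
            values[state] = new
        if delta < tol:
            break
    return values


# Hypothetical toy environment: from A, "left" is a 4-step corridor to G
# with reward 1.0 on arrival; "right" is a 2-step corridor with reward 0.5.
gamma = 0.9
transitions = {
    "A": {"left": ("l1", 0.0), "right": ("r1", 0.0)},
    "l1": {"go": ("l2", 0.0)},
    "l2": {"go": ("l3", 0.0)},
    "l3": {"go": ("G", 1.0)},
    "r1": {"go": ("G", 0.5)},
}

highway = build_highway_graph(transitions, gamma)
values_plain = value_iteration(as_graph(transitions, gamma))
values_highway = value_iteration(highway)
print(sorted(highway), values_plain["A"], values_highway["A"])
```

In this toy case the six-state model shrinks to the two nodes A and G, and both runs agree on the value of A. The sketch only shows the compression and the merged update; it does not cover how the graph is built from sampled trajectories or how the neural network agent is trained.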

Learning an Efficient and Safe Policy for Highway Driving Using Supervised Learning and Reinforcement Learning

Highway Reinforcement Learning

Leveraging the Capabilities of Connected and Autonomous Vehicles and Multi-Agent Reinforcement Learning to Mitigate Highway Bottleneck Congestion

Freeway network traffic management based on distributed reinforcement learning

A Machine Learning Method for Dynamic Traffic Control and Guidance on Freeway Networks

Routing optimization with Monte Carlo Tree Search-based multi-agent reinforcement learning

Explore with Dynamic Map: Graph Structured Reinforcement Learning

Multi-Reward Architecture Based Reinforcement Learning for Highway Driving Policies

GRL-PS: Graph Embedding-Based DRL Approach for Adaptive Path Selection

Graph Convolution Reinforcement Learning for Decision-Making in Highway Overtaking Scenario

Graph Convolution-Based Deep Reinforcement Learning for Multi-Agent Decision-Making in Mixed Traffic Environments

Graph-based multi agent reinforcement learning for on-ramp merging in mixed traffic

Graph learning-based generation of abstractions for reinforcement learning

A Reinforcement Learning Approach to Autonomous Decision Making of Intelligent Vehicles on Highways

TITE: A Transformer-Based Deep Reinforcement Learning Approach for Traffic Engineering in Hybrid SDN with Dynamic Traffic

Optimizing Trajectories for Highway Driving with Offline Reinforcement Learning

Exploring DQN-Based Reinforcement Learning in Autonomous Highway Navigation Performance Under High-Traffic Conditions

Towards Robust Decision-Making for Autonomous Highway Driving Based on Safe Reinforcement Learning

An efficient planning method based on deep reinforcement learning with hybrid actions for autonomous driving on highway

End-to-end Driving in High-Interaction Traffic Scenarios with Reinforcement Learning