Abstract: We develop a novel multi-objective reinforcement learning (MORL) framework to jointly optimize wireless network selection and autonomous driving policies in a multi-band vehicular network operating on conventional sub-6 GHz spectrum and terahertz (THz) frequencies. The framework is designed to maximize traffic flow and minimize collisions by controlling the vehicle's motion dynamics (i.e., speed and acceleration), and to enhance ultra-reliable low-latency communication (URLLC) while minimizing handoffs (HOs). We cast this problem as a multi-objective Markov decision process (MOMDP) and develop solutions for both predefined and unknown preferences over the conflicting objectives. Specifically, we first develop deep Q-network (DQN) and double deep Q-network (DDQN) based solutions that scalarize the transportation and telecommunication rewards using predefined preferences. We then develop a novel envelope MORL solution that learns policies addressing multiple objectives whose preferences are unknown to the agent. While this approach reduces reliance on scalar rewards, policy effectiveness varies with the preference, which poses a challenge. To address this, we apply a generalized version of the Bellman equation and optimize the convex envelope of the multi-objective Q-values to learn a unified parametric representation capable of generating optimal policies across all possible preference configurations. After an initial learning phase, our agent can execute optimal policies under any specified preference or infer preferences from a small number of data samples. Numerical results validate the efficacy of the envelope-based MORL solution and reveal insights into the interdependency of vehicle motion dynamics, HOs, and the communication data rate. The proposed policies enable autonomous vehicles to adopt safe driving behaviors with improved connectivity.
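The abstract contrasts two kinds of update. The DQN/DDQN baselines collapse the reward vector into one number using a fixed preference. The envelope solution keeps the reward as a vector and maximizes over preferences at the next state. The sketch below shows the corresponding temporal-difference targets in NumPy. It is a minimal sketch under several assumptions: Q-values are given as plain arrays instead of network outputs, a small finite set of sampled preferences stands in for the full preference space, and there are two illustrative objectives (traffic/safety and URLLC/HO). The function names and all numbers are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def scalarized_target(r, q_next, w, gamma, done=False):
    """Scalar TD target for a fixed, predefined preference w (DQN-style).

    r:      reward vector of the transition, shape (m,)
    q_next: Q-values of the next state, shape (n_actions, m)
    w:      preference weights over the m objectives, shape (m,)
    """
    bootstrap = 0.0 if done else np.max(q_next @ w)
    return float(w @ r + gamma * bootstrap)


def double_scalarized_target(r, q_next_online, q_next_target, w, gamma, done=False):
    """Double-DQN variant: the online network selects the next action and
    the target network evaluates it, both under the scalarized reward."""
    if done:
        return float(w @ r)
    a_star = int(np.argmax(q_next_online @ w))
    return float(w @ r + gamma * (q_next_target[a_star] @ w))


def envelope_target(r, q_next_by_pref, w, gamma, done=False):
    """Vector-valued envelope TD target for preference w.

    q_next_by_pref holds Q(s', a', w') for each sampled preference w',
    shape (n_prefs, n_actions, m). The (w', a') pair is chosen jointly by
    maximizing the scalarized value w^T Q(s', a', w'), but the returned
    target keeps the full reward vector (one entry per objective).
    """
    r = np.asarray(r, dtype=float)
    if done:
        return r
    scores = q_next_by_pref @ w  # shape (n_prefs, n_actions)
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return r + gamma * q_next_by_pref[i, j]


# Hypothetical numbers: objective 0 = traffic/safety, objective 1 = URLLC/HO.
r = np.array([1.0, 0.5])
w = np.array([0.7, 0.3])
gamma = 0.9
q_next = np.array([[2.0, 0.0], [0.0, 3.0]])
q_by_pref = np.array([q_next, [[1.0, 1.0], [2.5, 0.5]]])

print(scalarized_target(r, q_next, w, gamma))
# The envelope picks the second sampled preference and its second action,
# i.e. Q(s', a', w') = [2.5, 0.5], because 0.7*2.5 + 0.3*0.5 is the largest score.
print(envelope_target(r, q_by_pref, w, gamma))
```

The scalarized targets are tied to one preference, so each new preference would require its own policy. The envelope target searches over preferences w' at the next state and returns a vector. This lets a single preference-conditioned Q-function be trained once and then queried under any w, as the abstract describes.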