Abstract: Reinforcement learning (RL) has excelled in sequential decision-making and control tasks, yet traditional RL algorithms adhere to a single control style in identical scenarios and therefore cannot accommodate varied control preferences. Existing multi-style RL methods typically require customized reward or objective functions tailored to specific control styles, which may not be feasible when diverse driving styles are necessary. To overcome these limitations, we propose the multi-style distributional soft actor-critic (M-DSAC) algorithm, which learns a single policy that supports multiple control behaviors. We begin by developing a multi-style policy iteration (MPI) framework that learns the entire distribution of returns, known as the value distribution, rather than only the expected return (i.e., the $Q$ value). In this framework, we use the quantile index of the value distribution as a style indicator and augment the inputs of both the policy and its corresponding value distribution with this quantile index. Building upon the MPI framework, the M-DSAC algorithm employs a parameterized diagonal Gaussian function to approximate the value distribution. This approach enables efficient computation of different value quantiles by combining the value distribution's mean and standard deviation with appropriate coefficients. By optimizing the policy across different quantiles, M-DSAC efficiently learns a versatile policy that handles a range of control styles without significant computational cost. Experimental evaluations on MuJoCo benchmarks and real-world robot control tasks confirm the effectiveness of M-DSAC and indicate its broad practical applicability.
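As an illustration of the quantile computation mentioned in the abstract, the following minimal sketch assumes the coefficient for quantile index `tau` is the standard normal inverse CDF, so that the quantile of a Gaussian value distribution is `mean + coeff * std`. The function name `gaussian_quantile` and the numeric values are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from statistics import NormalDist


def gaussian_quantile(mean: float, std: float, tau: float) -> float:
    """Return the tau-quantile of a Gaussian N(mean, std^2).

    Assumption: the "appropriate coefficient" for a quantile index tau
    is the standard normal quantile Phi^{-1}(tau).
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    coeff = NormalDist().inv_cdf(tau)  # standard normal inverse CDF
    return mean + coeff * std


# Hypothetical mean and standard deviation of a value distribution.
q_mean, q_std = 10.0, 2.0
for tau in (0.1, 0.5, 0.9):
    # Lower tau gives a more pessimistic value estimate, higher tau a more
    # optimistic one; tau = 0.5 recovers the mean.
    print(tau, gaussian_quantile(q_mean, q_std, tau))
```

Under this assumption, one pair of mean and standard deviation outputs yields the value quantile for any style index with a single multiply-add, which is consistent with the abstract's claim of low additional computing cost.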

DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning

Distributional Soft Actor Critic for Risk Sensitive Learning

DSAC-T: Distributional Soft Actor-Critic with Three Refinements

Risk Sensitive Distributional Soft Actor Critic for Portfolio Management

Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors

Revisiting Discrete Soft Actor-Critic

Improving Generalization of Reinforcement Learning with Minimax Distributional Soft Actor-Critic

On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics

SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics

Encoding Distributional Soft Actor-Critic for Autonomous Driving in Multi-lane Scenarios

Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

Distributional Reinforcement Learning for Efficient Exploration

Bayesian Soft Actor-Critic: A Directed Acyclic Strategy Graph Based Deep Reinforcement Learning

Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning under Distribution Shifts

Multi-Style Distributional Soft Actor-Critic: Learning a Unified Policy for Diverse Control Behaviors

Safe Distributional Reinforcement Learning

Improving Robustness via Risk Averse Distributional Reinforcement Learning

Density estimation based soft actor-critic: deep reinforcement learning for static output feedback control with measurement noise

Generalizing soft actor-critic algorithms to discrete action spaces

Distributional Method for Risk Averse Reinforcement Learning

CTD4 -- A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics