Abstract: Reinforcement learning (RL) has excelled in sequential decision-making and control tasks, yet traditional RL algorithms learn only a single control style for a given scenario and therefore cannot accommodate varied control preferences. Existing multi-style RL methods typically require customized reward or objective functions tailored to each control style, which becomes impractical when many diverse control styles are required. To overcome these limitations, we propose the multi-style distributional soft actor-critic (M-DSAC) algorithm, which learns a single policy that supports multiple control behaviors. We begin by developing a multi-style policy iteration (MPI) framework that learns the entire distribution of returns, known as the value distribution, rather than only the expected return (i.e., the $Q$ value). In this framework, we use the quantile index of the value distribution as a style indicator and augment the inputs of both the policy and its corresponding value distribution with these quantile indices. Building on the MPI framework, the M-DSAC algorithm employs a parameterized diagonal Gaussian function to approximate the value distribution. This approach enables efficient computation of different value quantiles by combining the mean and standard deviation of the value distribution with appropriate coefficients. By optimizing the policy across different quantiles, M-DSAC learns a versatile policy that can handle a range of control styles without significant additional computational cost. Experimental evaluations on MuJoCo benchmarks and real-world robot control tasks confirm the effectiveness of M-DSAC and demonstrate its practical applicability.
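To make the quantile computation concrete, here is a minimal sketch under stated assumptions. If the critic outputs the mean $\mu$ and standard deviation $\sigma$ of a Gaussian return distribution, its $\tau$-quantile is $\mu + \sigma\,\Phi^{-1}(\tau)$, where $\Phi^{-1}$ is the standard normal inverse CDF. The abstract only says the mean and standard deviation are combined "with appropriate coefficients"; taking that coefficient to be $\Phi^{-1}(\tau)$ is our assumption (it is the exact quantile of a Gaussian). The helper names `gaussian_quantile` and `augment_with_style` are hypothetical and do not come from the paper.

```python
from statistics import NormalDist
from typing import List, Sequence

# Standard normal N(0, 1); its inverse CDF supplies the quantile coefficient.
_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def gaussian_quantile(mean: float, std: float, tau: float) -> float:
    """tau-quantile of a Gaussian return distribution N(mean, std**2).

    Assumption (not stated in the abstract): the coefficient applied to std
    is Phi^{-1}(tau), which is exact when the return distribution is Gaussian.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if std < 0.0:
        raise ValueError("std must be non-negative")
    # The coefficient depends only on tau, so it could be precomputed per style.
    coeff = _STD_NORMAL.inv_cdf(tau)
    # One multiply-add per quantile: no sampling or many-atom quantile regression.
    return mean + coeff * std


def augment_with_style(state: Sequence[float], tau: float) -> List[float]:
    """Hypothetical helper: append the style index tau to the observation,
    mirroring how the abstract describes augmenting policy and critic inputs."""
    return list(state) + [tau]


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    mean, std = 10.0, 2.0
    for tau in (0.1, 0.5, 0.9):
        q = gaussian_quantile(mean, std, tau)
        print(f"tau={tau:.1f} -> quantile={q:.3f}")
    print(augment_with_style([0.5, -1.0], 0.25))
```

A low $\tau$ (for example 0.1) yields a pessimistic value estimate and a high $\tau$ an optimistic one; reading these as more conservative versus more aggressive control styles is one natural interpretation. In the M-DSAC setting, $\mu$ and $\sigma$ would themselves be outputs of a critic that receives $\tau$ as an input, which this sketch does not model.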

Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach

Generative Actor-Critic: An Off-policy Algorithm Using the Push-forward Model

Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics

Gaussian Process Policy Optimization

ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors

Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning

CGAR: Critic Guided Action Redistribution in Reinforcement Learning

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

OPAC: Opportunistic Actor-Critic

Multi-agent Gradient-Based Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

Multi-Style Distributional Soft Actor-Critic: Learning a Unified Policy for Diverse Control Behaviors

Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Actor-Critic Reinforcement Learning with Phased Actor

Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization

Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences

An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms

Gaussian-Mixture-Model Q-Functions for Reinforcement Learning by Riemannian Optimization

How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization