Abstract: The offline reinforcement learning (RL) problem aims to learn an optimal policy from historical data collected by one or more behavioural policies (experts) by interacting with an environment. However, the individual experts may be privacy-sensitive in that the learnt policy may retain information about their precise choices. In some domains like personalized retrieval, advertising and healthcare, the expert choices are considered sensitive data. To provably protect the privacy of such experts, we propose a novel consensus-based expert-level differentially private offline RL training approach compatible with any existing offline RL algorithm. We prove rigorous differential privacy guarantees, while maintaining strong empirical performance. Unlike existing work in differentially private RL, we supplement the theory with proof-of-concept experiments on classic RL environments featuring large continuous state spaces, demonstrating substantial improvements over a natural baseline across multiple tasks.

What problem does this paper attempt to address?

This paper addresses how to protect expert-level privacy in offline reinforcement learning (offline RL). Offline RL aims to learn an optimal policy from historical data collected by one or more behavioral policies (experts) interacting with an environment. Because the experts' choices can be sensitive in domains such as personalized retrieval, advertising, and healthcare, a policy trained directly on this data may leak information about individual experts. To address this, the authors propose a consensus-based training method that provides rigorous expert-level differential privacy (DP) guarantees while maintaining strong empirical performance. The method is compatible with any existing offline RL algorithm and is evaluated in proof-of-concept experiments on classic RL environments, where it shows substantial improvements over a natural baseline.

### Key Issues

1. **Protecting Expert-Level Privacy**: When learning from the historical data of multiple experts in offline RL, how to ensure that the learned policy does not reveal the specific choices of any single expert.
2. **Applicability and Performance**: The proposed algorithm must maintain strong performance across a range of tasks while providing rigorous privacy protection.

### Solution Overview

The authors make the following contributions:

- **Formalizing the Expert-Level Privacy Problem**: States the expert-level privacy problem in offline RL explicitly and motivates it with practical examples.
- **Designing an Algorithm with Strong Privacy Guarantees**: Develops a practical variant of gradient-based offline RL algorithms that ensures expert-level privacy. Unlike existing methods, it does not rely on noise addition alone: it also identifies a subset of the data that can be used without noise.
- **Empirical Evaluation**: Conducts proof-of-concept experiments on standard RL benchmarks, showing significant improvements of the proposed method over baseline methods (such as DP-SGD).

### Technical Details

- **Two-stage Algorithm**:
  - **Data Filtering Stage**: Generates, in a privacy-preserving manner, a set of trajectory prefixes that can be used directly in training without additional noise.
  - **DP-SGD Training Stage**: Trains on the remaining parts of the trajectories with differentially private stochastic gradient descent (DP-SGD).
- **Privacy Analysis**:
  - Uses the sparse vector technique to ensure that selecting stable trajectory prefixes does not exceed the privacy budget.
  - Provides rigorous mathematical proofs that the overall algorithm satisfies the required privacy guarantees.

In summary, the paper tackles expert-level privacy protection in offline RL and proposes a two-stage method that combines private selection of trajectory prefixes with DP-SGD training on the remaining data.
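The two building blocks named under Technical Details, the sparse vector technique and DP-SGD, can be illustrated with a minimal sketch. This shows the generic textbook forms of both mechanisms, not the paper's algorithm. The function names, the sensitivity-1 query assumption, and the choice to treat each expert as the unit whose gradient is clipped are all assumptions made here for illustration.

```python
import math
import random


def laplace(rng, scale):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def above_threshold(queries, threshold, epsilon, rng):
    """Textbook AboveThreshold (sparse vector technique).

    Assumes each query value has sensitivity 1. Returns the index of the
    first query whose noisy value clears the noisy threshold, else None.
    """
    noisy_threshold = threshold + laplace(rng, 2.0 / epsilon)
    for i, q in enumerate(queries):
        if q + laplace(rng, 4.0 / epsilon) >= noisy_threshold:
            return i
    return None


def clip(grad, max_norm):
    """Scale grad down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(params, per_unit_grads, lr, max_norm, noise_multiplier, rng):
    """One generic DP-SGD update.

    per_unit_grads holds one gradient per privacy unit (assumed here to be
    one per expert). Each is clipped, the clipped gradients are summed,
    Gaussian noise with std noise_multiplier * max_norm is added to every
    coordinate of the sum, and the result is averaged.
    """
    clipped = [clip(g, max_norm) for g in per_unit_grads]
    n = len(clipped)
    sigma = noise_multiplier * max_norm
    noisy_mean = [
        (sum(g[j] for g in clipped) + rng.gauss(0.0, sigma)) / n
        for j in range(len(params))
    ]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stability scores for candidate trajectory prefixes.
    print(above_threshold([1.0, 2.0, 50.0], threshold=25.0, epsilon=1.0, rng=rng))
    # Two hypothetical per-expert gradients for a two-parameter model.
    print(dp_sgd_step([0.0, 0.0], [[3.0, 4.0], [0.5, 0.0]], lr=0.1,
                      max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping bounds how much any single unit can move the update, which is what lets the Gaussian noise scale be tied to `max_norm`. Converting `noise_multiplier` and the number of steps into an overall privacy budget requires a privacy accountant, which this sketch leaves out.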

Preserving Expert-Level Privacy in Offline Reinforcement Learning

Offline Reinforcement Learning with Differential Privacy

How Private Is Your RL Policy? An Inverse RL Based Analysis Framework

Differentially Private Deep Model-Based Reinforcement Learning

Privacy Preserving Off-Policy Evaluation

Adaptive Control of Differentially Private Linear Quadratic Systems

Differentially Private Reinforcement Learning with Linear Function Approximation

Differentially Private Reinforcement Learning with Self-Play

Efficient Online Reinforcement Learning with Offline Data

Preserving the Privacy of Reward Functions in MDPs through Deception

Privacy Preserving Reinforcement Learning for Population Processes

Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation

Sparsity-based Safety Conservatism for Constrained Offline Reinforcement Learning

No-regret Exploration in Shuffle Private Reinforcement Learning

Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding

Offline RL With Realistic Datasets: Heteroskedasticity and Support Constraints

Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources

Selective Uncertainty Propagation in Offline RL

Efficient Offline Reinforcement Learning With Relaxed Conservatism

Robust Offline Reinforcement Learning for Non-Markovian Decision Processes

Instabilities of Offline RL with Pre-Trained Neural Representation