Abstract: Combining offline and online reinforcement learning (RL) is crucial for efficient and safe learning. However, previous approaches treat offline and online learning as separate procedures, resulting in redundant designs and limited performance. We ask: Can we achieve straightforward yet effective offline and online learning without introducing extra conservatism or regularization? In this study, we propose Uni-o4, which utilizes an on-policy objective for both offline and online learning. Owing to the alignment of objectives in the two phases, the RL agent can transfer between offline and online learning seamlessly. This property enhances the flexibility of the learning paradigm, allowing for arbitrary combinations of pretraining, fine-tuning, offline, and online learning. Specifically, in the offline phase, Uni-o4 leverages diverse ensemble policies to address the mismatch between the estimated behavior policy and the offline dataset. Through a simple offline policy evaluation (OPE) approach, Uni-o4 can achieve multi-step policy improvement safely. We demonstrate that, with this method, the fusion of the two paradigms yields superior offline initialization as well as stable and rapid online fine-tuning. Through real-world robot tasks, we highlight the benefits of this paradigm for rapid deployment in challenging, previously unseen real-world environments. Additionally, through comprehensive evaluations on numerous simulated benchmarks, we substantiate that our method achieves state-of-the-art performance in both offline learning and offline-to-online fine-tuning. Our website: https://lei-kun.github.io/uni-o4/

ANOTO: Improving Automated Negotiation Via Offline-to-Online Reinforcement Learning

An Effective Negotiating Agent Framework Based on Deep Offline Reinforcement Learning

Transfer Reinforcement Learning Based Negotiating Agent Framework

An Agent Bilateral Multi-issue Alternate Bidding Negotiation Protocol Based on Reinforcement Learning and Its Application in E-commerce

Transfer Learning based Agent for Automated Negotiation

Detecting and Learning Against Unknown Opponents for Automated Negotiations

A Deep Reinforcement Learning-Based Agent for Negotiation with Multiple Communication Channels

Automated Configuration of Negotiation Strategies

ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles

Robustness Analysis of Negotiation Strategies through Multiagent Learning in Repeated Negotiation Games

Toward Efficient Agreements In Real-Time Multilateral Agent-Based Negotiations

Negotiating with Unknown Opponents Toward Multi-lateral Agreement in Real-Time Domains

Strategy and Algorithm of Multi-Agent Negotiation Based on Q-reinforcement Learning

An Autonomous Negotiating Agent Framework with Reinforcement Learning Based Strategies and Adaptive Strategy Switching Mechanism

Optimized negotiation strategy based on reinforcement learning

Reinforcement Learning Negotiation Strategy Based on Opponent Classification

An Adaptive Agent Architecture For Automated Negotiation

Adaptive Policy Learning for Offline-to-Online Reinforcement Learning

An autonomous agent for negotiation with multiple communication channels using parametrized deep Q-network

Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization

Towards General Negotiation Strategies with End-to-End Reinforcement Learning