Abstract: On-device control agents, especially on mobile devices, are responsible for operating mobile devices to fulfill users' requests, enabling seamless and intuitive interactions. Integrating Multimodal Large Language Models (MLLMs) into these agents enhances their ability to understand and execute complex commands, thereby improving user experience. However, fine-tuning MLLMs for on-device control presents significant challenges due to limited data availability and inefficient online training processes. This paper introduces DistRL, a novel framework designed to enhance the efficiency of online RL fine-tuning for mobile device control agents. DistRL employs centralized training and decentralized data acquisition to ensure efficient fine-tuning in the context of dynamic online interactions. Additionally, the framework is backed by our tailor-made RL algorithm, which effectively balances exploration with the prioritized utilization of collected data to ensure stable and robust training. Our experiments show that, on average, DistRL delivers a 3X improvement in training efficiency and enables training data collection 2.4X faster than the leading synchronous multi-machine methods. Notably, after training, DistRL achieves a 20% relative improvement in success rate compared to state-of-the-art methods on general Android tasks from an open benchmark, significantly outperforming existing approaches while maintaining the same training time. These results validate DistRL as a scalable and efficient solution, offering substantial improvements in both training efficiency and agent performance for real-world, in-the-wild device control tasks.

What problem does this paper attempt to address?

The main problem this paper addresses is how to make online reinforcement learning (RL) fine-tuning efficient and reliable for mobile device control, so that multimodal large language models (MLLMs) perform better on device control tasks. Specifically, the paper addresses the following key challenges:

1. **Limited data availability and inefficient online training**:
   - Limited data acquisition and inefficient online training make it difficult to fine-tune MLLMs effectively for control tasks on mobile devices.
   - Existing offline datasets cannot capture the dynamic changes of mobile applications and environments, so models trained on them perform poorly in actual deployment.
2. **Complexity of distributed asynchronous data collection and training**:
   - Asynchronous data collection introduces algorithmic difficulties: non-stationary data distributions hinder convergence, and the delay between policy updates and data collection can degrade performance.
   - In a distributed environment, devices collect data at different rates and times, which makes consistency and stability harder to maintain.
3. **Limitations of existing methods**:
   - Previous work relied on complex wrappers or on training with static data and could not adapt to the ever-changing real-world environment.
   - Even the most advanced MLLMs (such as GPT-4V) have limitations when handling GUI control tasks, especially in error recovery and behavioral rationality.

To solve these problems, the paper proposes DistRL, a new distributed RL fine-tuning framework that aims to improve mobile device control agents in the following ways:

- **Asynchronous distributed architecture**: Centralized training is combined with decentralized data collection to keep online fine-tuning efficient.
- **Custom-designed RL algorithm**: A new off-policy RL algorithm, A-RIDE, balances exploration and exploitation and prioritizes valuable experience data to keep training stable and efficient.
- **Distributed prioritized experience replay (DPER)**: Important trajectories in the replay buffer are prioritized, which improves sample utilization and accelerates convergence.

Experimental results show that, compared with existing synchronous multi-machine methods, DistRL improves training efficiency by 3x and data collection speed by 2.4x. It also achieves a 20% relative improvement in success rate on general Android tasks over state-of-the-art methods. These results support the scalability and efficiency of DistRL in real-world device control tasks.

In summary, by proposing the DistRL framework, this paper addresses the key challenges of online RL fine-tuning for mobile device control and significantly improves both training efficiency and agent performance.
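To make the idea of prioritizing trajectories in a replay buffer more concrete, here is a minimal Python sketch of generic proportional prioritized sampling. It is not the paper's DPER implementation: the summary above does not give the actual priority formula, so the `priority` field, the `alpha` exponent, the `worker_id` and `policy_version` fields, and the oldest-first eviction rule are all assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    worker_id: int       # hypothetical: which device collected the trajectory
    policy_version: int  # hypothetical: policy version used during collection
    priority: float      # hypothetical importance score (not the paper's formula)


@dataclass
class PrioritizedReplayBuffer:
    capacity: int
    alpha: float = 1.0   # assumed knob: 0 = uniform sampling, 1 = fully proportional
    eps: float = 1e-6    # keeps every sampling weight strictly positive
    items: list = field(default_factory=list)

    def add(self, traj: Trajectory) -> None:
        # Assumed eviction rule: drop the oldest trajectory when the buffer is full.
        if len(self.items) >= self.capacity:
            self.items.pop(0)
        self.items.append(traj)

    def sampling_probs(self) -> list:
        # Probability of each trajectory is proportional to (priority + eps) ** alpha.
        weights = [(t.priority + self.eps) ** self.alpha for t in self.items]
        total = sum(weights)
        return [w / total for w in weights]

    def sample(self, k: int, rng: random.Random) -> list:
        # Sample k trajectories with replacement according to their priorities.
        return rng.choices(self.items, weights=self.sampling_probs(), k=k)


# Usage: two simulated workers push trajectories, and a central learner samples a batch.
buffer = PrioritizedReplayBuffer(capacity=4)
buffer.add(Trajectory(worker_id=0, policy_version=0, priority=1.0))
buffer.add(Trajectory(worker_id=1, policy_version=0, priority=3.0))
batch = buffer.sample(k=8, rng=random.Random(0))
```

In this sketch, higher-priority trajectories are drawn more often, which is the general mechanism behind improved sample utilization. A real distributed system would also need thread-safe or networked access to the buffer and some handling of stale policy versions, both of which are omitted here.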

DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents

AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback

Robot Simulation and Reinforcement Learning Training Platform Based on Distributed Architecture.

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

An Offline-Transfer-Online Framework for Cloud-Edge Collaborative Distributed Reinforcement Learning

Deploying Offline Reinforcement Learning with Human Feedback

RLIF: Interactive Imitation Learning as Reinforcement Learning

DMADRL: A Distributed Multi-agent Deep Reinforcement Learning Algorithm for Cognitive Offloading in Dynamic MEC Networks

Sim-to-Real Optimization of Complex Real World Mobile Network with Imperfect Information via Deep Reinforcement Learning from Self-play

RLingua: Improving Reinforcement Learning Sample Efficiency in Robotic Manipulations With Large Language Models

R^3: On-device Real-Time Deep Reinforcement Learning for Autonomous Robotics

Task and Domain Adaptive Reinforcement Learning for Robot Control

FLIRRAS: Fast Learning With Integrated Reward and Reduced Action Space for Online Multitask Offloading

Fractional Deep Reinforcement Learning for Age-Minimal Mobile Edge Computing

Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts

Asynchronous Fractional Multi-Agent Deep Reinforcement Learning for Age-Minimal Mobile Edge Computing

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Cloud-Edge Training Architecture for Sim-to-Real Deep Reinforcement Learning

A Digital Twin Framework for Reinforcement Learning with Real-Time Self-Improvement via Human Assistive Teleoperation

Group-Agent Reinforcement Learning