Abstract: The commissioning and operation of future large-scale scientific experiments will challenge current tuning and control methods. Reinforcement learning (RL) algorithms are a promising solution thanks to their ability to tackle a control problem autonomously, based on a task parameterized by a reward function. Conventional machine learning (ML) libraries are not intended for microsecond-latency applications, as they mostly optimize for throughput. Most programmable logic implementations, on the other hand, are meant for computation acceleration and are not intended to work in a real-time environment. To overcome these limitations, RL needs to be deployed on the edge, i.e. on the device that gathers the training data. In this paper we present the design and deployment of an experience accumulator system in a particle accelerator. In this system, deep-RL algorithms run with hardware acceleration and act within a few microseconds, enabling the use of RL for the control of ultra-fast phenomena. Training is performed offline to reduce the number of operations carried out on the acceleration hardware. The proposed architecture was tested under real experimental conditions at the Karlsruhe research accelerator (KARA), which also serves as a synchrotron light source, where the system was used to control induced horizontal betatron oscillations in real time. The results showed performance comparable to that of the commercial feedback system available at the accelerator, demonstrating the viability and potential of this approach. Thanks to the self-learning and reconfiguration capability of this implementation, it can be applied seamlessly to other control problems. Applications range from particle accelerators to large-scale research and industrial facilities.

Practical Online Reinforcement Learning for Microprocessors With Micro-Armed Bandit

Efficient Reinforcement Learning On Passive RRAM Crossbar Array

MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator

The Bandit Whisperer: Communication Learning for Restless Bandits

Reinforcement Learning Agent Design and Optimization with Bandwidth Allocation Model

Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL

Online Restless Multi-Armed Bandits with Long-Term Fairness Constraints

RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control

A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations

Efficient Online Reinforcement Learning with Offline Data

Multiarmed Bandit Algorithms on Zynq System-on-Chip: Go Frequentist or Bayesian?

Towards Hardware Accelerated Reinforcement Learning for Application-Specific Robotic Control

Microsecond-Latency Feedback at a Particle Accelerator by Online Reinforcement Learning on Hardware

Combinatorial Rising Bandit

A Bayesian Approach to Learning Bandit Structure in Markov Decision Processes

Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback

Cascading Reinforcement Learning

Challenges of real-world reinforcement learning: definitions, benchmarks and analysis

Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits

Reinforcement actor-critic learning as a rehearsal in MicroRTS

Selective Reviews of Bandit Problems in AI via a Statistical View