Practical Online Reinforcement Learning for Microprocessors With Micro-Armed Bandit

Gerasimos Gerogiannis, Josep Torrellas
DOI: https://doi.org/10.1109/mm.2024.3408719
IF: 2.8212
2024-08-29
IEEE Micro
Abstract: Although online reinforcement learning (RL) has shown promise for microarchitecture decision making, processor vendors are still reluctant to adopt it. There are two main reasons that make RL-based solutions unattractive. First, they have high complexity and storage overhead. Second, many RL agents are engineered for a specific problem and are not reusable. In this work, we propose a way to tackle these shortcomings. We find that, in diverse microarchitecture problems, only a few actions are useful in a given time window. Motivated by this property, we design Micro-Armed Bandit (or Bandit for short), an RL agent that is based on the low-complexity Multi-Armed Bandit algorithms. We show that Bandit can match or exceed the performance of more complex RL and non-RL alternatives in two different problems: data prefetching and instruction fetch thread selection in simultaneous multithreaded processors. We believe that Bandit's simplicity, reusability, and small storage overhead make online RL more practical for microarchitecture.
computer science, software engineering, hardware & architecture
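For readers unfamiliar with the Multi-Armed Bandit family that the abstract refers to, the following is a minimal software sketch of one textbook variant, UCB1. It is not the paper's agent or hardware design: the class name `UCB1Bandit`, the `exploration` constant, the four reward probabilities, and the idea of treating each arm as a candidate action (for example, a prefetcher configuration) are all assumptions made for illustration.

```python
import math
import random


class UCB1Bandit:
    """Textbook UCB1 agent. Illustrative only; not the paper's design."""

    def __init__(self, num_arms, exploration=2.0):
        # The only state is one pull counter and one running mean per arm.
        self.counts = [0] * num_arms
        self.means = [0.0] * num_arms
        self.exploration = exploration  # assumed constant (2.0 in classic UCB1)
        self.total = 0

    def select_arm(self):
        # Try every arm once before applying the confidence-bound rule.
        for arm, pulls in enumerate(self.counts):
            if pulls == 0:
                return arm
        # Score = observed mean reward + exploration bonus that shrinks
        # as an arm is pulled more often.
        scores = [
            mean + math.sqrt(self.exploration * math.log(self.total) / pulls)
            for mean, pulls in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental running-mean update; no reward history is stored.
        self.counts[arm] += 1
        self.total += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run_demo(steps=2000, seed=0):
    rng = random.Random(seed)
    # Hypothetical per-action reward probabilities (assumption, not paper data).
    reward_probs = [0.2, 0.5, 0.8, 0.4]
    agent = UCB1Bandit(len(reward_probs))
    for _ in range(steps):
        arm = agent.select_arm()
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        agent.update(arm, reward)
    return agent.counts


if __name__ == "__main__":
    print(run_demo())  # pull counts per arm
```

In this sketch the agent keeps only a counter and a running mean per arm, which gives a rough sense of why bandit-style agents can have a small storage footprint compared with RL agents that maintain large state-action tables.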