Switchboard: An Open-Source Framework for Modular Simulation of Large Hardware Systems

Steven Herbst, Noah Moroze, Edgar Iglesias, Andreas Olofsson
2024-07-30
Abstract: Scaling up hardware systems has become an important tactic for improving performance as Moore's law fades. Unfortunately, simulations of large hardware systems are often a design bottleneck due to slow throughput and long build times. In this article, we propose a solution targeting designs composed of modular blocks connected by latency-insensitive interfaces. Our approach is to construct the hardware simulation in a similar fashion as the design itself, using a prebuilt simulator for each block and connecting the simulators via fast shared-memory queues at runtime. This improves build time, because simulation scale-up simply involves running more instances of the prebuilt simulators. It also addresses simulation speed, because prebuilt simulators can run in parallel, without fine-grained synchronization or global barriers. We introduce a framework, Switchboard, that implements our approach, and discuss two applications, demonstrating its speed, scalability, and accuracy: (1) a web application where users can run fast simulations of chiplets on an interposer, and (2) a wafer-scale simulation of one million RISC-V cores distributed across thousands of cloud compute cores.
Distributed, Parallel, and Cluster Computing; Hardware Architecture
What problem does this paper attempt to address?
The paper addresses two bottlenecks in simulating large hardware systems: long build times and slow simulation throughput. As Moore's law fades, scaling up hardware systems has become an important way to improve performance, but larger systems are correspondingly harder to simulate. Traditional parallel simulation methods improve speed to some extent, yet they remain limited for very large systems: builds are slow, good performance depends on careful load balancing, and the available parallelism is typically bounded by the core count of a single host. The paper proposes a solution for designs composed of modular blocks connected by latency-insensitive interfaces. A simulator is prebuilt once for each block, and the simulator instances are connected at runtime through fast shared-memory queues. Build time improves because scaling up the simulation only requires running more instances of the prebuilt simulators. Simulation speed improves because those instances run in parallel without fine-grained synchronization or global barriers. The paper introduces Switchboard, a framework that implements this approach, and demonstrates its speed, scalability, and accuracy through two applications: (1) a web application where users can run fast simulations of chiplets on an interposer, and (2) a wafer-scale simulation of one million RISC-V cores distributed across thousands of cloud compute cores.
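To make the idea of connecting block simulators through shared-memory queues more concrete, here is a minimal Python sketch of a single-producer, single-consumer ring buffer placed in a shared-memory region. It is not Switchboard's API or implementation, which this summary does not describe. The names `SpscQueue`, `try_send`, `try_recv`, and `run_demo`, the 64-bit packet size, and the header layout are all assumptions made for illustration. For determinism the producer and consumer are interleaved in one process; a real cross-process version would also need memory-ordering guarantees that this sketch does not provide.

```python
import struct
from multiprocessing import shared_memory

SLOT_BYTES = 8     # assumption: each packet is one unsigned 64-bit word
HEADER_BYTES = 16  # assumption: head index (8 bytes) then tail index (8 bytes)


class SpscQueue:
    """Hypothetical single-producer, single-consumer ring buffer in a byte buffer."""

    def __init__(self, buf, capacity):
        self.buf = buf
        self.capacity = capacity  # number of slots; one slot is always left empty

    def _load(self, offset):
        return struct.unpack_from("<Q", self.buf, offset)[0]

    def _store(self, offset, value):
        struct.pack_into("<Q", self.buf, offset, value)

    def try_send(self, word):
        """Non-blocking send; returns False when the queue is full (backpressure)."""
        head, tail = self._load(0), self._load(8)
        nxt = (tail + 1) % self.capacity
        if nxt == head:
            return False
        self._store(HEADER_BYTES + tail * SLOT_BYTES, word)
        self._store(8, nxt)  # publish the slot by advancing the tail
        return True

    def try_recv(self):
        """Non-blocking receive; returns None when the queue is empty."""
        head, tail = self._load(0), self._load(8)
        if head == tail:
            return None
        word = self._load(HEADER_BYTES + head * SLOT_BYTES)
        self._store(0, (head + 1) % self.capacity)  # free the slot
        return word


def run_demo(n_words=10, capacity=4):
    """Interleave a fast sender and a slow receiver over one shared-memory queue."""
    size = HEADER_BYTES + capacity * SLOT_BYTES
    shm = shared_memory.SharedMemory(create=True, size=size)
    try:
        shm.buf[:HEADER_BYTES] = bytes(HEADER_BYTES)  # head = tail = 0
        queue = SpscQueue(shm.buf, capacity)
        to_send = list(range(100, 100 + n_words))
        received = []
        sent = 0
        step = 0
        while len(received) < n_words:
            # The sender attempts one packet per step and retries when the queue is full.
            if sent < n_words and queue.try_send(to_send[sent]):
                sent += 1
            # The receiver polls only every third step.
            if step % 3 == 0:
                word = queue.try_recv()
                if word is not None:
                    received.append(word)
            step += 1
        return to_send, received
    finally:
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    sent_words, received_words = run_demo()
    print(received_words == sent_words)
```

The two sides never wait on a shared clock or barrier. Each side only checks whether the queue is full or empty, so the receiver can run at a different rate than the sender and the data still arrives complete and in order. That property is what a latency-insensitive interface relies on.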