Cheshire: A Lightweight, Linux-Capable RISC-V Host Platform for Domain-Specific Accelerator Plug-In

Alessandro Ottaviano, Thomas Benz, Paul Scheffler, Luca Benini
2023-07-06
Abstract: Power and cost constraints in the internet-of-things (IoT) extreme-edge and TinyML domains, coupled with increasing performance requirements, motivate a trend toward heterogeneous architectures. These designs use energy-efficient application-class host processors to coordinate compute-specialized multicore accelerators, amortizing the architectural costs of operating system support and external communication. This brief presents Cheshire, a lightweight and modular 64-bit Linux-capable host platform designed for the seamless plug-in of domain-specific accelerators. It features a unique low-pin-count DRAM interface, a last-level cache configurable as scratchpad memory, and a DMA engine enabling efficient data movement to or from accelerators or DRAM. It also provides numerous optional IO peripherals including UART, SPI, I2C, VGA, and GPIOs. Cheshire's synthesizable RTL description, comprising all of its peripherals and its fully digital DRAM interface, is available free and open-source. We implemented and fabricated Cheshire as a silicon demonstrator called Neo in TSMC's 65 nm CMOS technology. At 1.2 V, Neo achieves clock frequencies of up to 325 MHz while not exceeding 300 mW in total power on data-intensive computational workloads. Its RPC DRAM interface consumes only 250 pJ/B and incurs only 3.5 kGE in area for its PHY while attaining a peak transfer rate of 750 MB/s at 200 MHz.
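As a quick sanity check on the RPC DRAM figures, the sketch below derives two quantities from the quoted numbers: the average number of bytes moved per 200 MHz clock cycle at the peak rate, and the interface power that would result if the 250 pJ/B energy cost held constant at that rate. Constant energy per byte and decimal megabytes (10^6 B) are assumptions made here, not statements from the paper.

```python
# Figures quoted for Neo's RPC DRAM interface.
ENERGY_PER_BYTE_J = 250e-12  # 250 pJ per byte transferred
PEAK_RATE_BPS = 750e6        # 750 MB/s, assuming decimal megabytes (10^6 B)
CLOCK_HZ = 200e6             # 200 MHz interface clock


def bytes_per_cycle(rate_bps: float, clock_hz: float) -> float:
    """Average bytes transferred per interface clock cycle at a given rate."""
    return rate_bps / clock_hz


def interface_power_w(energy_per_byte_j: float, rate_bps: float) -> float:
    """Power in watts, ASSUMING every byte costs the same energy at any rate."""
    return energy_per_byte_j * rate_bps


if __name__ == "__main__":
    bpc = bytes_per_cycle(PEAK_RATE_BPS, CLOCK_HZ)
    power_mw = interface_power_w(ENERGY_PER_BYTE_J, PEAK_RATE_BPS) * 1e3
    print(f"bytes per cycle at peak: {bpc:.2f}")
    print(f"estimated interface power at peak: {power_mw:.1f} mW")
```

Under these assumptions the interface moves 3.75 B per cycle and would dissipate roughly 187.5 mW at its peak rate; this is an estimate derived from the quoted figures, not a measured result.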
Hardware Architecture
What problem does this paper attempt to address?
The paper addresses the demand for high-performance, low-power computing platforms at the extreme edge of the Internet of Things (IoT) and in the TinyML domain. It proposes Cheshire, a lightweight, modular, Linux-capable 64-bit RISC-V host platform into which domain-specific accelerators (DSAs) can be seamlessly plugged. Cheshire's main features are:

1. **Unique low-pin-count DRAM interface**: an RPC DRAM interface with a fully digital PHY, which reduces the number of package pins needed for off-chip memory and thus the cost and complexity of integration.
2. **Last-level cache configurable as scratchpad memory**: the host can repurpose its last-level cache as fast, software-managed on-chip SRAM when needed.
3. **Direct memory access (DMA) engine**: enables efficient data movement between the host, accelerators, and DRAM.
4. **Numerous optional IO peripherals**: including UART, SPI, I2C, GPIO, and VGA output.

Cheshire is highly configurable to suit different requirements, and its synthesizable RTL description, including all peripherals and the fully digital DRAM interface, is free and open-source. The paper also presents Neo, a silicon demonstrator of Cheshire fabricated in TSMC's 65 nm CMOS technology. At 1.2 V, Neo reaches clock frequencies of up to 325 MHz while its total power on data-intensive workloads stays below 300 mW. Its RPC DRAM interface consumes only 250 pJ per byte transferred, occupies only 3.5 kGE of area for its PHY, and achieves a peak transfer rate of 750 MB/s at 200 MHz. These properties make Cheshire well suited as a host for low-power, high-performance embedded systems built around DSAs.
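To make the cache-as-scratchpad idea concrete, here is a minimal Python model of a set-associative last-level cache whose ways can be individually handed over to a scratchpad window. Way-granular partitioning, the class `LlcPartition`, the `spm_way_mask` parameter, and the example geometry (8 ways, 256 sets, 64 B lines) are all hypothetical choices for illustration; they are not taken from the Cheshire RTL or from Neo's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class LlcPartition:
    """Hypothetical LLC model where selected ways act as scratchpad (SPM).

    Way-granular partitioning is an illustrative assumption only.
    """
    num_ways: int
    num_sets: int
    line_bytes: int
    spm_way_mask: int  # bit i set -> way i serves as scratchpad, not cache

    def spm_ways(self) -> list[int]:
        """Way indices reserved for the scratchpad, in ascending order."""
        return [w for w in range(self.num_ways) if (self.spm_way_mask >> w) & 1]

    def cache_ways(self) -> list[int]:
        """Way indices that still behave as ordinary cache ways."""
        return [w for w in range(self.num_ways) if not (self.spm_way_mask >> w) & 1]

    def way_bytes(self) -> int:
        """Capacity of a single way: one line per set."""
        return self.num_sets * self.line_bytes

    def spm_bytes(self) -> int:
        return len(self.spm_ways()) * self.way_bytes()

    def cache_bytes(self) -> int:
        return len(self.cache_ways()) * self.way_bytes()

    def locate_spm(self, offset: int) -> tuple[int, int, int]:
        """Map a byte offset in the SPM window to (way, set, byte-in-line).

        Each SPM way backs one contiguous block of the window, ordered by
        ascending way index.
        """
        if not 0 <= offset < self.spm_bytes():
            raise ValueError("offset outside scratchpad window")
        way_slot, in_way = divmod(offset, self.way_bytes())
        set_idx, byte = divmod(in_way, self.line_bytes)
        return self.spm_ways()[way_slot], set_idx, byte


if __name__ == "__main__":
    # Hypothetical geometry: 8 ways x 256 sets x 64 B = 128 KiB in total.
    llc = LlcPartition(num_ways=8, num_sets=256, line_bytes=64, spm_way_mask=0b11110000)
    print(llc.spm_bytes(), llc.cache_bytes())  # 65536 65536
    print(llc.locate_spm(16384 + 130))         # (5, 2, 2)
```

In this model the split between cache and scratchpad is a configuration parameter rather than a fixed boundary: the same storage serves either as transparent cache or as directly addressed local memory, so software can trade caching capacity for predictable on-chip storage.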