Exploring In-Memory Accelerators and FPGAs for Latency-Sensitive DNN Inference on Edge Servers

Ali Suvizi, Suresh Subramaniam, Tian Lan, Guru Venkataramani
DOI: https://doi.org/10.1109/cloud-summit61220.2024.00007
2024-01-01
Abstract: Edge servers are frequently used in latency-sensitive environments, such as search-and-rescue missions involving unmanned aerial vehicles (UAVs) that have real-time processing needs. In this paper, we study system designs that address the challenges of reducing latency and optimizing power usage using Processing-in-Memory (PIM) and Field-Programmable Gate Arrays (FPGAs). Age of Information (AoI) is a key metric for measuring data freshness when processing images captured in real time. Our experimental results show that our architecture significantly boosts computational speed and energy efficiency. Integrating PIM and FPGA into our edge server significantly reduces latency: PIM achieves a speed-up of 92x, and the FPGA cuts latency further to just 0.02 ms, a sharp decrease from 43.48 ms on CPUs. Power consumption for inference tasks on the LeNet-5 model falls from 11.57 W on a CPU to 0.36 W with PIM and to 5.22 W with FPGA. These results show the effectiveness of in-memory accelerators and FPGAs in ensuring that information remains current and actionable. In addition, our system's capability to support UAVs notably improves the scalability of real-time IoT applications. Specifically, our accelerator-enhanced edge server can manage 1.6x more UAVs for the VGG-8 model and up to 71x more for LeNet-5 inference tasks, compared to a CPU-only server, demonstrating its robustness in edge computing.
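The figures quoted in the abstract imply a few further quantities that it does not state directly. The sketch below derives them by plain arithmetic. It rests on two assumptions that the abstract does not confirm: that the 92x PIM speed-up is measured against the 43.48 ms CPU latency, and that the latency and power figures describe the same workload, so that multiplying them gives a meaningful energy per inference. The function name and dictionary keys are illustrative only and do not come from the paper.

```python
# Figures quoted in the abstract.
CPU_LATENCY_MS = 43.48
FPGA_LATENCY_MS = 0.02
PIM_SPEEDUP = 92.0
CPU_POWER_W = 11.57
PIM_POWER_W = 0.36
FPGA_POWER_W = 5.22


def derived_figures():
    """Return quantities implied by the abstract's numbers (hypothetical helper)."""
    # Assumption: the 92x speed-up is relative to the 43.48 ms CPU latency.
    pim_latency_ms = CPU_LATENCY_MS / PIM_SPEEDUP
    fpga_speedup = CPU_LATENCY_MS / FPGA_LATENCY_MS

    # Assumption: power and latency refer to the same workload, so
    # energy per inference (mJ) = power (W) * latency (ms).
    return {
        "pim_latency_ms": pim_latency_ms,
        "fpga_speedup": fpga_speedup,
        "pim_power_reduction": CPU_POWER_W / PIM_POWER_W,
        "fpga_power_reduction": CPU_POWER_W / FPGA_POWER_W,
        "cpu_energy_mj": CPU_POWER_W * CPU_LATENCY_MS,
        "pim_energy_mj": PIM_POWER_W * pim_latency_ms,
        "fpga_energy_mj": FPGA_POWER_W * FPGA_LATENCY_MS,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions, the PIM latency works out to roughly 0.47 ms and the FPGA speed-up over the CPU to roughly 2174x. The energy values are estimates derived from the quoted numbers, not measurements reported by the authors.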