Arch2End: Two-Stage Unified System-Level Modeling for Heterogeneous Intelligent Devices

Weihong Liu, Zongwei Zhu, Boyu Li, Yi Xiong, Zirui Lian, Jiawei Geng, Xuehai Zhou
DOI: https://doi.org/10.1109/tcad.2024.3443706
IF: 2.9
2024-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: The surge in intelligent edge computing has propelled the adoption and expansion of distributed embedded systems (DESs). Numerous scheduling strategies have been introduced to improve DES throughput, such as latency-aware and group-based hierarchical scheduling. Effective device modeling helps in designing modular, plug-in schedulers. To keep scheduling interfaces uniform, unified device performance modeling is adopted, typically as system-level modeling that incorporates both the hardware and software stacks. Such methods fall broadly into two categories. Fine-grained modeling methods based on hardware architecture analysis become very difficult to apply to a large number of heterogeneous devices, mainly because much architecture information is closed-source and costly to analyse. Coarse-grained methods rely on limited architecture information or benchmark models, and therefore generalize poorly to the complex inference performance of diverse deep neural networks (DNNs). We therefore introduce a two-stage system-level modeling method (Arch2End) that combines limited architecture information with scalable benchmark models to achieve a unified performance representation. Stage one leverages public information to analyse architectures in a uniform abstraction and to design benchmark models that explore the device performance boundaries, ensuring uniformity. Stage two extracts critical device features from the end-to-end inference metrics of extensive simulation models, ensuring universality and enhancing characterization capacity. Compared to state-of-the-art methods, Arch2End achieves the lowest relative errors in DNN latency prediction on NAS-Bench-201 (1.7%) and on real-world DNNs (8.2%). It also shows superior performance in intergroup balanced device grouping strategies.
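The relative-error figures quoted in the abstract can be read with a small worked example. The sketch below assumes the metric is a mean absolute relative error over paired predicted and measured latencies; the abstract does not state the exact definition, and the function name and sample values are hypothetical, chosen only for illustration.

```python
def mean_relative_error(predicted_ms, measured_ms):
    """Mean of |predicted - measured| / measured over paired latency samples.

    Assumed definition: the abstract reports "relative errors" without
    spelling out the formula, so this is one plausible reading.
    """
    if len(predicted_ms) != len(measured_ms) or not measured_ms:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - m) / m for p, m in zip(predicted_ms, measured_ms))
    return total / len(measured_ms)


# Hypothetical latencies in milliseconds, invented for illustration only.
measured = [10.0, 20.0, 40.0]
predicted = [11.0, 19.0, 40.0]
error = mean_relative_error(predicted, measured)
print(f"mean relative error: {error:.1%}")
```

Under this reading, a reported value of 1.7% would mean that predicted latencies deviate from measured ones by 1.7% on average.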