EASTER: Learning to Split Transformers at the Edge Robustly

Xiaotian Guo, Quan Jiang, Yixian Shen, Andy D. Pimentel, Todor Stefanov
DOI: https://doi.org/10.1109/tcad.2024.3438995
IF: 2.9
2024-11-09
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: Prevalent large transformer models present significant computational challenges for resource-constrained devices at the Edge. While distributing the workload of deep learning models across multiple edge devices has been extensively studied, these works typically overlook the impact of failures of edge devices. Unpredictable failures, due to, e.g., connectivity issues or discharged batteries, can compromise the reliability of inference serving at the Edge. In this article, we introduce a novel methodology, called EASTER, designed to learn robust distribution strategies for transformer models against device failures that consider the tradeoff between robustness (i.e., maintaining model functionality against failures) and resource utilization (considering memory usage and computations). We evaluate EASTER with three representative transformers (ViT, GPT-2, and Vicuna) under device failures. Our results demonstrate EASTER's efficiency in memory usage, and possible end-to-end latency improvement for inference across multiple edge devices while preserving model accuracy as much as possible under device failures.
engineering, electrical & electronic; computer science, interdisciplinary applications, hardware & architecture