ATFormer: A Learned Performance Model with Transfer Learning Across Devices for Deep Learning Tensor Programs

Bei Yu, Zixiao Wang, Wenqian Zhao, Shuo Yin, Yang Bai
DOI: https://doi.org/10.18653/v1/2023.emnlp-main.250
Abstract: The training and inference efficiency of ever-larger deep neural networks relies heavily on the performance of tensor operators on specific hardware platforms. Therefore, a compilation-based optimization flow with automatic tensor generation and parameter tuning is necessary for efficient model deployment. While compilation-based methods with performance models can provide dynamic and suitable code optimization, they suffer from costly exploration of a large design space, coarse measurement accuracy, and poor transferability across hardware platforms. This paper presents ATFormer, a simple yet efficient design with attention-inspired modules that accurately predicts the performance of optimized operators by capturing global and long-range dependencies within a complete scheduling space. Compared with state-of-the-art methods, ATFormer can predict the optimal implementation of tensor operators, reducing inference time with minimal effort on modern DNN benchmarks. Furthermore, ATFormer with pre-trained parameters can quickly adapt to different workloads and hardware via transfer learning.
Computer Science, Engineering
What problem does this paper attempt to address?