ETO: Accelerating Optimization of DNN Operators by High-Performance Tensor Program Reuse

Jingzhi Fang, Yanyan Shen, Yue Wang, Lei Chen
DOI: https://doi.org/10.14778/3489496.3489500
IF: 2.5
2021-01-01
Proceedings of the VLDB Endowment
Abstract: Deep neural networks (DNNs) have recently achieved great success in many applications where low inference latency is important. Existing solutions reduce operator latency either by manually tuning kernel libraries or by search-based compilation. However, manual tuning requires significant engineering effort, and the huge search space makes the cost of search-based compilation unaffordable in some situations. In this work, we propose ETO, a framework that speeds up DNN operator optimization by reusing information from performant tensor programs. Specifically, ETO defines conditions under which information can be reused between two operators. For operators that satisfy these conditions, ETO uses a reuse-based tuner that draws on the performant tensor program of one operator to significantly prune the search space of the other while preserving optimization effectiveness. Given a set of operators, ETO first determines the information reuse relationships among them so as to reduce the total search time, and then tunes each operator with either the backend compiler or the reuse-based tuner accordingly. ETO further increases reuse opportunities by injecting extra operators as bridges between two operators that do not satisfy the reuse conditions. Experiments comparing ETO with various existing methods show that it is both effective and efficient in optimizing DNN operators.
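The planning step described in the abstract (decide, for each operator, whether it is tuned from scratch or by reusing another operator's result) can be illustrated with a minimal Python sketch. Everything here is hypothetical: the `Operator` type, the `can_reuse` predicate, and the greedy `plan_tuning` function are placeholders for illustration and are not ETO's actual reuse conditions or scheduling algorithm, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Operator:
    name: str
    kind: str     # e.g. "conv2d" or "matmul"
    shape: tuple  # loop extents of the operator


def can_reuse(src: Operator, dst: Operator) -> bool:
    # Placeholder condition (an assumption, not ETO's real rule):
    # same operator kind and same number of loop dimensions.
    return src.kind == dst.kind and len(src.shape) == len(dst.shape)


def plan_tuning(ops: List[Operator]) -> Dict[str, Optional[str]]:
    """Map each operator name to the operator whose tuned program it reuses,
    or to None if it would be tuned from scratch by the backend compiler."""
    plan: Dict[str, Optional[str]] = {}
    planned: List[Operator] = []
    for op in ops:
        # Greedy simplification: reuse from the first compatible operator
        # that has already been planned.
        source = next((p for p in planned if can_reuse(p, op)), None)
        plan[op.name] = source.name if source else None
        planned.append(op)
    return plan


ops = [
    Operator("conv_a", "conv2d", (1, 64, 56, 56)),
    Operator("conv_b", "conv2d", (1, 128, 28, 28)),
    Operator("mm", "matmul", (512, 512)),
]
plan = plan_tuning(ops)
```

In this toy run, `conv_b` is assigned to reuse from `conv_a`, while `conv_a` and `mm` are marked for tuning from scratch. A real planner would presumably weigh estimated search time when choosing reuse sources, and could add bridge operators between incompatible pairs; the sketch does neither.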