AutoGTCO: Graph and Tensor Co-Optimize for Image Recognition with Transformers on GPU

Yang Bai,Xufeng Yao,Qi Sun,Bei Yu
DOI: https://doi.org/10.1109/iccad51958.2021.9643487
2021-01-01
Abstract:Performance optimization is the art of continuously seeking an effective mapping between algorithm and hardware. Existing deep learning compilers or frameworks optimize the computation graph via adapting transformations manually designed by expert efforts. We argue that these methods ignore some possible graph-level optimizations, thus it is difficult to generalize to emerging deep learning models or new operators. In this work, we propose AutoGTCO, a tensor program generation system for vision tasks with the transformer architecture on GPU. Compared with existing fusion strategies, AutoGTCO explores the optimization of operator fusion in the transformer model through a novel dynamic programming algorithm. Specifically, to construct an effective search space of the sampled programs, new sketch generation rules and a search policy are proposed for the batch matrix multiplication and softmax operators in each subgraph, which are capable of fusing them into large computation units, it can then map and transform them into efficient CUDA kernels. Overall, our evaluation on three real-world transformer-based vision tasks shows that AutoGTCO improves the execution performance relative to deep learning engine TensorRT by up to 1.38 ×.
What problem does this paper attempt to address?