Graph neural networks with configuration cross-attention for tensor compilers

Dmitrii Khizbullin, Eduardo Rocha de Andrade, Thanh Hau Nguyen, Matheus Pedroza Ferreira, David R. Pugh
2024-11-25
Abstract: With the recent popularity of neural networks comes the need for efficient serving of inference workloads. A neural network inference workload can be represented as a computational graph with nodes as operators transforming multidimensional tensors. The tensors can be transposed and/or tiled in a combinatorially large number of ways, some configurations leading to accelerated inference. We propose TGraph, a neural graph architecture that allows screening for fast configurations of the target computational graph, thus representing an artificial intelligence (AI) tensor compiler in contrast to the traditional heuristics-based compilers. The proposed solution improves mean Kendall's $\tau$ across layout collections of TpuGraphs from 29.8% of the reliable baseline to 67.4% of TGraph. We estimate the potential CO$_2$ emission reduction associated with our work to be equivalent to over 50% of the total household emissions in the areas hosting AI-oriented data centers.
Machine Learning, Hardware Architecture, Performance
What problem does this paper attempt to address?
The core problem this paper addresses is the execution efficiency of neural network inference workloads. Specifically, the authors propose TGraph, a new architecture based on graph neural networks (GNNs) that screens for fast configurations of a target computational graph and thereby acts as a learned tensor compiler.

### Main Problem Description

With the growing popularity of neural networks, serving inference efficiently has become crucial. A neural network inference workload can be represented as a computational graph whose nodes are operators that transform multidimensional tensors. These tensors can be transposed and/or tiled in a combinatorially large number of ways, and some configurations significantly accelerate inference. Traditional compilers rely mainly on heuristic rules to select a configuration. This approach is fast, but it does not guarantee the optimal running time.

### Solution

The authors propose TGraph, a graph neural network architecture with a configuration cross-attention mechanism. By learning how different configurations affect the performance of the computational graph, TGraph ranks candidate configurations more accurately than heuristics-based methods and so is better at identifying fast ones. Specifically, TGraph makes improvements in the following areas:

1. **Cross-channel self-attention**: This captures correlations between different channels and emphasizes important features.
2. **Cross-configuration attention**: This lets the model explicitly compare the different configurations within a batch, which improves performance on the ranking task.
3. **Data pre-processing and optimization**: Techniques such as node pruning, configuration deduplication, and compression reduce memory usage and speed up training.
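
The idea of comparing configurations within a batch can be illustrated with a small sketch. The summary above does not give the exact formulation used in TGraph, so the following is only one plausible form, stated as an assumption: for a single node, each feature channel is reweighted by a softmax taken across the configuration axis. The function name `cross_config_attention` and the `temperature` parameter are hypothetical and chosen for illustration.

```python
import math


def cross_config_attention(features, temperature=1.0):
    """Reweight features with a softmax taken across configurations.

    ASSUMPTION: this is an illustrative form of cross-configuration
    attention, not necessarily the formulation used in TGraph.

    features: list of shape (num_configs, num_channels) holding the
              feature vector of ONE graph node under each configuration.
    Returns a list of the same shape in which every value is scaled by
    its softmax weight relative to the same channel in the other
    configurations of the batch.
    """
    num_channels = len(features[0])
    out = [[0.0] * num_channels for _ in features]
    for c in range(num_channels):
        column = [row[c] / temperature for row in features]
        peak = max(column)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        for b, row in enumerate(features):
            # The weight depends on all configurations, so each output
            # value carries information about its competitors.
            out[b][c] = row[c] * exps[b] / total
    return out


# Two configurations, two channels, for a single node.
batch = [[1.0, 0.0],
         [1.0, 2.0]]
reweighted = cross_config_attention(batch)
```

Because the softmax runs over configurations rather than over nodes or channels, a channel that is identical in every configuration receives uniform weights, while a channel that differs is sharpened toward the configuration with the larger activation. That is the sense in which the configurations are compared explicitly.
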
### Experimental Results

Experiments show that TGraph substantially outperforms the baseline on the TpuGraphs dataset: mean Kendall’s τ across the layout collections rises from 29.8% for the baseline to 67.4% for TGraph. The authors estimate that the potential CO₂ emission reduction associated with this work is equivalent to more than 50% of the total household emissions in the areas hosting AI-oriented data centers.

### Social Impact

The research also emphasizes the importance of optimizing AI workloads in data centers. According to the authors' estimates, improving the execution efficiency of AI workloads can significantly reduce energy consumption and carbon dioxide emissions, which contributes to combating climate change.

In conclusion, the paper addresses a key problem in optimizing neural network inference workloads by proposing the TGraph graph neural network architecture, and it shows the architecture's potential in practical applications.
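
The Kendall’s τ figures quoted in the experimental results measure how well the predicted ordering of configurations agrees with the ordering by measured runtime. As a minimal sketch, the following computes the simple tau-a variant, which counts concordant and discordant pairs and ignores ties. Which variant the paper uses is not stated in this summary, so the choice of tau-a is an assumption, as is the function name `kendall_tau`.

```python
from itertools import combinations


def kendall_tau(predicted, measured):
    """Kendall's tau-a between two equally long score sequences.

    ASSUMPTION: tau-a (no tie correction) is used for simplicity; the
    paper's exact variant is not specified in the summary.
    Returns a value in [-1, 1]; 1 means the two orderings agree on
    every pair, -1 means they disagree on every pair.
    """
    if len(predicted) != len(measured):
        raise ValueError("sequences must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two items to compare")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (predicted[i] - predicted[j]) * (measured[i] - measured[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in either sequence count as neither.
    return (concordant - discordant) / (n * (n - 1) // 2)


# Predicted vs. measured runtimes for four configurations:
# only the middle pair is ordered differently (5 concordant, 1 discordant).
print(round(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]), 3))  # prints 0.667
```

A τ reported as a percentage, such as 67.4%, corresponds to a value of 0.674 on this scale.
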