PyPOD-GP: Using PyTorch for Accelerated Chip-Level Thermal Simulation of the GPU

Neil He, Ming-Cheng Cheng, Yu Liu
2024-12-09
Abstract: The rising demand for high-performance computing (HPC) has made full-chip dynamic thermal simulation in many-core GPUs critical for optimizing performance and extending device lifespans. Proper orthogonal decomposition (POD) with Galerkin projection (GP) has been shown to offer high accuracy and massive runtime improvements over direct numerical simulation (DNS). However, previous implementations of POD-GP use MPI-based libraries like PETSc and FEniCS and face significant runtime bottlenecks. We propose a $\textbf{Py}$Torch-based $\textbf{POD-GP}$ library (PyPOD-GP), a GPU-optimized library for chip-level thermal simulation. PyPOD-GP achieves over $23.4\times$ speedup in training and over $10\times$ speedup in inference on a GPU with over 13,000 cores, with just $1.2\%$ error over the device layer.
Computational Engineering, Finance, and Science
What problem does this paper attempt to address?
The paper addresses **full-chip dynamic thermal simulation of many-core GPUs in high-performance computing (HPC)**. As HPC demand grows, efficient dynamic thermal simulation tools are needed to optimize performance and extend device lifespans. Traditional direct numerical simulation (DNS) is accurate but computationally expensive, while alternative methods gain efficiency at the cost of precision or resolution.

### Specific description of the problem:

1. **Thermal management challenges brought by high-density processors**:
   - Modern chip designs significantly increase processor power density, producing high temperature gradients and hot spots that reduce performance and reliability.
   - Dynamic thermal management systems can alleviate these problems, but they depend on efficient, high-precision thermal simulation tools.
2. **Limitations of existing methods**:
   - **Direct numerical simulation (DNS)**: provides an accurate temperature solution, but its computational cost is extremely high because of the large number of degrees of freedom (DoF).
   - **Other alternative methods**: improve efficiency but sacrifice precision or resolution.
3. **Advantages and bottlenecks of the POD-GP method**:
   - The **POD-GP method**, which combines proper orthogonal decomposition (POD) and Galerkin projection (GP), runs far faster than DNS while maintaining high accuracy.
   - However, previous POD-GP implementations rely on MPI-based libraries (such as PETSc and FEniCS) and face significant runtime bottlenecks during training and inference, especially when applied to GPUs with a large number of cores.

### Solution:

To solve the above problems, the authors propose **PyPOD-GP**, a PyTorch-based, GPU-optimized library for chip-level thermal simulation.
By leveraging PyTorch's tensor operations, PyPOD-GP runs substantially faster than previous CPU-based implementations. Specifically:

- **Training speed improvement**: On an NVIDIA Tesla Volta GV100 GPU, PyPOD-GP achieves a training speedup of more than 23.4 times.
- **Inference speed improvement**: On the same hardware, PyPOD-GP achieves an inference speedup of more than 10 times.
- **High precision**: Over the device layer, the error of PyPOD-GP is only 1.2%, demonstrating its potential for large-scale GPU architectures.

### Summary:

This paper aims to provide an efficient and accurate GPU-accelerated thermal simulation tool, the PyPOD-GP library, to meet the dynamic thermal management requirements of many-core GPUs in high-performance computing. Beyond improving the efficiency of thermal simulation, it makes real-time thermal monitoring and multi-device prediction possible.
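
### Sketch of the POD-GP formulation:

The summary names POD and Galerkin projection without showing what they reduce the problem to, so a standard textbook form is written out here for orientation. The notation ($\rho$, $C$, $k$, $P_d$, $\varphi_i$, $a_i$, $M$) and the zero-flux boundary condition are assumptions made for this sketch and are not taken from the paper, whose symbols and boundary treatment may differ.

$$
\begin{aligned}
&\text{Heat conduction:} && \rho C\,\frac{\partial T(\mathbf{r},t)}{\partial t} = \nabla\cdot\big(k\,\nabla T(\mathbf{r},t)\big) + P_d(\mathbf{r},t) \\
&\text{POD expansion:} && T(\mathbf{r},t) \approx \sum_{j=1}^{M} a_j(t)\,\varphi_j(\mathbf{r}) \\
&\text{POD modes:} && \int_\Omega \big\langle T(\mathbf{r},t)\,T(\mathbf{r}',t)\big\rangle\,\varphi(\mathbf{r}')\,d\mathbf{r}' = \lambda\,\varphi(\mathbf{r}) \\
&\text{Galerkin projection:} && \sum_{j=1}^{M} c_{ij}\,\frac{da_j}{dt} + \sum_{j=1}^{M} g_{ij}\,a_j = P_i(t), \qquad i = 1,\dots,M
\end{aligned}
$$

with

$$
c_{ij} = \int_\Omega \rho C\,\varphi_i\,\varphi_j\,d\Omega, \qquad
g_{ij} = \int_\Omega k\,\nabla\varphi_i\cdot\nabla\varphi_j\,d\Omega, \qquad
P_i(t) = \int_\Omega \varphi_i\,P_d(\mathbf{r},t)\,d\Omega .
$$

The projected system follows from multiplying the heat equation by $\varphi_i$, integrating over the domain $\Omega$, and integrating the conduction term by parts. The boundary integral $\oint_{\partial\Omega}\varphi_i\,k\,\nabla T\cdot\mathbf{n}\,dS$ vanishes under the assumed zero-flux condition. In this picture, "training" corresponds to computing the modes $\varphi_i$ from DNS snapshots and assembling $c_{ij}$, $g_{ij}$, and $P_i$. "Inference" corresponds to integrating the resulting $M \times M$ system of ordinary differential equations, where $M$ is far smaller than the DoF of the DNS mesh.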