Regulating CPU Temperature With Thermal-Aware Scheduling Using a Reduced Order Learning Thermal Model

Anthony Dowling, Lin Jiang, Ming-Cheng Cheng, Yu Liu
2024-02-07
Abstract: Modern real-time systems utilize considerable amounts of power while executing computation-intensive tasks. The execution of these tasks leads to significant power dissipation and heating of the device. It therefore results in severe thermal issues like temperature escalation, high thermal gradients, and excessive hot spot formation, which may result in degrading chip performance, accelerating device aging, and premature failure. Thermal-Aware Scheduling (TAS) enables optimization of thermal dissipation to maintain a safe thermal state. In this work, we implement a new TAS algorithm, POD-TAS, which manages the thermal behavior of a multi-core CPU based on a defined set of states and their transitions. We compare the performances of a dynamic RC thermal circuit simulator (HotSpot) and a reduced order Proper Orthogonal Decomposition (POD)-based thermal model and we select the latter for use in our POD-TAS algorithm. We implement a novel simulation-based evaluation methodology to compare TAS algorithms. This methodology is used to evaluate the performance of the proposed POD-TAS algorithm. Additionally, we compare the performance of a state-of-the-art TAS algorithm, RT-TAS, to our proposed POD-TAS algorithm. Furthermore, we utilize the COMBS benchmark suite to provide CPU workloads for task scheduling. Our experimental results on a multi-core processor using a set of 4 benchmarks demonstrate that the proposed POD-TAS method can improve thermal performance by decreasing the peak thermal variance by 53.0% and the peak chip temperature by 29.01%. Using a set of 8 benchmarks, the comparison of the two algorithms shows a decrease of 29.57% in the peak spatial variance of the chip temperature and 26.26% in the peak chip temperature. We also identify several potential future research directions.
Systems and Control
What problem does this paper attempt to address?
This paper addresses the thermal management problems that modern real-time systems face when running computation-intensive tasks. As modern embedded chips grow more complex, they consume a large amount of power and generate significant heat under workloads with high computational requirements. The result is a sharp rise in temperature, high thermal gradients, and hot-spot formation, which may degrade chip performance, accelerate device aging, and even cause premature failure.

To address these challenges, the authors propose a new Thermal-Aware Scheduling (TAS) algorithm, POD-TAS. The algorithm aims to maintain a safe thermal state by optimizing heat dissipation, thereby avoiding hardware damage while ensuring that real-time tasks complete on time. Unlike approaches built on steady-state thermal models, POD-TAS uses a reduced-order Proper Orthogonal Decomposition (POD) thermal model, which dynamically predicts the transient thermal behavior of the CPU and provides temperature information with high spatiotemporal resolution.

### Main Problem Summary:

1. **Thermal Problems**: Modern multi-core CPUs generate a large amount of heat when executing computation-intensive tasks, resulting in a sharp rise in temperature and other heat-related problems.
2. **Limitations of Existing Methods**:
   - Steady-state thermal models cannot capture transient thermal behavior, which may lead to inaccurate scheduling decisions.
   - High-precision Direct Numerical Simulation (DNS) has a high computational cost and is not suitable for real-time scheduling.
   - The RC circuit model is efficient, but its accuracy is limited, especially for transient thermal prediction.
3. **Solution**: Develop a POD-based dynamic thermal model and a corresponding TAS algorithm (POD-TAS) to achieve efficient thermal management and real-time task scheduling.
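To make the idea of a reduced-order POD model more concrete, the following is a minimal sketch of the data-compression step behind POD: extracting a small set of spatial modes from temperature snapshots and reconstructing a field from a few coefficients. It is not the paper's thermal model, which also has to predict how the coefficients evolve in time. The function names (`pod_basis`, `project`, `reconstruct`), the 1D grid, and the synthetic Gaussian "hot spot" shapes are all assumptions made for illustration.

```python
import numpy as np

def pod_basis(snapshots, num_modes):
    """Return the leading POD modes of a (n_points, n_snapshots) matrix."""
    # The left singular vectors of the snapshot matrix are the POD modes,
    # ordered by how much of the snapshot energy each one captures.
    modes, singular_values, _ = np.linalg.svd(snapshots, full_matrices=False)
    return modes[:, :num_modes], singular_values

def project(modes, field):
    """Coefficients of a temperature field in the (orthonormal) POD basis."""
    return modes.T @ field

def reconstruct(modes, coeffs):
    """Rebuild the full temperature field from its POD coefficients."""
    return modes @ coeffs

# Hypothetical training data: 50 snapshots on a 200-point 1D grid, each a
# random mix of two fixed hot-spot shapes (synthetic, not measured data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
shape_a = np.exp(-((x - 0.3) ** 2) / 0.01)
shape_b = np.exp(-((x - 0.7) ** 2) / 0.02)
weights = rng.uniform(0.0, 1.0, size=(2, 50))
snapshots = np.outer(shape_a, weights[0]) + np.outer(shape_b, weights[1])

# Two modes are enough here because the snapshots have only two
# independent spatial patterns.
modes, singular_values = pod_basis(snapshots, num_modes=2)

# A field that was not in the training set but lies in the same subspace.
test_field = 0.4 * shape_a + 0.9 * shape_b
coeffs = project(modes, test_field)
approx = reconstruct(modes, coeffs)
max_error = float(np.max(np.abs(approx - test_field)))
```

In this toy case, 200 grid values are represented by 2 coefficients with negligible reconstruction error. That compression is what makes a reduced-order model cheap enough to evaluate inside a scheduler, whereas a full DNS would have to solve for every grid point.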
### Specific Objectives:

- **Improve Thermal Management Efficiency**: Introduce the POD thermal model to improve the accuracy of thermal prediction while maintaining computational efficiency.
- **Optimize Task Scheduling**: Use the POD-TAS algorithm to allocate tasks according to the transient thermal behavior of the CPU, keeping the chip temperature within a safe range.
- **Verify Performance**: Verify experimentally the advantages of POD-TAS over existing algorithms (such as RT-TAS), especially in reducing peak temperature and thermal variance.

### Key Contributions:

1. A new TAS algorithm (POD-TAS) is implemented. It is based on the POD thermal model, which provides temperature information with high spatiotemporal resolution.
2. The accuracy of the HotSpot RC circuit simulator is compared with that of the POD thermal model, and the results show that the POD model is closer to the FEM DNS results.
3. A simulation-based evaluation methodology is proposed for comparing the performance of different TAS algorithms.
4. The effectiveness of POD-TAS in reducing peak temperature and thermal variance is verified through experiments.

In summary, the paper aims to solve the thermal management problems that modern multi-core CPUs face when executing computation-intensive tasks by introducing a new thermal-aware scheduling algorithm (POD-TAS), thereby improving the reliability and performance of the system.
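The evaluation metrics mentioned above (peak chip temperature and peak spatial variance of temperature) can be illustrated with a short sketch. The definitions here are assumptions, not the paper's exact procedure: spatial variance is taken as the population variance across per-core temperatures at one time step, and "peak" as the maximum over all time steps. The traces are made-up numbers chosen for easy arithmetic, not results from the paper.

```python
from statistics import pvariance

def peak_temperature(trace):
    """Highest temperature seen on any core at any time step."""
    return max(max(step) for step in trace)

def peak_spatial_variance(trace):
    """Largest across-core temperature variance over all time steps."""
    return max(pvariance(step) for step in trace)

def percent_decrease(baseline, improved):
    """Relative reduction of a metric, in percent of the baseline value."""
    return 100.0 * (baseline - improved) / baseline

# Hypothetical traces: each inner list holds four core temperatures
# at one time step (illustrative values only).
baseline_trace = [[50.0, 60.0, 70.0, 80.0], [55.0, 65.0, 75.0, 85.0]]
balanced_trace = [[60.0, 62.0, 64.0, 66.0], [61.0, 63.0, 65.0, 67.0]]

temp_drop = percent_decrease(peak_temperature(baseline_trace),
                             peak_temperature(balanced_trace))
variance_drop = percent_decrease(peak_spatial_variance(baseline_trace),
                                 peak_spatial_variance(balanced_trace))
```

Note that a percentage reduction in temperature depends on the reference scale (for example Celsius versus Kelvin), so such figures are only comparable when the scale is the same.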