Algorithmic Energy Saving for Parallel Cholesky, LU, and QR Factorizations

Li Tan, Zizhong Chen
DOI: https://doi.org/10.48550/arXiv.1411.2536
2015-04-02
Abstract: The pressing demands of improving energy efficiency for high performance scientific computing have motivated a large body of software-controlled hardware solutions using Dynamic Voltage and Frequency Scaling (DVFS) that strategically switch processors to low-power states, when the peak processor performance is not necessary. Although OS level solutions have demonstrated the effectiveness of saving energy in a black-box fashion, for applications with variable execution characteristics, the optimal energy efficiency can be blundered away due to defective prediction mechanism and untapped load imbalance. In this paper, we propose TX, a library level race-to-halt DVFS scheduling approach that analyzes Task Dependency Set of each task in parallel Cholesky, LU, and QR factorizations to achieve substantial energy savings OS level solutions cannot fulfill. Partially giving up the generality of OS level solutions per requiring library level source modification, TX leverages algorithmic characteristics of the applications to gain greater energy savings. Experimental results on two power-aware clusters indicate that TX can save up to 17.8% more energy than state-of-the-art OS level solutions with negligible 3.5% on average performance loss.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses how to improve the energy efficiency of parallel Cholesky, LU, and QR factorizations in high-performance scientific computing using Dynamic Voltage and Frequency Scaling (DVFS) while keeping performance loss low. It argues that existing operating-system-level (OS-level) energy-saving schemes cannot realize the full energy-saving potential because their prediction mechanisms are flawed and they leave load imbalance untapped. The authors therefore propose TX, a library-level race-to-halt DVFS scheduling method that analyzes the Task Dependency Set (TDS) of each task and exploits algorithmic characteristics to save more energy.

The main contributions of the paper are as follows:

1. **Energy savings**: Compared with state-of-the-art OS-level solutions, TX saves up to 17.8% more energy on applications with variable execution characteristics, such as parallel Cholesky, LU, and QR factorizations.
2. **Use of algorithmic characteristics**: TX analyzes the Task Dependency Set of each task and uses algorithmic characteristics to achieve higher energy efficiency, rather than relying solely on general-purpose prediction mechanisms.
3. **Low performance loss**: The experimental results show that TX achieves these savings with an average performance loss of only 3.5%.
4. **Formal proof**: The paper formally proves that, under current CMOS technology, TX is comparable to the Critical Path (CP) method in energy-saving capability.

In short, TX is a library-level scheduling method that trades some of the generality of OS-level schemes, since it requires library source modification, for greater energy savings in high-performance scientific computing.
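To make the race-to-halt idea concrete, the following is a minimal toy sketch, not the paper's TX implementation. It assumes a small hand-written task graph loosely shaped like one factorization step (a panel factorization followed by two updates), a fixed task-to-processor mapping, and made-up power figures. The names `Task`, `simulate`, and `energy`, and all wattages, are hypothetical. Each task starts once its dependencies have finished and its processor is free; a processor that is waiting on dependencies either stays at the high frequency (baseline) or drops to the low-power state (race-to-halt).

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    proc: int            # processor the task is mapped to (assumed fixed)
    duration: float      # run time in seconds at peak frequency
    deps: list = field(default_factory=list)  # names of tasks it waits for


def simulate(tasks, n_procs):
    """Schedule tasks (given in topological order) and return
    (makespan, busy_time_per_processor)."""
    finish = {}
    proc_free = [0.0] * n_procs
    busy = [0.0] * n_procs
    for t in tasks:
        # A task is ready when every task it depends on has finished.
        ready = max((finish[d] for d in t.deps), default=0.0)
        start = max(ready, proc_free[t.proc])
        finish[t.name] = start + t.duration
        proc_free[t.proc] = finish[t.name]
        busy[t.proc] += t.duration
    return max(finish.values()), busy


def energy(makespan, busy, p_busy, p_wait):
    """Total energy in joules: p_busy while computing, p_wait while waiting."""
    return sum(b * p_busy + (makespan - b) * p_wait for b in busy)


# Hypothetical two-processor step: factor a panel, update two blocks,
# then factor the next panel once both updates are done.
tasks = [
    Task("factor_0", proc=0, duration=2.0),
    Task("update_a", proc=0, duration=3.0, deps=["factor_0"]),
    Task("update_b", proc=1, duration=3.0, deps=["factor_0"]),
    Task("factor_1", proc=0, duration=2.0, deps=["update_a", "update_b"]),
]

makespan, busy = simulate(tasks, n_procs=2)

# Assumed power draw in watts (illustrative values only).
P_BUSY = 100.0       # computing at peak frequency
P_WAIT_HIGH = 60.0   # waiting while still at peak frequency
P_WAIT_LOW = 30.0    # waiting after switching to the lowest frequency

baseline = energy(makespan, busy, P_BUSY, P_WAIT_HIGH)
race_to_halt = energy(makespan, busy, P_BUSY, P_WAIT_LOW)
saving = 1.0 - race_to_halt / baseline

print(f"makespan={makespan}s baseline={baseline}J race_to_halt={race_to_halt}J")
print(f"energy saved by lowering frequency while waiting: {saving:.1%}")
```

In this toy model the schedule, and therefore the run time, is identical in both cases; only the power drawn during waiting changes. A real system also pays a frequency-switching cost, which this sketch ignores and which is one reason a small performance loss can appear in practice.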