Abstract: The pressing demands of improving energy efficiency for high performance scientific computing have motivated a large body of software-controlled hardware solutions using Dynamic Voltage and Frequency Scaling (DVFS) that strategically switch processors to low-power states, when the peak processor performance is not necessary. Although OS level solutions have demonstrated the effectiveness of saving energy in a black-box fashion, for applications with variable execution characteristics, the optimal energy efficiency can be blundered away due to defective prediction mechanism and untapped load imbalance. In this paper, we propose TX, a library level race-to-halt DVFS scheduling approach that analyzes Task Dependency Set of each task in parallel Cholesky, LU, and QR factorizations to achieve substantial energy savings OS level solutions cannot fulfill. Partially giving up the generality of OS level solutions per requiring library level source modification, TX leverages algorithmic characteristics of the applications to gain greater energy savings. Experimental results on two power-aware clusters indicate that TX can save up to 17.8% more energy than state-of-the-art OS level solutions with negligible 3.5% on average performance loss.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to improve the energy efficiency of parallel Cholesky, LU and QR factorizations through Dynamic Voltage and Frequency Scaling (DVFS) in high-performance scientific computing while keeping performance loss low. Specifically, the paper points out that current operating-system-level (OS level) energy-saving schemes cannot fully exploit the energy-saving potential because of flaws in their prediction mechanisms and because of load imbalance. The authors therefore propose TX, a library-level energy-saving scheduling method based on Task Dependency Set (TDS) analysis, which uses algorithmic characteristics to save energy more effectively.

The main contributions of the paper are as follows:

1. **Energy-saving effect**: Compared with existing OS level solutions, TX saves significantly more energy, especially for applications with variable execution characteristics such as parallel Cholesky, LU and QR factorizations.
2. **Utilization of algorithm characteristics**: TX analyzes the task dependency set of each task and uses algorithmic characteristics to achieve higher energy efficiency, rather than relying solely on general-purpose prediction mechanisms.
3. **Low performance loss**: The experimental results show that, while achieving significant energy savings, TX has an average performance loss of only 3.5%.
4. **Formal proof**: The paper also formally proves that, under current CMOS technology, TX is comparable to the Critical Path (CP) method in energy-saving ability.

In conclusion, this paper aims to overcome the limitations of existing OS level energy-saving schemes through a new library-level scheduling method, TX, thereby achieving more efficient energy use in high-performance scientific computing.
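To make the race-to-halt idea more concrete, the following is a minimal toy sketch, not the TX implementation from the paper. It assumes a tiny hand-written task graph, a one-task-at-a-time schedule per process, and hypothetical power figures (`P_HIGH`, `P_LOW`); the names `Task`, `schedule` and `energy` are invented for illustration. Each task runs at the highest frequency, and a process that is waiting on tasks in a dependency set is assumed to drop to the lowest frequency. Frequency-switching overhead is ignored.

```python
from dataclasses import dataclass

# Hypothetical power draw in watts (assumed values, not from the paper).
P_HIGH = 100.0  # process running at the highest frequency
P_LOW = 40.0    # process waiting at the lowest frequency


@dataclass
class Task:
    name: str
    proc: int          # process that executes the task
    duration: float    # seconds at the highest frequency
    deps: tuple = ()   # names of tasks that must finish first


def schedule(tasks):
    """Return {name: (start, end)}. Tasks must be given in topological order."""
    finish = {}     # task name -> finish time
    proc_free = {}  # process id -> time at which it becomes free
    spans = {}
    for t in tasks:
        ready = max((finish[d] for d in t.deps), default=0.0)
        start = max(ready, proc_free.get(t.proc, 0.0))
        end = start + t.duration
        finish[t.name] = end
        proc_free[t.proc] = end
        spans[t.name] = (start, end)
    return spans


def energy(tasks, spans, idle_power):
    """Total energy in joules: busy time at P_HIGH, waiting time at idle_power."""
    makespan = max(end for _, end in spans.values())
    total = 0.0
    for p in {t.proc for t in tasks}:
        busy = sum(t.duration for t in tasks if t.proc == p)
        total += busy * P_HIGH + (makespan - busy) * idle_power
    return total


# Toy dependency chain loosely shaped like one step of a blocked factorization.
tasks = [
    Task("factor", proc=0, duration=2.0),
    Task("solve", proc=1, duration=1.0, deps=("factor",)),
    Task("update", proc=1, duration=3.0, deps=("solve",)),
    Task("factor2", proc=0, duration=2.0, deps=("update",)),
]

spans = schedule(tasks)
baseline = energy(tasks, spans, idle_power=P_HIGH)     # never scales down
race_to_halt = energy(tasks, spans, idle_power=P_LOW)  # scales down while waiting
print(f"baseline: {baseline} J, race-to-halt: {race_to_halt} J")
```

In this toy model the makespan is the same in both cases, because tasks always run at full speed; the saving comes only from the time each process spends waiting on its dependencies. Real savings depend on the actual power states and switching costs of the hardware.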

Algorithmic Energy Saving for Parallel Cholesky, LU, and QR Factorizations

User- and Process-Driven Dynamic Voltage and Frequency Scaling.

Energy-Aware Loop Parallelism Maximization for Multi-core DSP Architectures

Energy-Aware Non-Preemptive Task Scheduling with Deadline Constraint in DVFS-Enabled Heterogeneous Clusters

Energy Efficient Real-Time Task Scheduling on CPU-GPU Hybrid Clusters

Enhanced Parallel Application Scheduling Algorithm with Energy Consumption Constraint in Heterogeneous Distributed Systems

Energy-aware Task Scheduling with Deadline Constraint in DVFS-enabled Heterogeneous Clusters

A Task Level-Aware Scheduling Algorithm for Energy Consumption Constrained Parallel Applications on Heterogeneous Computing Systems

Energy-Efficient Real-Time Task Allocation in a Data Center.

Low-Energy Kernel Scheduling Approach for Energy Saving.

Dynamic Voltage and Frequency Scaling for Scientific Applications

A DVFS Based Energy-Efficient Tasks Scheduling in a Data Center

Towards Energy Efficient Scheduling for Online Tasks in Cloud Data Centers Based on DVFS

EEWA: Energy-Efficient Workload-Aware Task Scheduling in Multi-core Architectures

Analyzing and Optimizing Energy Efficiency of Algorithms on DVS Systems a First Step Towards Algorithmic Energy Minimization

Some Observations on Optimal Frequency Selection in DVFS-based Energy Consumption Minimization

A survey on software methods to improve the energy efficiency of parallel computing

Coordinated management of DVFS and cache partitioning under QoS constraints to save energy in multi-core systems

Energy-efficient Task Scheduling on Heterogeneous Computing Systems by Linear Programming

On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations

Energy-Aware Loop Scheduling and Assignment for Multi-Core, Multi-Functional-Unit Architecture