Dissecting the software-based measurement of CPU energy consumption: a comparative analysis
Guillaume Raffin, Denis Trystram
2024-07-19
Abstract: Every day, we experience the effects of global warming: extreme weather events, major forest fires, storms, etc. The scientific community acknowledges that this crisis is a consequence of human activities, to which Information and Communications Technologies (ICT) are an increasingly important contributor. Computer scientists need tools for measuring the footprint of the code they produce and for optimizing it. Running Average Power Limit (RAPL) is a low-level interface designed by Intel that provides a measure of the energy consumption of a CPU (and more) without the need for additional hardware. Since 2017, it has been available on most computing devices, including non-Intel devices such as AMD processors. More and more people use RAPL for energy measurement, mostly as a black box, without deep knowledge of its behavior. Unfortunately, this leads to mistakes when implementing measurement tools. In this paper, we revisit the basic mechanisms through which RAPL measurements can be obtained and present a critical analysis of how they operate. In addition to long-established mechanisms, we explore the suitability of the recent eBPF technology (formerly an abbreviation for extended Berkeley Packet Filter) for working with RAPL. For each mechanism, we release an implementation in Rust that avoids the pitfalls we detected in existing tools, improving correctness, timing accuracy and performance. These new implementations have desirable properties for monitoring and profiling parallel applications. We also provide an experimental study with multiple benchmarks and processor models (Intel and AMD) to evaluate the efficiency of the various mechanisms and their impact on parallel software. These experiments show that no mechanism provides a significant performance advantage over the others.
However, they differ significantly in terms of ease of use and resiliency. We believe that this work will help the community to develop correct, resilient and lightweight measurement tools.
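As an illustration of the kind of low-level detail a RAPL-based tool has to handle, here is a minimal sketch (not the authors' implementation) that reads one RAPL domain through the Linux powercap sysfs interface and converts two counter readings into an energy value. It assumes the domain directory `/sys/class/powercap/intel-rapl:0` exists on the machine (the exact directory depends on the kernel, the CPU vendor and the available domains). On recent kernels, `energy_uj` is usually readable only by root. The counter wraps around at `max_energy_range_uj`, so the sketch assumes at most one wraparound between two readings, which only holds if the sampling interval is short enough for the current power draw.

```rust
use std::fs;
use std::io;
use std::thread;
use std::time::Duration;

/// Reads an unsigned integer from a sysfs file such as `energy_uj`.
fn read_u64(path: &str) -> io::Result<u64> {
    let text = fs::read_to_string(path)?;
    text.trim()
        .parse::<u64>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Energy consumed between two readings of a wrapping counter, in microjoules.
/// Assumption: at most one wraparound occurred between `prev` and `curr`.
fn energy_delta_uj(prev: u64, curr: u64, max_range_uj: u64) -> u64 {
    if curr >= prev {
        curr - prev
    } else {
        // The counter reached its maximum range and restarted from zero.
        max_range_uj.saturating_sub(prev) + curr
    }
}

fn main() {
    // Assumed domain directory (package 0 on many Linux systems).
    let dir = "/sys/class/powercap/intel-rapl:0";
    let energy_path = format!("{dir}/energy_uj");
    let range_path = format!("{dir}/max_energy_range_uj");

    match (read_u64(&energy_path), read_u64(&range_path)) {
        (Ok(e0), Ok(max_range)) => {
            thread::sleep(Duration::from_secs(1));
            match read_u64(&energy_path) {
                Ok(e1) => println!(
                    "domain {dir}: {} uJ over ~1 s",
                    energy_delta_uj(e0, e1, max_range)
                ),
                Err(err) => eprintln!("second read failed: {err}"),
            }
        }
        // Missing interface or insufficient permissions.
        (Err(err), _) | (_, Err(err)) => eprintln!("powercap not readable: {err}"),
    }
}
```

Handling the wraparound in a dedicated pure function keeps the correction testable independently of the hardware. A real tool would also need to choose the polling period so that no more than one overflow can happen between samples.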
Distributed, Parallel, and Cluster Computing, Performance