Performance bottlenecks detection through microarchitectural sensitivity

Hugo Pompougnac, Alban Dutilleul, Christophe Guillon, Nicolas Derumigny, Fabrice Rastello
2024-02-24
Abstract: Modern Out-of-Order (OoO) CPUs are complex systems with many components interleaved in non-trivial ways. Pinpointing performance bottlenecks and understanding the underlying causes of program performance issues are critical tasks to make the most of hardware resources. We provide an in-depth overview of performance bottlenecks in recent OoO microarchitectures and describe the difficulties of detecting them. Techniques that measure resources utilization can offer a good understanding of a program's execution, but, due to the constraints inherent to Performance Monitoring Units (PMU) of CPUs, do not provide the relevant metrics for each use case. Another approach is to rely on a performance model to simulate the CPU behavior. Such a model makes it possible to implement any new microarchitecture-related metric. Within this framework, we advocate for implementing modeled resources as parameters that can be varied at will to reveal performance bottlenecks. This allows a generalization of bottleneck analysis that we call sensitivity analysis. We present Gus, a novel performance analysis tool that combines the advantages of sensitivity analysis and dynamic binary instrumentation within a resource-centric CPU model. We evaluate the impact of sensitivity on bottleneck analysis over a set of high-performance computing kernels.
What problem does this paper attempt to address?
This paper addresses the problem of detecting performance bottlenecks in modern out-of-order (OoO) CPUs and understanding the root causes of program performance problems. Traditional methods based on Performance Monitoring Units (PMUs) can give a good understanding of a program's execution, but the constraints inherent to PMUs mean they cannot provide the relevant metrics for every use case. Existing methods also struggle to identify the specific resource that causes a bottleneck, especially in complex OoO architectures, where the interdependence between resources and the exploitation of parallelism strongly affect performance.

To solve these problems, the paper proposes a new method called sensitivity analysis. It reveals performance bottlenecks by simulating the execution of the same program while accelerating different CPU resources in turn. Accelerating a resource means changing its capacity to process micro-operations (µops), for example by increasing the throughput of a given port. If accelerating a resource significantly improves overall performance (that is, the global speedup is positive), the resource is considered a performance bottleneck and should be a priority for optimization.

The main contributions of the paper are:
- Describing the challenges of bottleneck identification, proposing a new definition based on sensitivity analysis, and showing how this technique generalizes existing methods.
- Introducing a new code analysis tool named Gus, detailing its underlying performance model and how it implements sensitivity analysis.
- Evaluating the effectiveness of the method on a set of high-performance computing kernels.
In summary, the paper introduces sensitivity analysis as a more general and more accurate way to detect performance bottlenecks, with the goal of optimizing program performance on modern OoO CPUs.
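
The idea of sensitivity analysis described above can be illustrated with a small Python sketch. This is not Gus or its performance model: the `Resource` class, the `cycles` cost function (a simple throughput bound), the acceleration `factor`, and the example kernel numbers are all hypothetical choices made for illustration. The sketch accelerates one resource at a time, re-evaluates the modeled execution time, and reports the resources whose acceleration yields a speedup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical modeled resource (e.g. an execution port)."""
    name: str
    throughput: float  # µops the resource can process per cycle
    demand: float      # µops the kernel sends to this resource


def cycles(resources):
    """Toy cost model (an assumption): time is set by the most loaded resource."""
    return max(r.demand / r.throughput for r in resources)


def sensitivity(resources, factor=2.0):
    """Speedup obtained by accelerating each resource alone by `factor`."""
    resources = list(resources)
    base = cycles(resources)
    speedups = {}
    for i, r in enumerate(resources):
        faster = Resource(r.name, r.throughput * factor, r.demand)
        varied = resources[:i] + [faster] + resources[i + 1:]
        speedups[r.name] = base / cycles(varied)
    return speedups


def bottlenecks(resources, factor=2.0, eps=1e-9):
    """Resources whose acceleration makes the whole kernel faster."""
    return [name for name, s in sensitivity(resources, factor).items()
            if s > 1.0 + eps]


# Made-up kernel profile, for illustration only.
kernel = [
    Resource("port0", throughput=1.0, demand=400.0),
    Resource("port5", throughput=1.0, demand=100.0),
    Resource("load", throughput=2.0, demand=300.0),
]

print(sensitivity(kernel))
print(bottlenecks(kernel))
```

In this toy profile only `port0` limits execution, so doubling its throughput is the only change that produces a speedup. A real model would also have to account for latencies, dependencies between instructions, and interactions between resources, which a per-resource throughput bound ignores.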