CesASMe and Staticdeps: static detection of memory-carried dependencies for code analyzers

Théophile Bastian, Hugo Pompougnac, Alban Dutilleul, Fabrice Rastello
2024-02-22
Abstract: A variety of code analyzers, such as IACA, uiCA, llvm-mca or Ithemal, strive to statically predict the throughput of a computation kernel. Each analyzer is based on its own simplified CPU model reasoning at the scale of a basic block. Facing this diversity, evaluating their strengths and weaknesses is important to guide both their usage and their enhancement. We present CesASMe, a fully-tooled solution to evaluate code analyzers on C-level benchmarks composed of a benchmark derivation procedure that feeds an evaluation harness. We conclude that memory-carried data dependencies are a major source of imprecision for these tools. We tackle this issue with staticdeps, a static analyzer extracting memory-carried data dependencies, including across loop iterations, from an assembly basic block. We integrate its output into uiCA, a state-of-the-art code analyzer, to evaluate staticdeps' impact on a code analyzer's precision through CesASMe.
Performance
### What problem does this paper attempt to solve?

This paper addresses the accuracy problems that code analyzers encounter when predicting the throughput of computation kernels, in particular those caused by memory-carried dependencies. Specifically:

1. **Diversity and limitations of code analyzers**:
   - Many code analyzers (such as IACA, uiCA, llvm-mca, and Ithemal) attempt to statically predict the throughput of computation kernels, but each relies on its own simplified CPU model and reasons at the scale of a basic block.
   - Evaluating the strengths and weaknesses of these tools is important for guiding both their use and their improvement.
2. **Challenges of memory-carried dependencies**:
   - Memory-carried data dependencies are a major source of imprecision in these tools' predictions. Such dependencies are difficult to model, especially when they span loop iterations.
   - Existing code analyzers handle memory dependencies poorly, which leads to inaccurate predictions.
3. **Proposed solutions**:
   - The authors present **CesASMe**, a tool for evaluating code analyzers on C-level benchmarks. CesASMe includes a benchmark derivation procedure that generates microbenchmarks and feeds them to an evaluation harness.
   - To address memory-carried dependencies, the authors also develop **staticdeps**, a static analyzer that extracts memory-carried data dependencies, including across loop iterations, from an assembly basic block.
   - The output of staticdeps is integrated into uiCA, and CesASMe is used to evaluate its impact on the code analyzer's precision.

Through these contributions, the authors aim to improve how accurately code analyzers handle memory dependencies and to make performance predictions more reliable.
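To make the notion of a memory-carried dependency across loop iterations concrete, here is a minimal Python sketch. It is an illustration of the concept only, not the staticdeps algorithm: it works on a toy *dynamic* trace rather than analyzing assembly statically. The trace format and the names `memory_carried_deps` and `toy_loop_trace` are assumptions made for this example.

```python
def memory_carried_deps(trace):
    """Find read-after-write dependencies that flow through memory.

    `trace` yields (iteration, instr_index, kind, address) tuples, where
    `kind` is "load" or "store". This trace format is hypothetical.
    Returns a set of (producer_instr, consumer_instr, iteration_distance).
    """
    last_store = {}  # address -> (iteration, instr_index) of the latest store
    deps = set()
    for iteration, instr, kind, addr in trace:
        if kind == "load" and addr in last_store:
            src_iter, src_instr = last_store[addr]
            deps.add((src_instr, instr, iteration - src_iter))
        elif kind == "store":
            last_store[addr] = (iteration, instr)
    return deps


def toy_loop_trace(n_iters, base=0x1000, stride=8):
    """Toy trace of the loop `a[i + 1] = a[i] + 1` (addresses are made up)."""
    for i in range(n_iters):
        yield (i, 0, "load", base + stride * i)          # instr 0 reads a[i]
        yield (i, 1, "store", base + stride * (i + 1))   # instr 1 writes a[i+1]


deps = memory_carried_deps(toy_loop_trace(4))
print(deps)  # {(1, 0, 1)}
```

The single reported dependency says that the store (instruction 1) feeds the load (instruction 0) of the *next* iteration, that is, at an iteration distance of 1. An analyzer that ignores this chain would treat consecutive iterations as independent and could underestimate the cycles the loop needs.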
### Formulas involved

- **Relative error formula**:

  \[ \text{err} = \left| \frac{C_{\text{pred}} - C_{\text{baseline}}}{C_{\text{baseline}}} \right| \]

  where \( C_{\text{pred}} \) is the predicted number of cycles and \( C_{\text{baseline}} \) is the number of cycles measured at the baseline.

- **Lifted prediction formula**:

  \[ \text{lifted\_pred}(K) = \sum_{b \in \text{BBs}(K)} \text{occurrences}(b) \times \text{pred}(b) \]

  where \( K \) is the kernel, \( \text{BBs}(K) \) is the set of basic blocks in the kernel, \( \text{occurrences}(b) \) is the number of times the basic block \( b \) occurs, and \( \text{pred}(b) \) is the predicted throughput of the basic block \( b \).

With these formulas, the authors can quantify and compare the prediction accuracy of different code analyzers.
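The two formulas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from CesASMe: the function names, the dictionary-based inputs, and the numbers in the usage example are all hypothetical.

```python
def relative_error(c_pred, c_baseline):
    """err = |(C_pred - C_baseline) / C_baseline|."""
    if c_baseline == 0:
        raise ValueError("baseline cycle count must be non-zero")
    return abs((c_pred - c_baseline) / c_baseline)


def lifted_prediction(occurrences, pred):
    """lifted_pred(K) = sum over basic blocks b of occurrences(b) * pred(b).

    `occurrences` maps each basic block of the kernel to its occurrence
    count; `pred` maps each basic block to its per-block prediction.
    Both mappings are hypothetical inputs for this sketch.
    """
    return sum(count * pred[block] for block, count in occurrences.items())


# Made-up kernel with two basic blocks.
occurrences = {"bb0": 100, "bb1": 10}
pred = {"bb0": 2.0, "bb1": 5.5}

lifted = lifted_prediction(occurrences, pred)  # 100 * 2.0 + 10 * 5.5
print(lifted)                                  # 255.0
print(relative_error(lifted, 300.0))
```

With a made-up baseline measurement of 300 cycles, the lifted prediction of 255 cycles is off by 45 cycles, a relative error of about 0.15. Lifting per-block predictions to the whole kernel is what allows a block-level analyzer to be compared against a kernel-level measurement.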