An Instruction Inflation Analyzing Framework for Dynamic Binary Translators

Benyi Xie, Yue Yan, Chenghao Yan, Sicheng Tao, Zhuangzhuang Zhang, Xinyu Li, Yanzhi Lan, Xiang Wu, Tianyi Liu, Tingting Zhang, Fuxin Zhang
DOI: https://doi.org/10.1145/3640813
IF: 1.444
2024-01-15
ACM Transactions on Architecture and Code Optimization
Abstract: Dynamic binary translators (DBTs) are widely used to migrate applications between different instruction set architectures (ISAs). Despite extensive research to improve DBT performance, noticeable overhead remains, preventing near-native performance, especially when translating from complex instruction set computer (CISC) to reduced instruction set computer (RISC). For computational workloads, the main overhead stems from translated code quality. Experimental data show state-of-the-art DBT products have dynamic code inflation of at least 1.46. This indicates on average over 1.46 host instructions are needed to emulate one guest instruction. Worse, inflation closely correlates with translated code quality. However, the detailed sources of instruction inflation remain unclear. To understand the sources of inflation, we present Deflater, an instruction inflation analysis framework comprising a mathematical model, a collection of black-box unit tests called BenchMIAOes, and a trace-based simulator called InflatSim. The mathematical model calculates overall inflation based on the inflation of individual instructions and translation block (TB) optimizations. BenchMIAOes extract model parameters from DBTs without accessing DBT source code. InflatSim implements the model and uses the extracted parameters from BenchMIAOes to simulate a given DBT’s behavior. Deflater is a valuable tool to guide DBT analysis and improvement. Using Deflater, we simulated inflation for three state-of-the-art CISC-to-RISC DBTs: ExaGear, Rosetta2, and LATX, with inflation errors of 5.63%, 5.15%, and 3.44% respectively for SPEC CPU 2017, gaining insights into these commercial DBTs. Deflater also efficiently models inflation for the open-source DBT QEMU and suggests optimizations that can substantially reduce inflation. Implementing the suggested optimizations confirms Deflater’s effective guidance, with 4.65% inflation error, and gains 5.47x performance improvement.
computer science, theory & methods, hardware & architecture
What problem does this paper attempt to address?
This paper addresses instruction inflation (code bloat) in dynamic binary translators (DBTs) that migrate applications between different instruction set architectures (ISAs). Specifically:

1. **Analysis of the Code Bloat Phenomenon**:
   - The paper proposes a framework called Deflater to analyze the code bloat that occurs when a DBT translates one ISA (such as a complex instruction set computer, CISC) into another (such as a reduced instruction set computer, RISC).
   - The study finds that even state-of-the-art DBT products have a dynamic code inflation of at least 1.46, meaning that on average more than 1.46 host instructions are needed to emulate a single guest instruction. This inflation correlates closely with translated code quality, which is the main source of overhead for computational workloads.

2. **Analysis of Code Bloat Sources**:
   - To understand where the inflation comes from, the paper introduces a mathematical model, a set of black-box unit tests (called BenchMIAOes), and a trace-based simulator (called InflatSim).
   - The model computes overall inflation from the inflation of individual instructions and from translation block (TB) optimizations. BenchMIAOes extract the model parameters from a DBT without access to its source code, and InflatSim uses those parameters to simulate the DBT's behavior, revealing the specific causes of code bloat.

3. **Analysis and Optimization of DBTs**:
   - Deflater was used to simulate three commercial CISC-to-RISC DBTs (ExaGear, Rosetta2, and LATX) on SPEC CPU 2017, with inflation errors of 5.63%, 5.15%, and 3.44%, respectively.
   - Deflater also modeled the open-source DBT QEMU and suggested optimizations for it. Implementing those optimizations yielded an inflation error of 4.65% and a 5.47x performance improvement.

Through these studies, the paper aims to guide further improvement and optimization of DBTs and to reduce the performance loss caused by code bloat.
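To make the notion of dynamic inflation concrete, the following is a minimal Python sketch of how a trace-based estimate might be computed from per-instruction costs and a TB-level saving. It is an illustration under stated assumptions, not the paper's actual model or InflatSim's implementation: the function names, the `tb_savings` parameter, the opcode costs, and the example trace are all hypothetical.

```python
from collections import Counter


def dynamic_inflation(guest_trace, host_cost, tb_savings=0.0):
    """Estimate host instructions executed per guest instruction.

    guest_trace: iterable of guest opcode names, one per executed instruction.
    host_cost:   hypothetical table mapping each guest opcode to the number
                 of host instructions the DBT emits for it.
    tb_savings:  hypothetical count of host instructions removed by
                 translation-block-level optimizations over the whole trace.
    """
    counts = Counter(guest_trace)
    guest_total = sum(counts.values())
    if guest_total == 0:
        raise ValueError("guest trace is empty")
    # Weight each opcode's per-instruction cost by how often it executes.
    host_total = sum(host_cost[op] * n for op, n in counts.items())
    return (host_total - tb_savings) / guest_total


def relative_error(simulated, measured):
    """Relative error of a simulated inflation against a measured one."""
    return abs(simulated - measured) / measured


if __name__ == "__main__":
    # Made-up trace and costs, for illustration only.
    trace = ["mov", "add", "mov", "cmp", "jcc"]
    cost = {"mov": 1, "add": 2, "cmp": 2, "jcc": 1}
    print(dynamic_inflation(trace, cost))                  # 7 host / 5 guest
    print(dynamic_inflation(trace, cost, tb_savings=1.0))  # 6 host / 5 guest
```

In this toy example, seven host instructions emulate five guest instructions, giving an inflation of 1.4; removing one host instruction through a TB-level optimization lowers it to 1.2. The inflation errors quoted above can be read as a relative difference of the kind computed by `relative_error`, although the summary does not specify the exact error definition the authors use.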