An Instruction Inflation Analyzing Framework for Dynamic Binary Translators

Benyi Xie, Yue Yan, Chenghao Yan, Sicheng Tao, Zhuangzhuang Zhang, Xinyu Li, Yanzhi Lan, Xiang Wu, Tianyi Liu, Tingting Zhang, Fuxin Zhang

DOI: https://doi.org/10.1145/3640813

IF: 1.444

2024-01-15

ACM Transactions on Architecture and Code Optimization

Abstract: Dynamic binary translators (DBTs) are widely used to migrate applications between different instruction set architectures (ISAs). Despite extensive research to improve DBT performance, noticeable overhead remains, preventing near-native performance, especially when translating from complex instruction set computer (CISC) to reduced instruction set computer (RISC). For computational workloads, the main overhead stems from translated code quality. Experimental data show state-of-the-art DBT products have dynamic code inflation of at least 1.46. This indicates on average over 1.46 host instructions are needed to emulate one guest instruction. Worse, inflation closely correlates with translated code quality. However, the detailed sources of instruction inflation remain unclear. To understand the sources of inflation, we present Deflater, an instruction inflation analysis framework comprising a mathematical model, a collection of black-box unit tests called BenchMIAOes, and a trace-based simulator called InflatSim. The mathematical model calculates overall inflation based on the inflation of individual instructions and translation block (TB) optimizations. BenchMIAOes extract model parameters from DBTs without accessing DBT source code. InflatSim implements the model and uses the extracted parameters from BenchMIAOes to simulate a given DBT’s behavior. Deflater is a valuable tool to guide DBT analysis and improvement. Using Deflater, we simulated inflation for three state-of-the-art CISC-to-RISC DBTs: ExaGear, Rosetta2, and LATX, with inflation errors of 5.63%, 5.15%, and 3.44% respectively for SPEC CPU 2017, gaining insights into these commercial DBTs. Deflater also efficiently models inflation for the open-source DBT QEMU and suggests optimizations that can substantially reduce inflation. Implementing the suggested optimizations confirms Deflater’s effective guidance, with 4.65% inflation error, and gains 5.47x performance improvement.
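The abstract defines dynamic inflation as the average number of host instructions needed to emulate one guest instruction, computed from per-instruction inflation and translation block (TB) optimizations. The sketch below illustrates that idea only; it is not the paper's actual model. The function name `dynamic_inflation`, the `host_cost` table, the `tb_savings` term, and all numbers are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def dynamic_inflation(trace, host_cost, tb_savings=0.0):
    """Estimate host instructions executed per guest instruction.

    trace      -- iterable of guest mnemonics in execution order (hypothetical input)
    host_cost  -- mapping: mnemonic -> host instructions emitted per guest instruction
    tb_savings -- host instructions removed by TB-level optimizations (assumed term)
    """
    counts = Counter(trace)
    guest_total = sum(counts.values())
    if guest_total == 0:
        raise ValueError("trace is empty")
    # Weighted sum of per-instruction inflation, minus TB-level savings.
    host_total = sum(host_cost[m] * n for m, n in counts.items()) - tb_savings
    return host_total / guest_total


# Illustrative values only; these are not measurements from any DBT.
trace = ["mov", "add", "cmp", "jne"] * 100
host_cost = {"mov": 1, "add": 1, "cmp": 2, "jne": 2}

print(dynamic_inflation(trace, host_cost))                  # 600 / 400 = 1.5
print(dynamic_inflation(trace, host_cost, tb_savings=100))  # 500 / 400 = 1.25
```

In this toy setup, the per-instruction table plays the role of the parameters that BenchMIAOes would extract, and the function stands in for what a trace-based simulator such as InflatSim would compute over a real execution trace.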

computer science, theory & methods, hardware & architecture

What problem does this paper attempt to address?

This paper addresses instruction inflation in dynamic binary translators (DBTs): the extra host instructions a DBT executes when migrating applications between instruction set architectures (ISAs). Specifically:

1. **Quantifying instruction inflation**:
   - The paper proposes a framework called Deflater to analyze the inflation that arises when a DBT translates one ISA (such as a complex instruction set computer, CISC) into another (such as a reduced instruction set computer, RISC).
   - Experimental data show that state-of-the-art DBT products have a dynamic code inflation of at least 1.46, meaning that on average more than 1.46 host instructions are needed to emulate one guest instruction. This leads to significant performance overhead.

2. **Identifying the sources of inflation**:
   - To understand where inflation comes from, Deflater combines a mathematical model, a set of black-box unit tests (called BenchMIAOes), and a trace-based simulator (called InflatSim).
   - BenchMIAOes extract the model parameters from a DBT without access to its source code, and InflatSim uses those parameters to simulate the DBT's behavior, revealing the specific causes of inflation.

3. **Analysis and optimization of DBTs**:
   - Deflater was used to simulate three commercial DBTs (ExaGear, Rosetta2, and LATX), with inflation errors of 5.63%, 5.15%, and 3.44%, respectively, on SPEC CPU 2017.
   - Deflater also modeled the open-source DBT QEMU and suggested optimizations. Implementing them yielded an inflation error of 4.65% and a 5.47x performance improvement.

Through these studies, the paper aims to guide further improvement of DBTs and reduce the performance loss caused by instruction inflation.
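The inflation errors quoted above compare simulated inflation against the inflation measured on a real DBT. The page does not state the exact error formula, so the sketch below assumes a plain relative error; the function name `inflation_error` and the sample values are hypothetical.

```python
def inflation_error(simulated, measured):
    """Relative error of simulated inflation against measured inflation.

    Assumption: error = |simulated - measured| / measured. The paper may
    define or aggregate the error differently (e.g. per benchmark).
    """
    if measured <= 0:
        raise ValueError("measured inflation must be positive")
    return abs(simulated - measured) / measured


# Illustrative values only; not taken from the paper.
err = inflation_error(simulated=1.45, measured=1.50)
print(f"{err:.2%}")
```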
