Abstract: The emergence of new instruction set architectures (ISAs) makes it challenging to keep legacy applications compatible. Dynamic binary translation (DBT) is a key approach to cross-ISA compatibility, allowing legacy applications to run on an ISA other than the one they were compiled for. However, software-based translation incurs significant performance overhead, caused in part by excessive memory accesses and insufficient exploitation of target-architecture features, and this overhead hinders the practical deployment of DBT. In this paper, we investigate a novel peephole optimization approach. First, we perform peephole analysis to identify redundant memory accesses and suboptimal instruction sequences. Next, we leverage live variable analysis to eliminate redundant memory-access instructions. Additionally, we bridge the gap between ISAs by exploiting ISA-specific features through instruction fusion. Finally, we implement the proposed optimizations in the open-source emulator QEMU and evaluate them extensively on both ARM64 and SW64 platforms. Experimental results on the SPEC2006 benchmark suite show a maximum speedup of 1.52× and a code size reduction of up to 13.98%. These results confirm the effectiveness of our approach in improving DBT performance and reducing code size.
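To make the idea of eliminating redundant memory accesses concrete, the following is a minimal sketch and not the paper's QEMU implementation. It assumes a hypothetical three-operation IR (`load`, `store`, `add`) in which guest registers live in memory slots such as `"eax"`, and it assumes every slot is live at the end of a translated block. The function names `remove_redundant_loads`, `remove_dead_stores`, and `optimize` are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical IR: ("load", reg, slot), ("store", reg, slot), ("add", dst, a, b)
Insn = Tuple[str, ...]


def remove_redundant_loads(block: List[Insn]) -> List[Insn]:
    """Forward pass: skip a load when the register already holds the slot's value."""
    cached: Dict[str, str] = {}  # slot -> register known to mirror that slot
    out: List[Insn] = []
    for insn in block:
        op = insn[0]
        if op == "load":
            _, reg, slot = insn
            if cached.get(slot) == reg:
                continue  # value is already in reg, so the load is redundant
            # reg is overwritten, so it no longer mirrors any other slot
            cached = {s: r for s, r in cached.items() if r != reg}
            cached[slot] = reg
        elif op == "store":
            _, reg, slot = insn
            cached[slot] = reg  # after the store, reg and slot agree
        else:
            dst = insn[1]
            # dst changes, so any slot mirrored by dst is now stale
            cached = {s: r for s, r in cached.items() if r != dst}
        out.append(insn)
    return out


def remove_dead_stores(block: List[Insn]) -> List[Insn]:
    """Backward liveness pass: drop a store that is overwritten before being read."""
    overwritten: Set[str] = set()  # slots stored again later with no load in between
    kept: List[Insn] = []
    for insn in reversed(block):
        op = insn[0]
        if op == "store":
            slot = insn[2]
            if slot in overwritten:
                continue  # a later store wins and nothing reads this one
            overwritten.add(slot)
        elif op == "load":
            overwritten.discard(insn[2])  # the slot is read here, so it is live
        kept.append(insn)
    kept.reverse()
    return kept


def optimize(block: List[Insn]) -> List[Insn]:
    return remove_dead_stores(remove_redundant_loads(block))


# Naive translation of two guest additions into eax, each with its own load/store.
block: List[Insn] = [
    ("load", "r1", "eax"),
    ("add", "r1", "r1", "r2"),
    ("store", "r1", "eax"),
    ("load", "r1", "eax"),
    ("add", "r1", "r1", "r3"),
    ("store", "r1", "eax"),
]
optimized = optimize(block)
```

In this sketch the second load and the first store are removed, so six instructions shrink to four. A real translator would also have to respect constraints this toy model ignores, such as precise guest state at exceptions and memory accesses that alias the register slots.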

Automatic Instruction-Set Extension for Bitwise Operation-Intensive Applications

Implementation of Bit-Parallel Multiplier over Finite Field

Bit-Level Transformation and Optimization for Hardware Synthesis of Algorithmic Descriptions

Optimizing the SIMD Parallelism Through Bitwidth Analysis

Bit-Level Optimization for High-Level Synthesis and FPGA-Based Acceleration

Application specific instruction generation based on data flow graph

Instruction Fusion Technology for the MIPS

An Efficient Approach to Custom Instruction Set Generation

FPGA based hardware-software co-designed dynamic binary translation system

General Vector Instruction Extension for GF(2<sup>m</sup>) Polynomial Operation in Post-quantum Cryptography

Efficient custom instruction generation based on characterizing of basic blocks

Method and Implementation of SIMD Instruction Set Extension for AES Algorithm

Instruction-level Hardware/software Partition Through DFG Exploration

An Agile Instruction Set Extension Method Based on the RISC-V Processor

Improving SIMD Parallelism via Dynamic Binary Translation

A Fully Pipelined Reconfigurable Montgomery Modular Multiplier Supporting Variable Bit-Widths

A Video Specific Instruction Set Architecture for ASIP Design

Operation of Super Long Integers in Cryptographic Applications

Optimized Design of a Double-Precision Floating-Point Multiply-Add-Fused Unit for Data Dependence

Instruction Extension Ensuring Time Constraints in Real Time Processor

Performance Improvements via Peephole Optimization in Dynamic Binary Translation