Abstract: MLIR is an emerging compiler infrastructure for modern hardware, but existing programs cannot take advantage of MLIR's high-performance compilation if they are described in lower-level general-purpose languages. Consequently, to avoid having to rewrite such programs manually, there have been efforts to automatically raise lower-level dialects to higher-level ones in MLIR. However, current methods rely on manually defined raising rules, which limit their applicability and make them challenging to maintain as MLIR dialects evolve. We present mlirSynth, a novel approach that translates programs from lower-level MLIR dialects to high-level ones without manually defined rules. Instead, it uses available dialect definitions to construct a program space and searches it effectively using type constraints and equivalences. We demonstrate its effectiveness by raising C programs to two distinct high-level MLIR dialects, which enables us to use existing high-level dialect-specific compilation flows. On Polybench, we show greater coverage than previous approaches, resulting in geomean speedups of 2.5x (Intel) and 3.4x (AMD) over state-of-the-art compilation flows for the C programming language. mlirSynth also enables retargetability to domain-specific accelerators, resulting in a geomean speedup of 21.6x on a TPU.
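To make the idea of a program space built from dialect definitions and pruned by type constraints concrete, here is a toy sketch. It is a hypothetical simplification, not mlirSynth's implementation: each op is described only by a signature (operand types and result type), standing in for what mlirSynth derives from MLIR's TableGen definitions. The op names, the `candidates` helper, and the two types `tensor` and `scalar` are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical op signatures: name -> (operand types, result type).
# mlirSynth derives such information from MLIR dialect (TableGen) definitions;
# these entries are hand-written for illustration only.
OPS = {
    "add": (("tensor", "tensor"), "tensor"),
    "mul": (("tensor", "tensor"), "tensor"),
    "reduce_sum": (("tensor",), "scalar"),
    "scale": (("scalar", "tensor"), "tensor"),
}

def candidates(values, typed=True):
    """Enumerate one-step op applications over a pool of (name, type) values.

    With typed=True, an application is kept only if every operand's type
    matches the op signature; with typed=False, every arity-compatible
    combination of operands is produced."""
    out = []
    for op, (arg_types, result_type) in OPS.items():
        for args in product(values, repeat=len(arg_types)):
            if typed and any(a[1] != t for a, t in zip(args, arg_types)):
                continue  # type constraint violated: prune this candidate
            out.append((op, tuple(a[0] for a in args), result_type))
    return out

pool = [("A", "tensor"), ("B", "tensor"), ("alpha", "scalar")]
print(len(candidates(pool, typed=False)), len(candidates(pool, typed=True)))
# prints: 30 12
```

Even with three values and four ops, type checking cuts the candidates from 30 to 12 before anything is evaluated; the gap grows quickly as expressions are composed.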

What problem does this paper attempt to address?

The paper addresses the problem of automatically converting code written in existing low-level programming languages into a high-level intermediate representation (IR) so that it can benefit from modern hardware acceleration. Specifically, it proposes mlirSynth, a method that automatically raises lower-level MLIR (Multi-Level IR) dialects to higher-level dialects without manually defined rules. The paper notes that modern hardware has become increasingly diverse: Tensor Processing Units (TPUs) and other accelerators designed specifically for artificial intelligence deliver high performance, but they also increase programming complexity. To address this, researchers have developed domain-specific languages (DSLs) that simplify problem descriptions and, through domain-specific compilers, map efficiently onto different hardware accelerators. MLIR, an emerging compiler infrastructure, supports such high-level program representations, but programs written in existing low-level languages cannot easily take advantage of MLIR's high-performance compilation capabilities. The key contributions of the paper are:

1. **mlirSynth framework**: a method for automatically raising lower-level MLIR dialects to higher-level dialects without hard-coded compiler transformations or raising rules.
2. **Scalable code synthesis method**: the search space is generated automatically from MLIR's TableGen definitions, so code can be synthesized across multiple MLIR dialects.
3. **Fast bottom-up enumerative synthesizer**: the search space is pruned quickly using type constraints and input-output behavioral equivalence.
4. **Higher coverage, performance, and accuracy** than existing raising methods.
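The bottom-up enumerative search with type and input-output equivalence pruning can be illustrated with a small, self-contained sketch. This is not mlirSynth's code: the ops `add`, `mul`, and `reduce_sum`, their semantics on tuples of integers, and the helpers `reference` and `synthesize` are all hypothetical. The loop builds expressions by size, skips ill-typed applications, discards any expression whose outputs on the sample inputs match one already kept, and stops when an expression reproduces the behavior of a C-style reference loop.

```python
from itertools import product

# Hypothetical toy "high-level dialect": op -> (operand types, result type, semantics).
# Tensors are modeled as tuples of ints, scalars as ints; these are not real MLIR ops.
OPS = {
    "add": (("tensor", "tensor"), "tensor",
            lambda a, b: tuple(x + y for x, y in zip(a, b))),
    "mul": (("tensor", "tensor"), "tensor",
            lambda a, b: tuple(x * y for x, y in zip(a, b))),
    "reduce_sum": (("tensor",), "scalar", lambda a: sum(a)),
}

def reference(A, B):
    """The 'low-level' program to raise: a C-style accumulation loop."""
    s = 0
    for i in range(len(A)):
        s += A[i] * B[i]
    return s

def synthesize(ref, inputs, examples, target_type, max_rounds=3):
    """Bottom-up enumeration with type-constraint and observational-equivalence pruning.

    ref: function whose input-output behavior must be matched
    inputs: dict of input name -> type
    examples: list of dicts mapping input name -> concrete value
    """
    expected = tuple(ref(**env) for env in examples)
    # Bank of kept expressions: expression text -> (type, outputs on all examples).
    bank = {name: (ty, tuple(env[name] for env in examples))
            for name, ty in inputs.items()}
    seen = set(bank.values())
    for _ in range(max_rounds):
        new = {}
        for op, (arg_types, res_type, fn) in OPS.items():
            for args in product(list(bank), repeat=len(arg_types)):
                # Type constraint: skip ill-typed applications before evaluating them.
                if any(bank[a][0] != t for a, t in zip(args, arg_types)):
                    continue
                outs = tuple(fn(*(bank[a][1][k] for a in args))
                             for k in range(len(examples)))
                sig = (res_type, outs)
                # Observational equivalence: keep only the first expression per behavior.
                if sig in seen:
                    continue
                seen.add(sig)
                expr = f"{op}({', '.join(args)})"
                if res_type == target_type and outs == expected:
                    return expr
                new[expr] = sig
        # Expressions built this round become operands in the next round.
        bank.update(new)
    return None

examples = [{"A": (1, 2, 3), "B": (4, 5, 6)},
            {"A": (0, -1, 2), "B": (3, 3, 1)}]
print(synthesize(reference, {"A": "tensor", "B": "tensor"}, examples, "scalar"))
# prints: reduce_sum(mul(A, B))
```

Because `add(B, A)` produces exactly the same outputs as `add(A, B)` on every example, it is discarded, which is how equivalence pruning keeps the bank small. Agreement on a handful of examples does not prove two programs equivalent, so a practical raising tool needs a further check before trusting a candidate.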
The paper demonstrates the method by raising C programs to two different high-level MLIR dialects (Linalg and HLO), which makes the existing optimizing compilation flows for these dialects available. Experimental results show that mlirSynth covers more programs and generates more efficient implementations than existing techniques. For example, on the Polybench benchmark suite, mlirSynth achieves geometric-mean speedups of 2.5x (Intel) and 3.4x (AMD) over state-of-the-art C compilation flows (LLVM at -O3). Additionally, because mlirSynth can raise programs to the HLO dialect, it can use the XLA compiler to target Tensor Processing Units (TPUs), achieving a geometric-mean speedup of 21.6x.
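The speedups above are summarized as geometric means over the benchmarks. As a reminder of how that statistic is computed (standard arithmetic, not code from the paper), the geometric mean of n per-benchmark speedups is the n-th root of their product. The input values below are hypothetical and are not results from the paper.

```python
import math

def geomean(speedups):
    """Geometric mean of positive speedup ratios, computed in log space
    to avoid overflow when multiplying many values together."""
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("speedups must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))

# Hypothetical per-benchmark speedups (illustrative only).
print(round(geomean([1.0, 4.0, 16.0]), 6))
# prints: 4.0
```

For the values 1, 4, and 16 the geometric mean is 4, whereas the arithmetic mean would be 7. The geometric mean is less dominated by a single large ratio, and it satisfies geomean(1/x) = 1/geomean(x), so the summary does not depend on which system is treated as the baseline.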

mlirSynth: Automatic, Retargetable Program Raising in Multi-Level IR using Program Synthesis
