Abstract: During early optimization passes, compilers must predict machine-dependent characteristics such as execution-unit utilization, number of register spills, latency, and throughput in order to generate better code. Often, a hand-written static/analytical hardware cost model is built into the compiler. However, the need for more sophisticated and varied predictions has become more pronounced with the development of deep learning compilers, which must optimize dataflow graphs. Such compilers usually employ a much higher-level MLIR form as their IR before lowering to traditional LLVM IR. In this setting, a static/analytical cost model is cumbersome and error-prone because the opcodes represent very high-level algebraic/arithmetic operations. Hence, we develop a machine learning-based cost model for high-level MLIR that can predict different target variables of interest, such as CPU/GPU/xPU utilization, instructions executed, and register usage. By treating the incoming MLIR as text input, à la NLP models, we can apply well-known techniques from modern NLP research to predict hardware characteristics more accurately. We expect such precise ML-driven hardware cost models to guide our deep learning compiler in graph-level optimizations such as operator fusion, local memory allocation, and kernel scheduling, as well as in many kernel-level optimizations such as loop interchange, LICM, and unrolling. We report early work-in-progress results of developing such models on high-level MLIR representing dataflow graphs emitted by PyTorch/TensorFlow-like frameworks, as well as on lower-level dialects like affine. We show that these models can provide reasonably good estimates with low error bounds for various hardware characteristics of interest and can become a go-to mechanism for hardware cost modelling in the future.
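
The central idea, treating MLIR text as the input to a learned cost model, can be illustrated with a deliberately simple stand-in. This is a minimal sketch and not the authors' model: the abstract refers to NLP-style techniques, whereas this example only extracts dialect-qualified op names (such as `arith.addf`) with a regex, builds bag-of-ops count vectors, and fits a ridge-regression model. The op regex, the helper names (`op_tokens`, `featurize`, `fit_ridge`, `make_kernel`), the toy kernels, and the synthetic "cost" labels are all hypothetical. In practice, the label would be a measured hardware characteristic such as register usage or instructions executed.

```python
import re
from collections import Counter

# Hypothetical op extractor: each "dialect.op" spelling (e.g. arith.addf,
# linalg.matmul) becomes one token. A real model would use a richer tokenizer.
OP_RE = re.compile(r"\b[a-z_][a-z0-9_]*\.[a-z0-9_.]+")


def op_tokens(mlir_text):
    """Return the sequence of dialect-qualified op names in an MLIR snippet."""
    return OP_RE.findall(mlir_text)


def featurize(ops, vocab):
    """Bag-of-ops count vector over a fixed vocabulary, plus a bias feature."""
    counts = Counter(ops)  # ops outside the vocabulary are simply ignored
    return [float(counts[op]) for op in vocab] + [1.0]


def fit_ridge(X, y, lam=1e-3):
    """Solve (X^T X + lam*I) w = X^T y with Gaussian elimination."""
    d = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    c = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    for col in range(d):
        # Partial pivoting for numerical stability.
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for k in range(col, d):
                A[r][k] -= f * A[col][k]
            c[r] -= f * c[col]
    # Back substitution on the upper-triangular system.
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (c[i] - sum(A[i][k] * w[k] for k in range(i + 1, d))) / A[i][i]
    return w


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def make_kernel(n_mul, n_add):
    """Build a toy func-dialect kernel with n_mul mulf ops and n_add addf ops."""
    lines = ["func.func @k(%a: f32, %b: f32) -> f32 {"]
    lines += [f"  %m{i} = arith.mulf %a, %b : f32" for i in range(n_mul)]
    lines += [f"  %s{i} = arith.addf %a, %b : f32" for i in range(n_add)]
    lines += ["  func.return %a : f32", "}"]
    return "\n".join(lines)


# Synthetic training set: the label follows a made-up rule (3 per mulf,
# 1 per addf, 2 fixed) standing in for a measured hardware counter.
programs = [make_kernel(m, a) for m in range(4) for a in range(4)]
labels = [3.0 * m + 1.0 * a + 2.0 for m in range(4) for a in range(4)]

vocab = sorted({op for p in programs for op in op_tokens(p)})
X = [featurize(op_tokens(p), vocab) for p in programs]
w = fit_ridge(X, labels)

# Predict the "cost" of an unseen, larger kernel.
print(round(predict(w, featurize(op_tokens(make_kernel(5, 2)), vocab)), 2))
```

The synthetic labels follow an exact linear rule, so the prediction for `make_kernel(5, 2)` should land near 3·5 + 2 + 2 = 19. The small ridge penalty only keeps the solve well-posed, because `func.func`, `func.return`, and the bias feature are collinear. A bag-of-ops model discards operand shapes and ordering. Sequence models of the kind the abstract alludes to can capture that information, and this linear baseline cannot.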