Abstract: Bayesian optimization is a powerful method for automating the tuning of compilers. The complex landscape of autotuning provides a myriad of rarely considered structural challenges for black-box optimizers, and the lack of standardized benchmarks has limited the study of Bayesian optimization within the domain. To address this, we present CATBench, a comprehensive benchmarking suite that captures the complexities of compiler autotuning, ranging from discrete, conditional, and permutation parameter types to known and unknown binary constraints, as well as both multi-fidelity and multi-objective evaluations. The benchmarks in CATBench span a range of machine learning-oriented computations, from tensor algebra to image processing and clustering, and use state-of-the-art compilers, such as TACO and RISE/ELEVATE. CATBench offers a unified interface for evaluating Bayesian optimization algorithms, promoting reproducibility and innovation through an easy-to-use, fully containerized setup of both surrogate and real-world compiler optimization tasks. We validate CATBench on several state-of-the-art algorithms, revealing their strengths and weaknesses and demonstrating the suite's potential for advancing both Bayesian optimization and compiler autotuning research.
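To make the parameter types and constraints listed in the abstract concrete, here is a minimal Python sketch of a toy autotuning search space with a random-search loop over it. It is not CATBench's interface: the parameter names (`tile_i`, `tile_j`, `schedule`, `chunk_size`, `loop_order`), the known constraint, the hidden failure rule, and the cost function are all hypothetical stand-ins chosen for illustration.

```python
import random
from typing import Any, Dict, Optional, Tuple

# Hypothetical toy search space (not CATBench's actual API or parameters).
TILE_SIZES = [1, 2, 4, 8, 16, 32]   # discrete (ordinal) parameter values
SCHEDULES = ["static", "dynamic"]    # categorical parameter values
CHUNK_SIZES = [1, 4, 16]             # conditional: active only for "dynamic"
LOOPS = ["i", "j", "k"]              # permutation parameter: a loop order

Config = Dict[str, Any]


def sample_config(rng: random.Random) -> Config:
    """Draw one configuration uniformly from the toy mixed search space."""
    config: Config = {
        "tile_i": rng.choice(TILE_SIZES),
        "tile_j": rng.choice(TILE_SIZES),
        "schedule": rng.choice(SCHEDULES),
        "loop_order": rng.sample(LOOPS, k=len(LOOPS)),  # random permutation
    }
    # Conditional parameter: it only exists when its parent has a given value.
    if config["schedule"] == "dynamic":
        config["chunk_size"] = rng.choice(CHUNK_SIZES)
    return config


def satisfies_known_constraints(config: Config) -> bool:
    """Known constraint (hypothetical): checkable before any evaluation."""
    return config["tile_i"] * config["tile_j"] <= 256


def evaluate(config: Config) -> Optional[float]:
    """Toy stand-in for compiling and timing a kernel; None means failure."""
    # Unknown constraint (hypothetical): the optimizer cannot check this rule
    # up front and only learns about it when an evaluation fails.
    if config["loop_order"][0] == "k" and config["tile_i"] >= 16:
        return None
    # Made-up cost: zero at tile_i=8, tile_j=4, loop order i-j-k;
    # chunk_size=1 is penalized.
    cost = abs(TILE_SIZES.index(config["tile_i"]) - 3)
    cost += abs(TILE_SIZES.index(config["tile_j"]) - 2)
    cost += 0 if config["loop_order"] == LOOPS else 1
    cost += 0.5 if config.get("chunk_size") == 1 else 0
    return float(cost)


def random_search(budget: int, seed: int = 0) -> Tuple[Optional[Config], float, int]:
    """Random-search baseline that spends `budget` real evaluations."""
    rng = random.Random(seed)
    best_config: Optional[Config] = None
    best_cost = float("inf")
    evaluations = failures = 0
    while evaluations < budget:
        config = sample_config(rng)
        if not satisfies_known_constraints(config):
            continue  # rejected for free: no evaluation is spent
        evaluations += 1
        cost = evaluate(config)
        if cost is None:
            failures += 1  # hidden constraint violated: evaluation wasted
            continue
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost, failures


if __name__ == "__main__":
    best, cost, failed = random_search(budget=100, seed=0)
    print(f"best cost {cost} after {failed} failed evaluations: {best}")
```

The sketch draws the practical distinction between the two constraint kinds: a known constraint can be checked before spending an evaluation, whereas an unknown constraint is only revealed when a compile or run fails, so an optimizer has to learn to avoid such regions from failed samples.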

What problem does this paper attempt to address?

This paper addresses the challenges that black-box optimization faces in compiler autotuning, and in particular the shortcomings of Bayesian optimization (BO) in this domain. Specifically:

1. **Lack of standardized benchmarks**: Existing research lacks a standard, diverse benchmarking suite for evaluating and comparing optimization algorithms, especially for compiler autotuning.
2. **Complex optimization problems**: Compiler autotuning involves discrete, conditional, permutation, and other parameter types, as well as known and unknown binary constraints, all of which make optimization harder. Multi-fidelity and multi-objective evaluations add further challenges.
3. **Limitations of existing methods**: Existing Bayesian optimization methods and other optimization algorithms often do not fully account for these complexities and requirements when applied to compiler autotuning.

To solve these problems, the paper proposes **CATBench**, a comprehensive benchmarking suite that aims to capture the complexity of compiler autotuning and provides a unified interface for evaluating Bayesian optimization algorithms. CATBench includes a variety of machine-learning-oriented computations, from tensor algebra to image processing and clustering, and uses state-of-the-art compilers such as TACO and RISE/ELEVATE. With CATBench, researchers can evaluate and improve Bayesian optimization algorithms more effectively, supporting further progress in compiler autotuning.

### Main contributions

1. **Comprehensive benchmarking suite**: It contains ten real-world compiler optimization tasks covering mixed discrete, categorical, and permutation search spaces with known and unknown binary constraints, and it supports multi-objective and multi-fidelity information sources.
2. **Easy-to-use framework**: It provides a simple interface for rapid prototyping with surrogate models and for running large-scale experiments on server clusters.
3. **Extensive evaluation**: It thoroughly evaluates popular Bayesian optimization methods and evolutionary algorithms, adapting them where necessary to CATBench's requirements, and demonstrates the characteristics of the suite.

Through these contributions, CATBench is expected to become an important tool for compiler autotuning and Bayesian optimization research.
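Because CATBench supports multi-objective evaluations, an optimizer's result on such a task is a set of trade-offs rather than a single best configuration. The sketch below is a generic illustration, not CATBench code: it filters evaluated objective vectors down to the non-dominated (Pareto) set, assuming every objective is minimized. The five points are made-up values.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better somewhere.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep the points that no other point dominates (order is preserved)."""
    # A point never dominates itself, so comparing against every point is safe.
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Made-up (objective_1, objective_2) values for five evaluated configurations.
    evaluated = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.0)]
    print(pareto_front(evaluated))  # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

The all-pairs check is quadratic in the number of evaluations, which is adequate for small budgets; larger result sets would call for a sorting-based non-dominated filter.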

CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization
