Abstract: The increasing complexity of deep learning models necessitates specialized hardware and software optimizations, particularly for deep learning accelerators. Existing autotuning methods often suffer from prolonged tuning times because they profile invalid configurations, which can cause runtime errors. We introduce ML$^2$Tuner, a multi-level machine learning tuning technique that enhances autotuning efficiency by incorporating a validity prediction model to filter out invalid configurations and an advanced performance prediction model that uses hidden features from the compilation process. Experimental results on an extended VTA accelerator demonstrate that ML$^2$Tuner achieves equivalent performance improvements using only 12.3% of the samples required by an approach similar to TVM's and reduces invalid profiling attempts by an average of 60.8%, highlighting its potential to enhance autotuning performance by filtering out invalid configurations.

What problem does this paper attempt to address?

This paper addresses the hardware and software optimization challenges created by increasingly complex deep learning models, and in particular the autotuning problem for deep learning accelerators. Existing autotuning methods spend much of their tuning time profiling invalid configurations, some of which cause runtime errors. This reduces tuning efficiency and can also affect the performance of the final optimized code.

To address this, the paper proposes ML$^2$Tuner, a multi-level machine learning tuning technique. ML$^2$Tuner filters out invalid configurations with a Validity Prediction Model and improves the Performance Prediction Model by using hidden features extracted during the compilation process. Together these reduce the number of invalid configurations that are profiled, improving both tuning efficiency and the performance of the final code.

### Main Problem Summary:

1. **Increasing Complexity of Deep Learning Models**: Computing and memory requirements grow significantly, necessitating more efficient hardware and software optimization.
2. **Limitations of Existing Autotuning Methods**:
   - Invalid configurations lead to extended tuning times.
   - Invalid configurations may cause runtime errors, further reducing tuning efficiency.
3. **Hardware Diversity**: Different hardware platforms and architectures increase the difficulty and uncertainty of tuning.

### Main Contributions of ML$^2$Tuner:

- **Validity Prediction Model (V)**: Predicts whether a configuration is valid, so that invalid configurations do not enter the subsequent tuning process.
- **Advanced Performance Prediction Model (A)**: Uses hidden features extracted during the compilation process to improve the accuracy of performance prediction.
- **Reduction in Invalid Configuration Attempts**: Experimental results on an extended VTA accelerator show that ML$^2$Tuner achieves a performance improvement equivalent to that of an approach similar to TVM's while using only 12.3% of the samples, and reduces invalid profiling attempts by an average of 60.8%.

Through these improvements, ML$^2$Tuner can perform autotuning more efficiently, especially on deep learning accelerators, thereby improving the overall optimization effect and performance in practical applications.
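
The two-stage idea summarized above (reject configurations predicted to be invalid, then rank the rest with a performance model) can be illustrated with a small Python sketch. This is not the paper's implementation. The toy `profile` function, the buffer-capacity rule, the tile-size search space, and both model classes are hypothetical stand-ins chosen so the example runs with the standard library alone. In particular, the performance model here is a plain nearest-neighbour lookup and does not use hidden compilation features.

```python
BUFFER_CAPACITY = 64  # hypothetical on-chip buffer limit (assumption)
SIZES = [1, 2, 4, 8, 16, 32]  # hypothetical tile sizes (assumption)
CANDIDATES = [(h, w) for h in SIZES for w in SIZES]


def profile(config):
    """Toy stand-in for a hardware run: latency, or None if invalid."""
    tile_h, tile_w = config
    if tile_h * tile_w > BUFFER_CAPACITY:
        return None  # models a configuration that fails at runtime
    return 1000.0 / (tile_h * tile_w) + abs(tile_h - tile_w)


class ValidityModel:
    """Toy validity predictor: learns a footprint threshold from failures."""

    def __init__(self):
        self.min_invalid_footprint = float("inf")

    def update(self, config, valid):
        if not valid:
            footprint = config[0] * config[1]
            self.min_invalid_footprint = min(self.min_invalid_footprint, footprint)

    def predict_valid(self, config):
        return config[0] * config[1] < self.min_invalid_footprint


class PerformanceModel:
    """Toy latency predictor: returns the latency of the nearest profiled config."""

    def __init__(self):
        self.samples = []

    def update(self, config, latency):
        self.samples.append((config, latency))

    def predict(self, config):
        if not self.samples:
            return 0.0  # no data yet: all candidates look equally good

        def distance(sample):
            (h, w), _ = sample
            return (h - config[0]) ** 2 + (w - config[1]) ** 2

        return min(self.samples, key=distance)[1]


def tune(candidates, budget, use_validity_filter):
    """Profile up to `budget` configs; return (best, invalid_attempts)."""
    validity, perf = ValidityModel(), PerformanceModel()
    remaining = list(candidates)
    best, invalid_attempts = None, 0
    for _ in range(budget):
        pool = remaining
        if use_validity_filter:
            # Stage 1: drop configurations predicted to be invalid.
            pool = [c for c in remaining if validity.predict_valid(c)]
        if not pool:
            break
        # Stage 2: profile the candidate with the lowest predicted latency.
        config = min(pool, key=perf.predict)
        remaining.remove(config)
        latency = profile(config)
        validity.update(config, latency is not None)
        if latency is None:
            invalid_attempts += 1
            continue
        perf.update(config, latency)
        if best is None or latency < best[1]:
            best = (config, latency)
    return best, invalid_attempts


if __name__ == "__main__":
    for use_filter in (False, True):
        best, invalid = tune(CANDIDATES, len(CANDIDATES), use_filter)
        print(f"filter={use_filter}: best={best}, invalid attempts={invalid}")
```

In this toy setting, each failed profile tightens the learned threshold, so the filtered run can fail at most once per distinct invalid footprint, while the unfiltered run profiles every invalid configuration it selects. A real tuner would replace both stand-in models with learned predictors and `profile` with actual compilation and measurement.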

ML$^2$Tuner: Efficient Code Tuning via Multi-Level Machine Learning Models
