Advising OpenMP Parallelization via a Graph-Based Approach with Transformers

Tal Kadosh, Nadav Schneider, Niranjan Hasabnis, Timothy Mattson, Yuval Pinter, Gal Oren

2023-05-17

Abstract: There is an ever-present need for shared memory parallelization schemes to exploit the full potential of multi-core architectures. The most common parallelization API addressing this need today is OpenMP. Nevertheless, writing parallel code manually is complex and effort-intensive. Thus, many deterministic source-to-source (S2S) compilers have emerged, intending to automate the process of translating serial to parallel code. However, recent studies have shown that these compilers are impractical in many scenarios. In this work, we combine the latest advancements in the field of AI and natural language processing (NLP) with the vast amount of open-source code to address the problem of automatic parallelization. Specifically, we propose a novel approach, called OMPify, to detect and predict the OpenMP pragmas and shared-memory attributes in parallel code, given its serial version. OMPify is based on a Transformer-based model that leverages a graph-based representation of source code that exploits the inherent structure of code. We evaluated our tool by predicting the parallelization pragmas and attributes of a large corpus (Open-OMP-Plus) of over 54,000 snippets of serial code written in C and C++. Our results demonstrate that OMPify outperforms existing approaches, namely the popular general-purpose ChatGPT and the targeted PragFormer models, in terms of F1 score and accuracy. Specifically, OMPify achieves up to 90% accuracy on commonly used OpenMP benchmark tests such as NAS, SPEC, and PolyBench. Additionally, we performed an ablation study to assess the impact of different model components and present interesting insights derived from the study. Lastly, we also explored the potential of using data augmentation and curriculum learning techniques to improve the model's robustness and generalization capabilities.
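To make the prediction target concrete, here is a minimal sketch of how a loop's OpenMP pragma might be turned into a label vector covering the directive and its shared-memory attributes. The label names, their order, and the helper `pragma_to_labels` are assumptions made for illustration; the abstract does not specify how OMPify or Open-OMP-Plus encodes labels.

```python
import re

# Hypothetical label order, chosen for this sketch only.
LABELS = ("parallel_for", "private", "reduction")


def pragma_to_labels(pragma):
    """Map an OpenMP pragma string (or None for a serial loop) to a 0/1 vector."""
    if pragma is None:
        return [0, 0, 0]
    text = " ".join(pragma.split())  # collapse runs of whitespace
    has_parallel_for = re.search(
        r"#\s*pragma\s+omp\s+parallel\s+for\b", text) is not None
    # The word boundary keeps this from matching firstprivate/lastprivate.
    has_private = re.search(r"\bprivate\s*\(", text) is not None
    has_reduction = re.search(r"\breduction\s*\(", text) is not None
    return [int(has_parallel_for), int(has_private), int(has_reduction)]


if __name__ == "__main__":
    example = "#pragma omp parallel for private(j) reduction(+:sum)"
    print(dict(zip(LABELS, pragma_to_labels(example))))
```

A classifier in this setting would receive only the serial loop and be trained to output such a vector; the pragma itself is available only when the training labels are built.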

Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Machine Learning; Performance

What problem does this paper attempt to address?

This paper addresses the problem of automating shared-memory parallelization for multi-core architectures. Specifically, the authors propose a method called OMPify, which combines recent natural language processing (NLP) techniques with large-scale open-source code to predict, for a given piece of serial code, the OpenMP pragmas and shared-memory attributes needed to parallelize it.

#### Main Objectives:

1. **Simplify Parallel Programming**: Reduce the complexity and effort of writing parallel code by hand.
2. **Improve Parallel Code Quality**: Support high-quality parallel code by automatically detecting and predicting OpenMP directives and shared-memory attributes.
3. **Surpass Existing Methods**: Achieve a higher F1 score and accuracy than existing general-purpose and specialized models (such as ChatGPT and PragFormer).

#### Specific Research Questions:

1. **Impact of Code Representation** (RQ1): Evaluate how different code modalities (such as DFG and AST) affect performance on the parallel programming assistance task (CLPP).
2. **Role of Loop Context** (RQ2): Investigate how the scope of the for loop in the input serial code affects performance on the CLPP task.
3. **Effect of Data Augmentation** (RQ3): Explore whether data augmentation techniques such as variable name replacement improve the performance of existing models on the CLPP task.
4. **Advantages of Multi-Label Classification** (RQ4): Compare a multi-label classification formulation with PragFormer's formulation as multiple binary classification problems.

Through these studies, the paper reports that OMPify outperforms the compared approaches on several benchmark suites and provides a detailed experimental analysis.
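The variable name replacement mentioned under RQ3 can be sketched as follows. This is an illustrative, regex-based approximation and not the authors' augmentation pipeline: the function `rename_variables`, the `var0, var1, ...` naming scheme, and the keyword list are assumptions, and the approach ignores string literals, comments, and possible collisions with names that already look like `varN`.

```python
import re

# C keywords that must never be renamed (assumed sufficient for short snippets).
C_KEYWORDS = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if", "int",
    "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile",
    "while",
}


def rename_variables(code, protected=frozenset()):
    """Consistently replace each non-keyword identifier with var0, var1, ...

    `protected` lists names to leave alone, e.g. library functions like printf.
    Returns the rewritten code and the old-name -> new-name mapping.
    """
    mapping = {}

    def substitute(match):
        name = match.group(0)
        if name in C_KEYWORDS or name in protected:
            return name
        if name not in mapping:
            mapping[name] = f"var{len(mapping)}"
        return mapping[name]

    new_code = re.sub(r"\b[A-Za-z_]\w*\b", substitute, code)
    return new_code, mapping


if __name__ == "__main__":
    serial = "for (int i = 0; i < n; i++) sum += a[i] * b[i];"
    augmented, names = rename_variables(serial)
    print(augmented)
    print(names)
```

Because renaming preserves the loop's structure and data dependences, the augmented snippet keeps the same parallelization labels as the original, which is what makes it usable as extra training data.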

PragFormer: Data-Driven Parallel Source Code Classification with Transformers

OMPar: Automatic Parallelization with AI-Driven Source-to-Source Compilation

OMP-Engineer: Bridging Syntax Analysis and In-Context Learning for Efficient Automated OpenMP Parallelization

OMPGPT: A Generative Pre-trained Transformer Model for OpenMP

Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation

AUTOPARLLM: GNN-Guided Automatic Code Parallelization using Large Language Models

OpenMP Advisor

GPT-Driven Source-to-Source Transformation for Generating Compilable Parallel CUDA Code for Nussinov's Algorithm

Adaptive Multi-versioning for OpenMP Parallelization Via Machine Learning

Comprehensive Performance Modeling and System Design Insights for Foundation Models

Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL

3D Parallelism for Transformers Via Integer Programming

A Comparative Analysis of Distributed Training Strategies for GPT-2

OpenUH: An Optimizing, Portable OpenMP Compiler

Automatic Task Parallelization of Dataflow Graphs in ML/DL models

ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels

Improving Automatic Parallel Training via Balanced Memory Workload Optimization

A Closer Look into Transformer-Based Code Intelligence Through Code Transformation: Challenges and Opportunities

Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs