Abstract: In the contemporary landscape of computer architecture, the demand for efficient parallel programming persists, requiring robust optimization techniques. Traditional optimizing compilers have historically been pivotal in this endeavor, adapting to the evolving complexities of modern software systems. The emergence of Large Language Models (LLMs) raises the question of whether AI-driven approaches can revolutionize code optimization methodologies. This paper presents a comparative analysis of two state-of-the-art LLMs, GPT-4.0 and CodeLlama-70B, and traditional optimizing compilers, assessing their respective abilities and limitations in optimizing code for maximum efficiency. Additionally, we introduce a benchmark suite of challenging optimization patterns and an automatic mechanism for evaluating the performance and correctness of the code generated by such tools. We used two prompting methodologies to assess the performance of the LLMs: Chain of Thought (CoT) and Instruction Prompting (IP). We then compared these results with three traditional optimizing compilers, CETUS, PLUTO and ROSE, across a range of real-world use cases. A key finding is that while LLMs have the potential to outperform current optimizing compilers, they often generate incorrect code for large inputs, which calls for automated verification methods. Our extensive evaluation across three benchmark suites shows that CodeLlama-70B is the better optimizer of the two LLMs, achieving speedups of up to 2.1x. Among the optimizing compilers, CETUS performs best, achieving a maximum speedup of 1.9x. We also found no significant difference between the two prompting methods, CoT and IP.

A Comparative Study of Programming Languages in Rosetta Code

A Metrics-Based Comparative Study on Object-Oriented Programming Languages.

Comparing Selected Criteria of Programming Languages Java, PHP, C++, Perl, Haskell, AspectJ, Ruby, COBOL, Bash Scripts and Scheme Revision 1.0 - a Team CPLgroup COMP6411-S10 Term Report

Comparative Studies of Six Programming Languages

Performance Comparison of Programming Languages to Analyze Big Data Sets

Towards Comparing Programming Paradigms

Challenges in Comparing Code Maintainability across Different Programming Languages

Is Fortran Still Relevant? Comparing Fortran with Java and C++

On the Impact of Programming Languages on Code Quality

Should AI Optimize Your Code? A Comparative Study of Current Large Language Models Versus Classical Optimizing Compilers

Measuring source code conciseness across programming languages using compression

A Study of Bug Resolution Characteristics in Popular Programming Languages

CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming

Studying the difference between natural and programming language corpora

A Comparative Case Study of Code Reuse With Language Oriented Programming

A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages

Comparing large language models and human programmers for generating programming code

Linguistic Relativity and Programming Languages

Rosy: An elegant language to teach the pure reactive nature of robot programming

Replacing ANSI C with other modern programming languages

Comparative study of Java and Python: A Review