Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages

William Zhang, Maria Leon, Ryan Xu, Adrian Cardenas, Amelia Wissink, Hanna Martin, Maya Srikanth, Kaya Dorogi, Christian Valadez, Pedro Perez, Citlalli Grijalva, Corey Zhang, Mark Santolucito

2024-09-02

Abstract: Node-based programming languages are increasingly popular in media arts coding domains. These languages are designed to be accessible to users with limited coding experience, allowing them to achieve creative output without an extensive programming background. Using LLM-based code generation to further lower the barrier to creative output is an exciting opportunity. However, the best strategy for code generation for visual node-based programming languages is still an open question. In particular, such languages have multiple levels of representation in text, each of which may be used for code generation. In this work, we explore the performance of LLM code generation in audio programming tasks in visual programming languages at multiple levels of representation. We explore code generation through metaprogramming code representations for these languages (i.e., coding the language using a different high-level text-based programming language), as well as through direct node generation with JSON. We evaluate code generated in this way for two visual languages for audio programming on a benchmark set of coding problems. We measure both correctness and complexity of the generated code. We find that metaprogramming results in more semantically correct generated code, given that the code is well-formed (i.e., is syntactically correct and runs). We also find that prompting for richer metaprogramming using randomness and loops led to more complex code.

Software Engineering, Artificial Intelligence, Computation and Language, Programming Languages

What problem does this paper attempt to address?

The paper evaluates how well large language models (LLMs) generate code for audio programming in visual dataflow languages. Specifically, it asks which level of code representation gives the best results: direct generation of the node graph as JSON, or metaprogramming, where the LLM writes code in a high-level text-based language that in turn builds the node graph. Results are compared on the correctness and complexity of the generated code. The paper studies two visual dataflow environments: MaxMSP, targeted through its Python library MaxPy, and Wavir, a visual programming language that accompanies the Web Audio API.

The main contributions of the paper are:

1. A benchmark set of audio digital signal processing (DSP) problems, used to evaluate LLM code generation.
2. 600 code generation experiments across two audio programming languages and three levels of code representation.
3. A metric for the semantic correctness of LLM-generated code.
4. A comparison of direct JSON generation and metaprogramming in visual dataflow languages, which finds that metaprogramming yields more semantically correct code when the generated code is well-formed (i.e., is syntactically correct and runs).

Through these studies, the paper explores how LLMs can help non-professional programmers create within their preferred environments, in particular visual dataflow languages, which are designed to be accessible to users without an extensive programming background.
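The difference between the two representation levels can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's code and not the MaxPy or Wavir API: the `Patch` class, its `add` and `connect` methods, the node type names, and the JSON layout are all hypothetical. The function `additive_synth` stands in for metaprogramming (a short program with a loop and randomness that builds the graph), while the JSON string it produces stands in for what an LLM would have to emit node by node under direct generation.

```python
import json
import random


class Patch:
    """Hypothetical stand-in for a metaprogramming library (not MaxPy's real API)."""

    def __init__(self):
        self.nodes = []
        self.edges = []

    def add(self, kind, **params):
        # Register a node and return its integer id.
        node_id = len(self.nodes)
        self.nodes.append({"id": node_id, "type": kind, "params": params})
        return node_id

    def connect(self, src, dst):
        # Record a directed connection between two node ids.
        self.edges.append({"from": src, "to": dst})

    def to_json(self):
        # Serialize the graph; this layout is an assumed, simplified format.
        return json.dumps({"nodes": self.nodes, "edges": self.edges}, indent=2)


def additive_synth(num_partials=4, base_freq=220.0, seed=0):
    """Metaprogramming level: a loop plus randomness generates the node graph."""
    rng = random.Random(seed)
    patch = Patch()
    out = patch.add("output")
    for k in range(1, num_partials + 1):
        osc = patch.add("oscillator", frequency=base_freq * k)
        gain = patch.add("gain", level=round(rng.uniform(0.1, 1.0) / k, 3))
        patch.connect(osc, gain)
        patch.connect(gain, out)
    return patch


patch = additive_synth()
# Direct-generation level: the same graph written out explicitly as JSON text.
graph = json.loads(patch.to_json())
print(len(graph["nodes"]), "nodes,", len(graph["edges"]), "edges")  # 9 nodes, 8 edges
```

In this sketch a dozen lines of program text expand into nine nodes and eight connections, and changing `num_partials` scales the graph without any additional text. That is one plausible reading of why prompting for loops and randomness could lead to more complex generated patches.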
