Abstract: While large language model (LLM)-based programming assistants such as CoPilot and ChatGPT can help improve the productivity of professional software developers, they can also facilitate cheating in introductory computer programming courses. Assuming instructors have limited control over the industrial-strength models, this paper investigates the baseline performance of 5 widely used LLMs on a collection of introductory programming problems, examines adversarial perturbations to degrade their performance, and describes the results of a user study aimed at understanding the efficacy of such perturbations in hindering actual code generation for introductory programming assignments. The user study suggests that i) the perturbations, in combination, reduced the average correctness score by 77%, and ii) the drop in correctness caused by these perturbations depended on their detectability.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses the use of large language models (LLMs) to cheat in introductory programming courses. Specifically:

1. **Background**:
   - Large language models (such as CoPilot and ChatGPT) can enhance the productivity of professional software developers but may also be used by students to cheat in introductory programming courses.
   - Educators have limited control over the capabilities of these industrial-grade models, so they need methods to prevent or reduce such cheating.
2. **Research Objectives**:
   - Evaluate the baseline performance of 5 widely used large language models on a set of introductory programming problems.
   - Investigate how adversarial perturbations can reduce the performance of these models.
   - Through a user study, explore the effectiveness of these perturbations in real-world code generation, particularly whether students can detect and reverse them.
3. **Research Steps**:
   - **Step 1**: Measure the accuracy of large language models on introductory computer science programming assignments.
   - **Step 2**: Develop adversarial techniques to perturb programming assignment prompts and analyze the impact of these techniques on the quality of model-generated solutions.
   - **Step 3**: Conduct a user study to understand the potential of these perturbation techniques in preventing LLM-assisted cheating in practice, with a focus on whether students can detect and reverse the perturbations.
4. **Key Findings**:
   - When adversarial perturbations are used in combination, the average correctness score drops by 77%.
   - The effectiveness of perturbations is influenced by their detectability, with highly noticeable perturbations being less likely to be reversed.
   - Students employed various strategies to detect and verify LLM-generated solutions, but in most cases, they still needed to check and modify the LLM-generated code.

In summary, this paper presents a systematic study of adversarial prompt perturbations as a way to impede LLM-assisted cheating in introductory programming courses.
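To make the idea of perturbing an assignment prompt and measuring the resulting drop in correctness more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the lookalike-character substitution, the `HOMOGLYPHS` map, and the function names `perturb_prompt` and `relative_drop` are all hypothetical, and the summary above does not say which perturbation techniques the authors used or how exactly the 77% figure was computed.

```python
import random

# Hypothetical lookalike map (an assumption for illustration):
# Latin letters -> visually similar Cyrillic letters.
HOMOGLYPHS = {
    "a": "\u0430",
    "e": "\u0435",
    "o": "\u043e",
    "c": "\u0441",
    "p": "\u0440",
}


def perturb_prompt(prompt: str, rate: float = 0.3, seed: int = 0) -> str:
    """Replace a fraction of eligible characters with lookalikes.

    The text looks nearly unchanged to a human reader, but the
    underlying characters (and hence the model's input) differ.
    """
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in prompt
    )


def relative_drop(baseline: float, perturbed: float) -> float:
    """Relative reduction in average correctness, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - perturbed) / baseline
```

Under one possible reading of the headline result, a 77% reduction would correspond to `relative_drop(baseline, perturbed)` being about 0.77, for example an average correctness score falling from a hypothetical 1.0 to 0.23. Whether a student notices and reverses such a substitution is exactly the detectability question the user study examines.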

Impeding LLM-assisted Cheating in Introductory Programming Assignments via Adversarial Perturbation

On the Adversarial Robustness of Instruction-Tuned Large Language Models for Code

What You See Is Not Always What You Get: An Empirical Study of Code Comprehension by Large Language Models

MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants

LLM-Resistant Math Word Problem Generation via Adversarial Attacks

Adversarial Math Word Problem Generation

Assessing Cybersecurity Vulnerabilities in Code Large Language Models

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

Generating Adversarial Computer Programs using Optimized Obfuscations

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates

NLPerturbator: Studying the Robustness of Code LLMs to Natural Language Variations

Security Attacks on LLM-based Code Completion Tools

Transfer Attacks and Defenses for Large Language Models on Coding Tasks

ProCoT: Stimulating Critical Thinking and Writing of Students through Engagement with Large Language Models (LLMs)

How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging

Navigating the Pitfalls: Analyzing the Behavior of LLMs as a Coding Assistant for Computer Science Students—A Systematic Review of the Literature

How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries

Comparing Robustness Against Adversarial Attacks in Code Generation: LLM-Generated vs. Human-Written

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection

Not the Silver Bullet: LLM-enhanced Programming Error Messages are Ineffective in Practice