Abstract: Although LLMs are increasing the productivity of professional programmers, existing work shows that beginners struggle to prompt LLMs to solve text-to-code tasks. Why is this the case? This paper explores two competing hypotheses about the cause of student-LLM miscommunication: (1) students simply lack the technical vocabulary needed to write good prompts, and (2) students do not understand the extent of information that LLMs need to solve code generation tasks. We study (1) with a causal intervention experiment on technical vocabulary and (2) by analyzing graphs that abstract how students edit prompts and the different failures that they encounter. We find that substance beats style: a poor grasp of technical vocabulary is merely correlated with prompt failure; that the information content of prompts predicts success; that students get stuck making trivial edits; and more. Our findings have implications for the use of LLMs in programming education, and for efforts to make computing more accessible with LLMs.

What problem does this paper attempt to address?

This paper asks why beginners encounter difficulties when using large language models (LLMs) for text-to-code tasks. Specifically, it explores two competing hypotheses:

1. **Hypothesis of Insufficient Technical Vocabulary**: Students lack the technical vocabulary required to write effective prompts.
2. **Hypothesis of Improper Information Selection**: Students do not understand what information LLMs need to solve code-generation tasks.

To test these two hypotheses, the authors conducted the following studies:

- **Causal Intervention Experiment**: By replacing the vocabulary used by students with more standard technical terms, the authors measure the impact of vocabulary selection on the success rate of students' prompts.
- **Prompt-Editing Analysis**: By analyzing how students edit prompts during the problem-solving process and the different failures they encounter, the authors measure the impact of information content on the success rate of students' prompts.

### Main Findings

1. **Minor Impact of Vocabulary Selection**:
   - The study found that vocabulary selection has only a weak effect on the prompt success rate. Even when students' terms are replaced with more precise technical terms, the success rate of the prompt does not increase significantly.
   - For example, substitutions involving certain non-standard terms (such as "character" and "set of characters") do reduce the success rate, but this is not a universal phenomenon.
2. **Information Content is More Critical**:
   - The information content of the prompt has a significant impact on the success rate. Prompts lacking key information almost always fail.
   - Students usually make trivial edits to the prompt instead of changing its information content, causing them to get stuck in an ineffective loop.

### Conclusion

The main conclusion of the paper is: **Substance Over Form**.
The difficulties that students encounter when using LLMs stem mainly from selecting relevant information, rather than from their use of technical vocabulary. This finding matters for the use of LLMs in programming education and for efforts to make computing more accessible through LLMs.

### Formula Representation

Although the paper does not involve complex mathematical formulas, statistically significant differences in its experimental results can be represented as a Markdown table:

```markdown
| Concept Category | Llama 3.1 8B | Llama 70B |
| --- | --- | --- |
| String | ↓ | ↓ |
| List | ↓ | ↓ |
| Return | ↓ | ↓ |
```

Here, `↓` indicates a significant decrease in the success rate after replacement, `↑` indicates a significant increase in the success rate after replacement, and `-` indicates no significant difference.

These findings suggest that, to help students use LLMs more effectively, more attention should be paid to helping them understand and select the correct information, rather than just correcting their vocabulary use.
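
The causal intervention on vocabulary summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term mapping, the function names (`substitute_terms`, `pass_rate`, `intervention_effect`), and the example prompt are all hypothetical, and the pass/fail outcomes are assumed to come from running the generated code against tests, which is outside the scope of this sketch.

```python
import re

# Hypothetical mapping from students' terms to standard technical terms.
# These pairs are illustrative assumptions, not the paper's substitution list.
SUBSTITUTIONS = {
    "set of characters": "string",
    "array": "list",
    "give back": "return",
}


def substitute_terms(prompt, substitutions):
    """Swap each mapped term in the prompt for its replacement.

    Matching is whole-word and case-insensitive. Longer terms are tried
    first, and the prompt is rewritten in a single pass so that inserted
    replacements are never substituted again.
    """
    terms = sorted(substitutions, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: substitutions[m.group(0).lower()], prompt)


def pass_rate(outcomes):
    """Fraction of prompts whose generated code passed (True) its tests."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(outcomes) / len(outcomes)


def intervention_effect(original_outcomes, substituted_outcomes):
    """Change in pass rate caused by the substitution (negative = worse)."""
    return pass_rate(substituted_outcomes) - pass_rate(original_outcomes)


if __name__ == "__main__":
    prompt = "Reverse the set of characters, then give back the array."
    print(substitute_terms(prompt, SUBSTITUTIONS))
    # Made-up outcomes, for illustration only.
    print(intervention_effect([True, False, True, False],
                              [True, False, False, False]))
```

Because only the vocabulary changes between the original and substituted prompt, any difference in pass rate can be attributed to wording rather than to information content, which is the point of the intervention design.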

Substance Beats Style: Why Beginning Students Fail to Code with LLMs

How Beginning Programmers and Code LLMs (Mis)read Each Other

StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code

Navigating the Pitfalls: Analyzing the Behavior of LLMs as a Coding Assistant for Computer Science Students—A Systematic Review of the Literature

Enhancing Computer Programming Education with LLMs: A Study on Effective Prompt Engineering for Python Code Generation

Explaining Code with a Purpose: An Integrated Approach for Developing Code Comprehension and Prompting Skills

Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study

Interactions with Prompt Problems: A New Way to Teach Programming with Large Language Models

Vernacular? I Barely Know Her: Challenges with Style Control and Stereotyping

Instruct or Interact? Exploring and Eliciting LLMs' Capability in Code Snippet Adaptation Through Prompt Engineering

Leveraging Prompts in LLMs to Overcome Imbalances in Complex Educational Text Data

Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models

Analyzing LLM Usage in an Advanced Computing Class in India

Insights from Social Shaping Theory: The Appropriation of Large Language Models in an Undergraduate Programming Course

Students Struggle to Explain Their Own Program Code

Not the Silver Bullet: LLM-enhanced Programming Error Messages are Ineffective in Practice

Out of style: Misadventures with LLMs and code style transfer

CSEPrompts: A Benchmark of Introductory Computer Science Prompts

LLMs are Imperfect, Then What? An Empirical Study on LLM Failures in Software Engineering

How Novices Use LLM-Based Code Generators to Solve CS1 Coding Tasks in a Self-Paced Learning Environment

"Which LLM should I use?": Evaluating LLMs for tasks performed by Undergraduate Computer Science Students