Abstract:Understanding code is challenging, especially when working in new and complex development environments. Code comments and documentation can help, but are typically scarce or hard to navigate. Large language models (LLMs) are revolutionizing the process of writing code. Can they do the same for helping understand it? In this study, we provide a first investigation of an LLM-based conversational UI built directly in the IDE that is geared towards code understanding. Our IDE plugin queries OpenAI's GPT-3.5-turbo model with four high-level requests without the user having to write explicit prompts: to explain a highlighted section of code, provide details of API calls used in the code, explain key domain-specific terms, and provide usage examples for an API. The plugin also allows for open-ended prompts, which are automatically contextualized to the LLM with the program being edited. We evaluate this system in a user study with 32 participants, which confirms that using our plugin can aid task completion more than web search. We additionally provide a thorough analysis of the ways developers use, and perceive the usefulness of, our system, among others finding that the usage and benefits differ between students and professionals. We conclude that in-IDE prompt-less interaction with LLMs is a promising future direction for tool builders.

What problem does this paper attempt to address?

The paper primarily explores how to leverage large language models (LLM) to help developers better understand and handle code issues. The research team developed a prototype tool named GILT (Generation-based Information-support with LLM Technology), which can be directly integrated into integrated development environments (IDE). By interacting with OpenAI's GPT-3.5-turbo model, it provides users with four main functions: 1. **Explain selected code snippets**: Provide brief explanations for highlighted code sections. 2. **Provide API call details**: Offer detailed explanations of API calls used in the code. 3. **Explain key domain concepts**: Explain concepts specific to the domain required to understand the code. 4. **Provide API usage examples**: Give examples of how to use the relevant APIs. Additionally, GILT supports an open-ended question mode, allowing users to ask custom questions that are automatically associated with the current program context being edited. To evaluate the effectiveness of GILT, the researchers conducted a user study involving 32 participants. Participants were asked to complete tasks involving unfamiliar code, specifically in the domains of data visualization and 3D rendering. The study quantitatively analyzed task completion using GILT versus web searches, including metrics such as task completion time, task completion rate, and code comprehension level. The results indicated that using GILT significantly improved task completion rates, but did not show a notable improvement in time and comprehension levels. Furthermore, the study found that students and professional developers benefited differently from GILT. Students might rely more on the direct information support provided by the tool, while professional developers might prefer to leverage their own experience and knowledge to solve problems. Overall, the research suggests that directly integrating LLMs into IDEs to enable interaction without explicit prompt writing is a promising direction, helping to enhance developers' productivity and code comprehension.

Using an LLM to Help With Code Understanding

What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant

The Programmer's Assistant: Conversational Interaction with a Large Language Model for Software Development

IntelliExplain: Enhancing Conversational Code Generation for Non-Professional Programmers

Tool-Augmented LLMs as a Universal Interface for IDEs

Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study

Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming

Explaining Code with a Purpose: An Integrated Approach for Developing Code Comprehension and Prompting Skills

How Beginning Programmers and Code LLMs (Mis)read Each Other

Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks

Beyond Code Generation: An Observational Study of ChatGPT Usage in Software Engineering Practice

Why and When LLM-Based Assistants Can Go Wrong: Investigating the Effectiveness of Prompt-Based Interactions for Software Help-Seeking

LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation

Navigating the Pitfalls: Analyzing the Behavior of LLMs as a Coding Assistant for Computer Science Students—A Systematic Review of the Literature

Exploring Interaction Patterns for Debugging: Enhancing Conversational Capabilities of AI-assistants

CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator Needs

Enhancing Computer Programming Education with LLMs: A Study on Effective Prompt Engineering for Python Code Generation

Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static Context

WaitGPT: Monitoring and Steering Conversational LLM Agent in Data Analysis with On-the-Fly Code Visualization