Abstract: Artificial Intelligence (AI) based code completion tools such as GitHub Copilot have recently gained tremendous popularity due to their ability to suggest arbitrary-length snippets, improving developer productivity dramatically. However, there is little public understanding of what it takes to build such a tool. In this thesis, we explore the design space of building such a tool. We study the importance of its two key components: the Large Language Model (LLM) that predicts the suggestions, and the system around it that feeds it the right context and filters out the bad suggestions. We start by exploring the design of GitHub Copilot to understand the state of the art, and describe the three key components of Copilot: Prompt Engineering, Model Invocation and Feedback Loop. We then study the various factors that affect the quality of the suggestions generated by the LLM. We study both (a) the impact of the context fed to the LLM, and (b) the impact of the LLM itself. For the former, we study the impact of including context from other files and code after the cursor, along with different methods of context collection and different amounts of collected context. For the latter, we study the impact of the size of the LLM and the training procedure. Apart from factors affecting the quality of suggestions, we also study the factors affecting the latency of such code completion engines, as low latency is critical for building good code completion engines. We find that the context fed to the model makes a significant difference in the quality of generated suggestions, and good context collection can improve the quality of suggestions by 11-26 percentage points (20-113% relative improvement) on the exact match metric for one-line suggestions. Models that can exploit the context after the cursor can further improve the quality of suggestions by 6-14 percentage points (12-16% relative improvement).
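The abstract reports each gain twice, once in percentage points and once as a relative improvement. The minimal sketch below illustrates how the two figures relate on an exact match metric for one-line suggestions. It rests on assumptions not stated in the abstract: exact match is taken to mean string equality after stripping surrounding whitespace, the function names `exact_match_rate` and `improvement` are hypothetical, and the toy data is illustrative only and not drawn from the thesis's experiments.

```python
def exact_match_rate(predictions, references):
    """Fraction of one-line suggestions equal to the reference line.

    Assumption: surrounding whitespace is ignored; the thesis may
    normalize lines differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


def improvement(baseline, improved):
    """Return (gain in percentage points, relative gain in percent)."""
    points = (improved - baseline) * 100
    relative = (improved - baseline) / baseline * 100
    return points, relative


# Illustrative toy data, not results from the thesis.
references = ["return x + 1", "i += 1", "print(total)", "pass"]
without_context = ["return x", "i += 1", "print(sum)", "continue"]
with_context = ["return x + 1", "i += 1 ", "print(sum)", "continue"]

baseline = exact_match_rate(without_context, references)  # 1 of 4 lines match
improved = exact_match_rate(with_context, references)     # 2 of 4 lines match
points, relative = improvement(baseline, improved)
print(f"{points:.0f} percentage points, {relative:.0f}% relative improvement")
```

In this toy case the same change reads as 25 percentage points or as a 100% relative improvement, because the baseline is low. That is why a relative range can be much wider than the corresponding range in percentage points.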
Our experiments show that increasing the prompt length beyond a point does not improve suggestion quality significantly, and that 2048-4096 tokens are sufficient. We also find that the size of the LLM has a much smaller impact on the quality of suggestions than other factors such as the context fed to the model and the training procedure used. For example, we found that the SantaCoder model (1.1B parameters) provided better suggestions than the 16B CodeGen-Multi

Exploring the Effect of Multiple Natural Languages on Code Suggestion Using GitHub Copilot

An Empirical Evaluation of GitHub Copilot's Code Suggestions

Exploring the Problems, their Causes and Solutions of AI Pair Programming: A Study on GitHub and Stack Overflow

Practices and Challenges of Using GitHub Copilot: An Empirical Study

Demystifying Practices, Challenges and Expected Features of Using GitHub Copilot

On the Robustness of Code Generation Techniques: An Empirical Study on GitHub Copilot

GitHub Copilot: the perfect Code compLeeter?

Conversing with Copilot: Exploring Prompt Engineering for Solving CS1 Problems Using Natural Language

EXPLORING THE DESIGN SPACE OF AI BASED CODE COMPLETION ENGINES

The Impact of Generative AI on Collaborative Open-Source Software Development: Evidence from GitHub Copilot

Choose Your Programming Copilot: A Comparison of the Program Synthesis Performance of GitHub Copilot and Genetic Programming

POTENCIALES Y DESAFÍOS DE GITHUB COPILOT COMO HERRAMIENTA DE INTELIGENCIA ARTIFICIAL

The Productivity Effects of Generative AI: Evidence from a Field Experiment with GitHub Copilot

"It's Weird That it Knows What I Want": Usability and Interactions with Copilot for Novice Programmers

The Impact of Large Language Models on Open-source Innovation: Evidence from GitHub Copilot

Measuring the Runtime Performance of C++ Code Written by Humans using GitHub Copilot

Grounded Copilot: How Programmers Interact with Code-Generating Models

Copilot-in-the-Loop: Fixing Code Smells in Copilot-Generated Python Code using Copilot

Piloting Copilot and Codex: Hot Temperature, Cold Prompts, or Black Magic?

The Impact of AI Tool on Engineering at ANZ Bank An Empirical Study on GitHub Copilot within Corporate Environment

The Impact of AI on Developer Productivity: Evidence from GitHub Copilot