Abstract: AI-powered programming language generation (PLG) models have gained increasing attention due to their ability to generate program source code within seconds from a plain program description. Despite their remarkable performance, many concerns have been raised over the potential risks of their development and deployment, such as legal issues of copyright infringement induced by training on licensed code, and malicious consequences due to the unregulated use of these models. In this paper, we present the first-of-its-kind study to systematically investigate the accountability of PLG models from the perspectives of both model development and deployment. In particular, we develop a holistic framework not only to audit the training data usage of PLG models, but also to identify neural code generated by PLG models as well as determine its attribution to a source model. To this end, we propose using membership inference to audit whether a given code snippet is in the PLG model's training data. In addition, we propose a learning-based method to distinguish between human-written code and neural code. In neural code attribution, through both empirical and theoretical analysis, we show that it is impossible to reliably attribute the generation of one code snippet to one model. We then propose two feasible alternative methods: one is to attribute one neural code snippet to one of the candidate PLG models, and the other is to verify whether a set of neural code snippets can be attributed to a given PLG model. The proposed framework thoroughly examines the accountability of PLG models, and its effectiveness is verified by extensive experiments. The implementations of our proposed framework are also encapsulated into a new artifact, named CodeForensic, to foster further research.

What problem does this paper attempt to address?

This paper systematically studies the accountability of AI programming language generation (PLG) models during both development and deployment. The authors focus on three questions:

1. **Training data usage auditing**: How to determine whether a code snippet was used in the training data of a PLG model.
2. **Neural code detection**: How to distinguish code written by humans from code generated by a PLG model.
3. **Neural code attribution**: How to determine which specific PLG model generated a neural code snippet.

#### Specific problem descriptions

- **Legal and copyright issues**: Unauthorized use of source code to train PLG models may infringe copyright and damage the intellectual property rights of code creators. Moreover, PLG models may reproduce licensed source code verbatim without the corresponding attribution information.
- **Ethical issues**: Misuse of PLG models may lead to the spread of false information, academic plagiarism, and similar harms, affecting the public interest and the online environment.
- **Lack of a comprehensive framework**: Most current research focuses on natural language generation models, with less attention paid to emerging PLG models, and existing work usually evaluates model accountability from a single perspective (such as training data or model output).

### Research objectives

The authors aim to strengthen the accountability of PLG models in the following ways:

- **Training data usage auditing**: Propose a method based on the Likelihood Ratio Test (LRT) to determine whether a given code snippet is in the training data of a PLG model.
- **Neural code detection**: Build a learning-based classifier to distinguish between human-written code and neural code.
- **Neural code attribution**: Show, through theoretical and empirical analysis, that a single code snippet cannot be reliably attributed to a specific PLG model, and propose two feasible alternatives:
  - Attribution classification: Attribute a neural code snippet to one of a set of candidate PLG models.
  - Attribution verification: Verify whether a set of neural code snippets can be attributed to a given PLG model.

### Methods and contributions

- **First systematic study**: This is the first systematic study of the accountability of PLG models from both the model development and deployment perspectives.
- **New methods**: For training data usage auditing, an LRT-based membership inference method is proposed; for neural code detection, a learning-based classifier is proposed; for neural code attribution, theoretical and empirical analysis is provided together with the two alternatives above.
- **Public tool**: The implementations are packaged into an artifact named CodeForensic for characterizing the accountability of neural code.

Through these studies, the authors hope to provide a comprehensive framework for regulatory agencies and the software industry to ensure the legal, transparent, and responsible use of PLG models.
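To make the LRT-based training data audit more concrete, the following is a minimal sketch of a generic Gaussian likelihood ratio test for membership inference. It is not the authors' implementation. It assumes that a per-snippet score (for example, the average per-token log-likelihood that the audited model assigns to the snippet) is available, and that reference scores for the same snippet have been collected from shadow models trained with and without it. The function names, the Gaussian assumption, and the threshold of 0 are all illustrative choices.

```python
import math
from statistics import mean, stdev


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log-density of the normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_likelihood_ratio(target_score, in_scores, out_scores, min_sigma=1e-6):
    """Log of p(score | member) / p(score | non-member).

    in_scores / out_scores are hypothetical reference scores from shadow
    models trained with / without the snippet. Each group is modeled as a
    Gaussian, which is an assumption of this sketch.
    """
    mu_in, mu_out = mean(in_scores), mean(out_scores)
    # Clamp the standard deviations so that a degenerate reference set
    # does not cause a division by zero.
    sigma_in = max(stdev(in_scores), min_sigma)
    sigma_out = max(stdev(out_scores), min_sigma)
    return (gaussian_logpdf(target_score, mu_in, sigma_in)
            - gaussian_logpdf(target_score, mu_out, sigma_out))


def is_member(target_score, in_scores, out_scores, threshold=0.0):
    """Flag the snippet as training data when the log-ratio exceeds the threshold."""
    return log_likelihood_ratio(target_score, in_scores, out_scores) > threshold


if __name__ == "__main__":
    # Made-up scores for illustration only.
    in_scores = [-0.9, -1.1, -1.0, -0.8]   # snippet seen during training
    out_scores = [-2.1, -1.9, -2.0, -2.2]  # snippet not seen during training
    print(is_member(-1.0, in_scores, out_scores))
    print(is_member(-2.0, in_scores, out_scores))
```

In practice the threshold would be calibrated to a target false-positive rate on snippets known to be outside the training data, rather than fixed at 0.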

The "code" of Ethics: A Holistic Audit of AI Code Generators
