The "Code" of Ethics: A Holistic Audit of AI Code Generators

Wanlun Ma, Yiliao Song, Minhui Xue, Sheng Wen, Yang Xiang
2023-05-22
Abstract: AI-powered programming language generation (PLG) models have gained increasing attention due to their ability to generate the source code of programs in a few seconds from a plain program description. Despite their remarkable performance, many concerns have been raised over the potential risks of their development and deployment, such as legal issues of copyright infringement induced by training on licensed code, and malicious consequences due to the unregulated use of these models. In this paper, we present the first-of-its-kind study to systematically investigate the accountability of PLG models from the perspectives of both model development and deployment. In particular, we develop a holistic framework not only to audit the training data usage of PLG models, but also to identify neural code generated by PLG models and to determine its attribution to a source model. To this end, we propose using membership inference to audit whether a given code snippet is in the PLG model's training data. In addition, we propose a learning-based method to distinguish between human-written code and neural code. In neural code attribution, through both empirical and theoretical analysis, we show that it is impossible to reliably attribute the generation of one code snippet to one model. We then propose two feasible alternative methods: one is to attribute one neural code snippet to one of the candidate PLG models, and the other is to verify whether a set of neural code snippets can be attributed to a given PLG model. The proposed framework thoroughly examines the accountability of PLG models, which is verified by extensive experiments. The implementations of our proposed framework are also encapsulated into a new artifact, named CodeForensic, to foster further research.
Cryptography and Security
### What problems does this paper attempt to solve?

This paper systematically studies the accountability of AI programming language generation (PLG) models during both development and deployment. Specifically, the authors focus on three aspects:

1. **Audit of training data use**: How to determine whether a code snippet was used in the training data of a PLG model.
2. **Neural code detection**: How to distinguish code written by humans from code generated by a PLG model.
3. **Neural code attribution**: How to determine which specific PLG model generated a neural code snippet.

#### Specific problem descriptions

- **Legal and copyright issues**: Training PLG models on source code without authorization may infringe copyright and damage the intellectual property rights of code creators. PLG models may also reproduce licensed source code without the corresponding attribution information.
- **Ethical issues**: Misuse of PLG models may lead to the spread of false information, academic plagiarism, and similar harms that affect the public interest and the online environment.
- **Lack of a comprehensive framework**: Most current research focuses on natural language generation models and pays less attention to emerging PLG models. Existing research also usually evaluates model accountability from a single perspective, such as training data or model output.

### Research objectives

The authors aim to improve the accountability of PLG models in the following ways:

- **Audit of training data use**: Propose a method based on the Likelihood Ratio Test (LRT) to determine whether a given code snippet is in the training data of a PLG model.
- **Neural code detection**: Build a learning-based classifier to distinguish between human-written code and neural code.
- **Neural code attribution**: Show, through both empirical and theoretical analysis, that a single code snippet cannot be reliably attributed to a specific PLG model, and propose two feasible alternative methods:
  - Attribution classification: Attribute a neural code snippet to one of a set of candidate PLG models.
  - Attribution verification: Verify whether a set of neural code snippets can be attributed to a given PLG model.

### Methods and contributions

- **First systematic study**: This is the first systematic study of the accountability of PLG models from both the model development and the model deployment perspectives.
- **New methods**: For the audit of training data use, the authors propose a membership inference method based on the LRT. For neural code detection, they propose a learning-based classifier. For neural code attribution, they give theoretical and empirical evidence that single-snippet attribution is unreliable and propose the two alternatives above.
- **Public tool**: The authors release a toolkit named CodeForensic for characterizing the accountability of neural code.

Through these studies, the authors hope to provide a comprehensive framework that helps regulatory agencies and the software industry ensure the legal, transparent, and responsible use of PLG models.
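
To make the LRT-based audit of training data use more concrete, here is a minimal Python sketch of a likelihood ratio test for membership inference. It is an illustration under stated assumptions, not the authors' implementation. This summary does not specify the paper's test statistic, so the sketch assumes that each snippet is scored by its loss under the audited model, and that reference losses are available from models trained with the snippet ("in") and without it ("out"), each modelled as a Gaussian. The names `gaussian_logpdf`, `lrt_membership_score`, and `is_member`, and all numeric values, are hypothetical.

```python
import math
from statistics import mean, stdev


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def lrt_membership_score(target_loss, in_losses, out_losses, min_sigma=1e-6):
    """Log-likelihood ratio: log p(loss | member) - log p(loss | non-member).

    in_losses / out_losses are assumed reference losses for the same snippet
    from models trained with / without it (at least two values each,
    because a sample standard deviation is estimated).
    """
    mu_in, mu_out = mean(in_losses), mean(out_losses)
    # Clamp the standard deviations to avoid division by zero.
    sigma_in = max(stdev(in_losses), min_sigma)
    sigma_out = max(stdev(out_losses), min_sigma)
    return (gaussian_logpdf(target_loss, mu_in, sigma_in)
            - gaussian_logpdf(target_loss, mu_out, sigma_out))


def is_member(target_loss, in_losses, out_losses, threshold=0.0):
    """Flag the snippet as training data when the ratio exceeds the threshold."""
    return lrt_membership_score(target_loss, in_losses, out_losses) > threshold


if __name__ == "__main__":
    # Made-up losses: snippets seen in training tend to have lower loss.
    in_losses = [0.20, 0.30, 0.25, 0.35]
    out_losses = [1.00, 1.20, 0.90, 1.10]
    for loss in (0.28, 1.05):
        score = lrt_membership_score(loss, in_losses, out_losses)
        print(f"loss={loss:.2f} score={score:.2f} member={is_member(loss, in_losses, out_losses)}")
```

In this sketch the threshold trades false positives against false negatives: raising it above 0 makes the audit more conservative about claiming that a snippet was used for training.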