Plagiarism Detection in Computer Programming Using Feature Extraction From Ultra-Fine-Grained Repositories

Vedran Ljubovic, Enil Pajic
DOI: https://doi.org/10.1109/access.2020.2996146
IF: 3.9
2020-01-01
IEEE Access
Abstract: Detecting instances of plagiarism in student homework, especially programming homework, is an important issue for practitioners. In the past decades, several tools have emerged that are able to effectively compare large corpora of homeworks and sort pairs by degree of similarity. However, those tools are available to students as well, allowing them to experiment and develop elaborate methods for evading detection. Also, such tools are unable to detect instances of "external plagiarism" where students obtained unethical help from sources not among other students of the same course. One way to battle this problem is to monitor student activity while solving their homeworks using a cloud-based integrated development environment (IDE) and detect suspicious behaviours. Each editing event in program source can be stored as a new commit to create a form of ultra-fine-grained source code repository. In this paper, the authors propose several new features that can be extracted from such repositories with the purpose of building a comprehensive profile of each individual developer. Machine learning techniques were used to detect suspicious behaviours, which allowed the authors to significantly improve upon the performance of more traditional plagiarism detection tools.
computer science, information systems; telecommunications; engineering, electrical & electronic