A Novel Graph-Based Program Representation for Java Code Plagiarism Detection

Hayden Cheers, Yuqing Lin
DOI: https://doi.org/10.1145/3378936.3378960
2020-01-12
Abstract: Source code plagiarism is a long-standing issue in undergraduate computer science education. Identifying instances of source code plagiarism is a difficult and time-consuming task. To aid in its identification, many automated tools have been proposed to find indications of plagiarism. However, prior works have shown that common source code plagiarism detection tools are susceptible to plagiarism-hiding transformations. In this paper, a novel graph-based representation of Java programs is presented that is resilient to plagiarism-hiding transformations. This graph, titled the Program Interaction Dependency Graph (PIDG), represents the interaction and transformation of data within a program, and how this data interacts with the system. To show the effectiveness of this graph, it is evaluated on a data set of simulated source code plagiarism. The results of this evaluation indicate that the PIDG is a promising means of representing programs in a form that is resilient to plagiarism.