Copy Detection among Programs Using Extreme Learning Machines

Bin Wang, Xiaochun Yang, Guoren Wang
DOI: https://doi.org/10.1007/978-3-319-14066-7_19
2015-01-01
Abstract: Because of the complexity of software development, some developers may plagiarize source code from other projects or open source software in order to shorten the development cycle. Many methods have been proposed to detect plagiarism among programs based on the program dependence graph, a graph representation of a program. However, the accuracy and efficiency of these detection approaches still need improvement. By employing an extreme learning machine (ELM), we construct a feature space that describes each pair of programs with a possible plagiarism relationship. Such a feature space can be large and time-consuming to process, so we propose approaches that construct a smaller feature space by pruning isolated control statements and removable statements from each program, which accelerates both training and classification. We conducted a thorough experimental study of this technique on real C programs collected from the Internet. The experimental results show the high accuracy and efficiency of our ELM-based approach.
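To make the ELM idea concrete, the following is a minimal sketch of a generic extreme learning machine classifier applied to program pairs. It is not the authors' implementation. The three-dimensional pair features are hypothetical similarity scores invented for illustration (the paper derives its features from program dependence graphs), and the ridge term `reg` is an assumed stabilizer taken from the regularized ELM variant rather than something the abstract specifies.

```python
import numpy as np


class ELMClassifier:
    """Single-hidden-layer ELM: random hidden weights, closed-form output weights."""

    def __init__(self, n_hidden=50, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg  # assumed ridge term; not taken from the paper
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot targets, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Hidden weights are drawn once at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights from regularized least squares.
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        H = self._hidden(np.asarray(X, dtype=float))
        return self.classes_[np.argmax(H @ self.beta, axis=1)]


# Hypothetical data: each row describes one program pair by three
# made-up similarity scores in [0, 1]; label 1 = copied, 0 = independent.
data_rng = np.random.default_rng(1)
copied = data_rng.normal(loc=0.9, scale=0.05, size=(20, 3))
independent = data_rng.normal(loc=0.2, scale=0.05, size=(20, 3))
X = np.vstack([copied, independent])
y = np.array([1] * 20 + [0] * 20)

model = ELMClassifier().fit(X, y)
train_accuracy = float(np.mean(model.predict(X) == y))
```

Because only the output weights are solved for, training reduces to a single linear solve, which is one reason a smaller feature space would shorten both training and classification.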