An Empirical Study to Evaluate Structural Similarity for Source Code Translation

Xulu Yao, Moi Hoon Yap, Yanlong Zhang
DOI: https://doi.org/10.1109/times-icon47539.2019.9024512
2019-12-01
Abstract: Statistical Machine Translation (SMT) is an active research topic in machine translation and natural language processing. Recently, SMT models have been applied to source code translation tasks in software engineering. However, no automated metric yet measures the accuracy of code translation effectively. Noting the similarity between code similarity detection and the scoring of machine translation output, this paper proposes the Code Semantic Metric (CSM), built on traditional code plagiarism detection metrics, and evaluates its applicability to code translation tasks. Our empirical study shows that different code plagiarism detection methods produce markedly different results. After task-specific parameter adjustment, CSM reflects the semantic correctness of translated code to a certain extent. We find that CSM correlates highly with human judgment of the semantic accuracy of translated code, exceeding the scores of MOSS and JPlag, the mainstream traditional code plagiarism detection methods.
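The abstract's central claim is a correlation between an automated metric and human judgment. As a rough illustration of how such a comparison can be computed, the sketch below calculates a Pearson correlation coefficient between two score lists. The abstract does not say which correlation measure the authors used, so Pearson is an assumption here, and the names `pearson`, `metric_scores`, and `human_scores` as well as all numbers are hypothetical and not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, invented for illustration only (not from the paper):
# one automated similarity score and one human rating per translated snippet.
metric_scores = [0.91, 0.40, 0.75, 0.22, 0.63]
human_scores = [5, 2, 4, 1, 3]

r = pearson(metric_scores, human_scores)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that the metric ranks translations much as human raters do. A rank-based measure such as Spearman's coefficient is a common alternative when the human ratings are ordinal.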