Abstract: Code clone detection is a critical problem in software development and maintenance. It aims to identify functionally identical or similar code fragments within an application. Existing work formulates the task as a binary classification problem, predicting whether a code pair is a clone according to a predefined, fixed threshold. In practice, code clones come in several types, distinguished by how similar the two fragments are to each other. To investigate how different detection manners affect the clone detection result, we propose Gated Recurrent Residual Learning Networks (GRRLN), a novel neural network model for code clone detection. To train GRRLN, we first represent each code fragment as a sequence of statement-level trees derived from the whole abstract syntax tree (AST). A gated recurrent neural network with residual connections then extracts the semantics of the individual statement trees together with the dependency relationships between them across the input statement sequence. Finally, the fragment representations produced by GRRLN are used for similarity calculation and clone detection. We evaluate GRRLN on two real-world datasets for both code clone detection and clone type classification. The experiments show that different detection manners yield different results even with the same model and dataset, and that GRRLN achieves promising results while requiring significantly less time and memory than state-of-the-art methods.
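To make the first and last stages of the pipeline in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes Python source and the standard `ast` module so that it runs as is. A bag of AST node types stands in for the learned GRRLN encoder, and the helper names, the 0.8 threshold, and the band boundaries and labels are all hypothetical choices rather than values from the paper.

```python
import ast
import bisect
import math
from collections import Counter


def statement_trees(source):
    """Return every statement node of `source` in preorder (one tree per statement)."""
    out = []

    def visit(node):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.stmt):
                out.append(child)
            visit(child)

    visit(ast.parse(source))
    return out


def header_node_types(stmt):
    """Node type names of one statement tree, without descending into nested statements."""
    names, stack = [], [stmt]
    while stack:
        node = stack.pop()
        names.append(type(node).__name__)
        stack.extend(
            c for c in ast.iter_child_nodes(node) if not isinstance(c, ast.stmt)
        )
    return names


def fragment_vector(source):
    """Stand-in encoder (assumption): count node types over all statement trees."""
    counts = Counter()
    for stmt in statement_trees(source):
        counts.update(header_node_types(stmt))
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def is_clone(similarity, threshold=0.8):
    """Binary detection manner: one fixed, hypothetical threshold."""
    return similarity >= threshold


def similarity_band(similarity, bounds=(0.5, 0.7, 0.9)):
    """Graded detection manner: map the score to a hypothetical similarity band."""
    labels = ("low", "moderate", "high", "near-identical")
    return labels[bisect.bisect_right(bounds, similarity)]


if __name__ == "__main__":
    original = "def f(a, b):\n    s = a + b\n    return s\n"
    renamed = "def g(x, y):\n    t = x + y\n    return t\n"
    score = cosine(fragment_vector(original), fragment_vector(renamed))
    print(is_clone(score), similarity_band(score))
```

Because the stand-in encoder ignores identifier names, a fragment and its consistently renamed copy receive the same vector. The same score can then be read either as a yes/no decision or as a graded band, which is the distinction between detection manners that the abstract draws.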

Learning to Detect Table Clones in Spreadsheets.

Detecting table clones and smells in spreadsheets.

Code Clone Detection: A Literature Review

SimClone: Detecting Tabular Data Clones using Value Similarity

Semantic Table Structure Identification in Spreadsheets.

Detecting Differences Across Multiple Instances of Code Clones

TDeLTA: A Light-weight and Robust Table Detection Method based on Learning Text Arrangement

TableLab: An Interactive Table Extraction System with Adaptive Deep Learning

A Machine Learning Based Framework for Code Clone Validation

Table Structure Recognition using Top-Down and Bottom-Up Cues

End-to-End Compound Table Understanding with Multi-Modal Modeling

Rethinking Table Structure Recognition Using Sequence Labeling Methods

WARDER: Refining Cell Clustering for Effective Spreadsheet Defect Detection via Validity Properties

An ensemble learning approach for software semantic clone detection

ClusterTabNet: Supervised clustering method for table detection and table structure recognition

WARDER: Towards Effective Spreadsheet Defect Detection by Validity-Based Cell Cluster Refinements

SEMv2: Table separation line detection based on instance segmentation

GRRLN: Gated Recurrent Residual Learning Networks for code clone detection

A Scalable and Accurate Approach Based on Count Matrix for Detecting Code Clones