Abstract: For any software system, concise and meaningful method names are critical for program comprehension and maintenance. However, for various reasons, method names may be inconsistent with their corresponding implementations. Such inconsistent method names are confusing and misleading, and often result in incorrect method invocations. To this end, a few deep learning (DL)-based approaches have been proposed to identify such inconsistent method names. Existing evaluations suggest that the performance of such DL-based approaches is promising. However, the evaluations are conducted on a perfectly balanced dataset in which the number of inconsistent method names exactly equals that of consistent ones. In addition, the way this balanced dataset was constructed is flawed, leading to false positives in the dataset. Consequently, the reported performance may not represent how well the approaches work in the field, where most method names are consistent with their corresponding method bodies and only a small fraction are not. In this paper, we therefore conduct an empirical study to assess the state-of-the-art DL-based approaches for the automated identification of inconsistent method names. We first build a new benchmark (dataset) using both automatic identification from commit history and manual inspection by developers, aiming to reduce the number of false positives. Based on the benchmark, we evaluate five representative DL-based approaches to identifying inconsistent method names (one is retrieval-based and four are generation-based). Our evaluation results suggest that the performance of the evaluated approaches drops substantially when we switch from the existing balanced dataset to our new benchmark. Furthermore, to reveal where and why the evaluated approaches work or fail, we conduct quantitative and qualitative analyses of the evaluation results.
Our analysis results suggest that the evaluated approaches work well on methods with simple bodies and short names, and that retrieval-based approaches perform especially well on methods whose names start with popular first sub-tokens. Retrieval-based approaches fail frequently because the adopted method representation technique is not efficient enough. Another possible reason for the failures is their unverified rationale, i.e., that two methods with similar bodies should have similar names. Generation-based approaches frequently fail because of inaccurate similarity calculation formulas and immature method name generation techniques. Based on this analysis, we also propose two possible ways to better identify inconsistent method names, leveraging contrastive learning and large language models (LLMs). Overall, our empirical study suggests that the state-of-the-art DL-based approaches to inconsistent method name identification need significant improvement before they can be applied to practical software systems.
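
To illustrate why results on a perfectly balanced dataset may not carry over to the field, the following minimal sketch computes the precision of a detector at two different proportions of inconsistent names. The recall, false positive rate, and prevalence values are hypothetical numbers chosen for illustration only; they are not results from the study, and the function name `precision_at_prevalence` is our own.

```python
def precision_at_prevalence(recall: float,
                            false_positive_rate: float,
                            prevalence: float) -> float:
    """Precision of a binary detector when a fraction `prevalence`
    of all method names is truly inconsistent."""
    # Share of all methods that are inconsistent AND flagged.
    true_pos = recall * prevalence
    # Share of all methods that are consistent BUT flagged.
    false_pos = false_positive_rate * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        return 0.0
    return true_pos / (true_pos + false_pos)


# Hypothetical detector characteristics (assumed, not taken from the study).
RECALL = 0.80
FALSE_POSITIVE_RATE = 0.20

# Balanced dataset: half of the method names are inconsistent.
balanced = precision_at_prevalence(RECALL, FALSE_POSITIVE_RATE, 0.50)
# Skewed dataset: only 5% of the method names are inconsistent (assumed value).
skewed = precision_at_prevalence(RECALL, FALSE_POSITIVE_RATE, 0.05)

print(f"precision on balanced data: {balanced:.3f}")
print(f"precision on skewed data:   {skewed:.3f}")
```

With the detector itself unchanged, precision falls as inconsistent names become rarer, because false alarms on the many consistent methods outnumber the true detections.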

Deep Learning Based Identification of Inconsistent Method Names: How Far Are We?
