Abstract: High-quality, appropriate commit messages help developers quickly understand and track code evolution, which is crucial for the collaborative development and maintenance of software. To relieve developers of the burden of writing commit messages, researchers have proposed various techniques to generate them automatically, among which learning-based techniques have proven promising. However, these learning-based techniques generally score low on the BLEU metric. Several reasons for the low BLEU scores have been identified, including noisy data, the truncation mechanism of the model, and insufficient use of context information. Through extensive empirical analysis, we find that the diversity of commits may be another factor that affects the performance of existing learning-based techniques. As a result of this diversity, there are two main types of commit messages in the real world: one offers a superficial summary of relatively simple code changes (the "explicit" commit message), and the other summarizes complex code changes from a global perspective, reflecting the nature or intent behind the changes (the "implicit" commit message). Our empirical study shows that generating implicit commit messages is more challenging for these techniques, and that the models have limited ability to generalize in cross-category generation. To verify these conclusions, we build a model that automatically identifies explicit and implicit commit messages and use it to construct our datasets. Next, we evaluate how well state-of-the-art learning-based techniques generate explicit and implicit commit messages, and how well the models generalize. Finally, we propose a "Diversion" strategy that takes advantage of the generation performance of specific models. Experimental results show that our approach improves the performance of most learning-based techniques in generating commit messages.

An Empirical Study on Learning-based Techniques for Explicit and Implicit Commit Messages Generation
