Good things come in three: Generating SO Post Titles with Pre-Trained Models, Self Improvement and Post Ranking

Duc Anh Le,Anh M. T. Bui,Phuong T. Nguyen,Davide Di Ruscio

2024-06-22

Abstract: Stack Overflow is a prominent Q&A forum, supporting developers in seeking suitable resources on programming-related matters. Having high-quality question titles is an effective means to attract developers' attention. Unfortunately, this is often underestimated, leaving room for improvement. Research has been conducted, predominantly leveraging pre-trained models to generate titles from code snippets and problem descriptions. Yet, getting high-quality titles is still a challenging task, attributed to both the quality of the input data (e.g., containing noise and ambiguity) and inherent constraints in sequence generation models. In this paper, we present FILLER as a solution to generating Stack Overflow post titles using a fine-tuned language model with self-improvement and post ranking. Our study focuses on enhancing pre-trained language models for generating titles for Stack Overflow posts, employing a training and subsequent fine-tuning paradigm for these models. To this end, we integrate the model's predictions into the training process, enabling it to learn from its errors, thereby lessening the effects of exposure bias. Moreover, we apply a post-ranking method to produce a variety of sample candidates, subsequently selecting the most suitable one. To evaluate FILLER, we perform experiments using benchmark datasets, and the empirical findings indicate that our model provides high-quality recommendations. Moreover, it significantly outperforms all the baselines, including Code2Que, SOTitle, CCBERT, M3NSCT5, and GPT3.5-turbo. A user study also shows that FILLER provides more relevant titles, with respect to SOTitle and GPT3.5-turbo.

Software Engineering

What problem does this paper attempt to address?

This paper addresses the generation of high-quality Stack Overflow (SO) post titles. The authors identify two main challenges that current title-generation methods face:

1. **Exposure Bias**: The model operates differently during training and inference. During training, it predicts the next word from the correct preceding tokens; during inference, it depends on its own earlier predictions. This mismatch can amplify early prediction errors and degrade the consistency and quality of the generated text.
2. **Inherent Randomness in Sequence Generation Models**: Because generation is stochastic, the quality of the generated titles can vary widely: some candidate titles may be highly relevant, while others are much less so.

To address these problems, the authors propose a method named FILLER, which aims to improve SO post title generation through three main steps:

1. **Fine-tuning Pre-trained Language Models**: A pre-trained model is fine-tuned on multi-modal input (i.e., the problem description and code snippets) so that it adapts to the characteristics of SO posts.
2. **Self-Improvement**: The model's own predictions are integrated into the training dataset, reducing the gap between training and inference and thereby improving the model's robustness and accuracy.
3. **Post Ranking**: Multiple candidate titles are generated during inference, and the most appropriate one is selected with an algorithm such as TextRank, so that the final title is both relevant and high-quality.

Through these steps, FILLER aims to overcome the deficiencies of existing methods and provide a higher-quality solution for generating SO post titles.
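The post-ranking step can be illustrated with a small sketch. This is not the authors' implementation: it assumes that candidate titles are compared by Jaccard token overlap, that a TextRank-style (weighted PageRank) score is computed over the resulting similarity graph, and that the highest-scoring candidate is chosen. The function names (`tokenize`, `similarity`, `textrank_scores`, `select_title`) and the parameter values are hypothetical, and the candidate titles are made up for illustration.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase a title and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard overlap between two titles (an assumed similarity measure)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def textrank_scores(
    candidates: List[str], damping: float = 0.85, iterations: int = 50
) -> List[float]:
    """Score each candidate with a TextRank-style weighted PageRank iteration."""
    n = len(candidates)
    if n == 0:
        return []
    # Edge weights of the candidate graph; no self-loops.
    weights = [
        [0.0 if i == j else similarity(candidates[i], candidates[j]) for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each candidate receives score from its neighbours in proportion
        # to edge weight; isolated candidates keep only the base term.
        scores = [
            (1.0 - damping) / n
            + damping
            * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0.0
            )
            for i in range(n)
        ]
    return scores


def select_title(candidates: List[str]) -> str:
    """Return the candidate that is most central among all generated titles."""
    if not candidates:
        raise ValueError("at least one candidate title is required")
    scores = textrank_scores(candidates)
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


if __name__ == "__main__":
    # Made-up candidates standing in for sampled model outputs.
    sampled = [
        "How to parse JSON in Python",
        "How to parse a JSON file in Python",
        "Parse JSON in Python with the json module",
        "Why is my cat sleeping",
    ]
    print(select_title(sampled))
```

The idea behind this kind of ranking is that a candidate sharing vocabulary with many other candidates is likely to reflect what the model consistently "believes" about the post, while an off-topic sample ends up isolated in the graph and receives only the base score.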


Diverse title generation for Stack Overflow posts with multiple sampling enhanced transformer

Generating Question Titles for Stack Overflow from Mined Code Snippets

Automatic bi-modal question title generation for Stack Overflow with prompt learning

Tag recommendation in software information sites

PTM4Tag: Sharpening Tag Recommendation of Stack Overflow Posts with Pre-trained Models

Representation Learning for Stack Overflow Posts: How Far are We?

Code2Que: a tool for improving question titles from mined code snippets in stack overflow

Improving Stack Overflow question title generation with copying enhanced CodeBERT model and bi-modal information

PTM4Tag+: Tag Recommendation of Stack Overflow Posts with Pre-trained Models

Automated Question Title Reformulation by Mining Modification Logs From Stack Overflow

Better Language Models of Code through Self-Improvement

I Know What You Are Searching for: Code Snippet Recommendation from Stack Overflow Posts

An Intelligent Video Tag Recommendation Method for Improving Video Popularity in Mobile Computing Environment

Functional Overlap Reranking for Neural Code Generation

"Medium" LMs of Code in the Era of LLMs: Lessons From StackOverflow

Sifting through the Chaff: On Utilizing Execution Feedback for Ranking the Generated Code Candidates

Can pre-trained language models generate titles for research papers?

A Three-Phases SFT Hybrid Model Integrated Strong Prior Module and Data Overlap Estimation in the Eduation Context

PICASO: Enhancing API Recommendations with Relevant Stack Overflow Posts