What problem does this paper attempt to address?

The problem this paper attempts to solve is: **How to improve the accuracy of large language models (LLMs) in code readability assessment through personalized calibration**.

### Specific problem description

1. **Importance of code readability**
   - Code readability is an important factor in software maintenance because it significantly affects the maintenance workload: developers must first read and understand the target module before they can maintain it.
   - Traditional code readability assessment relies on metrics such as cyclomatic complexity and lines of code (LOC), but these metrics cannot fully capture how subjective readability judgments differ among developers.
2. **Application of LLMs in code readability assessment**
   - Recently, large language models (LLMs) have been used to assess code readability and have shown some promise. However, because developers apply different evaluation criteria, an LLM's assessment may not be applicable to all developers.
3. **Personalization requirements**
   - Different developers may rate the readability of the same piece of code very differently. Code that one developer considers easy to read may be hard for another to understand. Personalized calibration is therefore required to improve the accuracy and applicability of LLMs' assessments.

### Solution

This research proposes a method based on collaborative filtering to calibrate LLMs' code readability assessment. The specific steps are as follows:

1. **Constructing the calibration model**
   - Use developers' subjective evaluations as the dependent variable, and LLMs' assessment results and code metrics (such as cyclomatic complexity and lines of code) as the independent variables, to construct the calibration model.
   - The goal of the calibration model is to adjust LLMs' assessment results so that they better match the evaluation criteria of a specific developer.
2. **Personalized calibration**
   - For developers without sufficient assessment data, use collaborative filtering to select the calibration model of a similar developer. Similarity is calculated with a measure such as Euclidean distance or cosine similarity, and the most appropriate calibration model is then used for the personalized assessment.

### Experimental verification

The researchers used Scalabrino's dataset, which includes readability assessments of 200 code fragments completed by 9 participants. The experimental results show that the personalized, calibrated model predicts code readability more accurately than using LLMs directly. In particular, the absolute error is smallest when similarity is calculated with the Euclidean distance, which indicates the effectiveness of this method.

### Conclusion

This research emphasizes the importance of personalized calibration in improving the accuracy of LLMs' code readability assessment, and shows that the calibration method based on collaborative filtering can effectively reduce the evaluation differences among developers, thereby enhancing the accuracy and practicality of the assessment.
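
The two steps described under "Solution" (fit a per-developer calibration model, then borrow the model of the most similar developer) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the summary does not state the form of the calibration model, so ordinary least-squares linear regression is swapped in; the function names, the feature order `(llm_score, cyclomatic_complexity, loc)`, and all ratings are hypothetical synthetic values, not data from the paper or from Scalabrino's dataset. Similarity is computed with Euclidean distance over the snippets both developers have rated, which is also an assumption about how the comparison is set up.

```python
import math

def fit_linear(features, targets):
    """Least-squares fit with intercept, solved via the normal equations.
    Assumption: the calibration model is linear; the source does not say so."""
    rows = [[1.0, *x] for x in features]
    n = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef

def predict(coef, x):
    """Calibrated readability score for one feature vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

def euclidean(p, q):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))

def most_similar_developer(target_ratings, known_ratings):
    """Return the developer whose ratings on shared snippets are closest."""
    best_dev, best_dist = None, float("inf")
    for dev, ratings in known_ratings.items():
        shared = sorted(set(target_ratings) & set(ratings))
        if not shared:
            continue
        dist = euclidean([target_ratings[s] for s in shared],
                         [ratings[s] for s in shared])
        if dist < best_dist:
            best_dev, best_dist = dev, dist
    return best_dev

# Hypothetical features per snippet: (llm_score, cyclomatic_complexity, loc).
SNIPPETS = {
    "s1": (4.0, 2, 10), "s2": (3.0, 5, 40), "s3": (2.0, 9, 80),
    "s4": (5.0, 1, 5), "s5": (3.5, 4, 25),
}
# Synthetic ratings generated from known linear rules so the fit can be checked.
RATINGS = {
    "dev_a": {"s1": 4.0, "s2": 2.9, "s3": 1.7, "s4": 4.9, "s5": 3.4},
    "dev_b": {"s1": 3.3, "s2": 2.4, "s3": 1.4, "s4": 3.95, "s5": 2.85},
}

# Step 1: one calibration model per developer with enough ratings.
models = {
    dev: fit_linear([SNIPPETS[s] for s in r], [r[s] for s in r])
    for dev, r in RATINGS.items()
}

# Step 2: a new developer with only two ratings borrows the nearest model.
new_dev_ratings = {"s1": 3.2, "s2": 2.5}
neighbor = most_similar_developer(new_dev_ratings, RATINGS)
unseen_snippet = (4.5, 3, 20)
calibrated = predict(models[neighbor], unseen_snippet)
print(neighbor, round(calibrated, 2))
```

Cosine similarity could be substituted for `euclidean` by changing only the distance function. The summary reports that Euclidean distance gave the smallest absolute error in the paper's experiments, which is why it is used in this sketch.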

Personalization of Code Readability Evaluation Based on LLM Using Collaborative Filtering

Assessing Consensus of Developers' Views on Code Readability

Reassessing Java Code Readability Models with a Human-Centered Approach

ReadCtrl: Personalizing text generation with readability-controlled instruction learning

Fine-tuning Large Language Models to Improve Accuracy and Comprehensibility of Automated Code Review

AI-powered Code Review with LLMs: Early Results

Evaluating Language Models for Generating and Judging Programming Feedback

Exploring the Capabilities of LLMs for Code Change Related Tasks

Large Language Models as Partners in Student Essay Evaluation

Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction

Can Large Language Models Serve as Evaluators for Code Summarization?

An LLM-based Readability Measurement for Unit Tests' Context-aware Inputs

LLM-Assisted Code Cleaning For Training Accurate Code Generators

Evaluating Code Readability and Legibility: An Examination of Human-centric Studies

Automatically Recommend Code Updates: Are We There Yet?

Beyond Utility: Evaluating LLM as Recommender

Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study

Calculating Originality of LLM Assisted Source Code

PersonalLLM: Tailoring LLMs to Individual Preferences

"Which LLM should I use?": Evaluating LLMs for tasks performed by Undergraduate Computer Science Students