Personalization of Code Readability Evaluation Based on LLM Using Collaborative Filtering

Buntaro Hiraki, Kensei Hamamoto, Ami Kimura, Masateru Tsunoda, Amjed Tahir, Kwabena Ebo Bennin, Akito Monden, Keitaro Nakasai
2024-11-16
Abstract: Code readability is an important indicator of software maintenance as it can significantly impact maintenance efforts. Recently, LLMs (large language models) have been utilized for code readability evaluation. However, readability evaluation differs among developers, so personalization of the evaluation by LLMs is needed. This study proposes a method that calibrates the evaluation using collaborative filtering. Our preliminary analysis suggested that the method effectively enhances the accuracy of the readability evaluation using LLMs.
Software Engineering
What problem does this paper attempt to address?
The problem this paper attempts to solve is: **how to improve the accuracy of large language models (LLMs) in code readability assessment through personalized calibration**.

### Specific problem description

1. **Importance of code readability**
   - Code readability is an important indicator of software maintenance because it significantly affects the maintenance workload. Developers must first read and understand the target module before they can maintain it.
   - Traditional code readability assessment relies on metrics such as cyclomatic complexity and lines of code (LOC), but these metrics cannot fully capture how subjective readability judgments differ among developers.
2. **Application of LLMs in code readability assessment**
   - Recently, LLMs have been used to assess code readability and have shown some potential. However, because developers apply different criteria when judging readability, an LLM's assessment may not match every developer's judgment.
3. **Personalization requirements**
   - Different developers may rate the readability of the same piece of code very differently. Code that one developer finds easy to read may be hard for another to understand. Personalized calibration is therefore needed to improve the accuracy and applicability of LLM assessments.

### Solution

This research proposes a method based on collaborative filtering to calibrate LLM code readability assessments. The specific steps are as follows:

1. **Constructing the calibration model**
   - Use a developer's subjective evaluations as the dependent variable, and the LLM's assessment results and code metrics (such as cyclomatic complexity and lines of code) as the independent variables to construct the calibration model.
   - The goal of the calibration model is to adjust the LLM's assessment results so that they better match the evaluation criteria of a specific developer.
2. **Personalized calibration**
   - For developers without sufficient assessment data, use collaborative filtering to select the calibration model of a similar developer. Similarity is calculated with a measure such as Euclidean distance or cosine similarity, and the calibration model of the most similar developer is used for the personalized assessment.

### Experimental verification

The researchers used Scalabrino's dataset, which includes the readability assessment of 200 code fragments completed by 9 participants. The experimental results show that the personalized, calibrated model predicts code readability more accurately than using the LLM's output directly. In particular, the absolute error was smallest when similarity was calculated with Euclidean distance.

### Conclusion

This research emphasizes the importance of personalized calibration in improving the accuracy of LLM code readability assessment. Its preliminary results suggest that calibration based on collaborative filtering can reduce the gap between LLM assessments and individual developers' evaluations, thereby making the assessments more accurate and practical.
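The two steps listed under Solution can be illustrated with a minimal sketch. This is not the authors' implementation: the snippet features, the developers `dev_a` and `dev_b`, their coefficients, and all function names are hypothetical and the data is synthetic. It assumes the calibration model is an ordinary least-squares regression, and that similarity is the Euclidean distance between ratings of the snippets a new developer has already rated; the summary does not specify either detail.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes a is square and non-singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_calibration(features, ratings):
    """Least-squares fit of rating ~ intercept + features (normal equations)."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ratings)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, feature_row):
    """Apply a calibration model to one snippet's features."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], feature_row))

def nearest_developer(new_ratings, known_ratings):
    """Pick the known developer whose ratings are closest (Euclidean distance)
    to the new developer's ratings on the snippets the new developer rated."""
    def distance(ratings):
        return math.sqrt(sum((ratings[i] - r) ** 2 for i, r in new_ratings.items()))
    return min(known_ratings, key=lambda dev: distance(known_ratings[dev]))

# Synthetic snippets: (LLM readability score, cyclomatic complexity, LOC).
snippets = [(4.0, 2, 10), (3.0, 5, 25), (2.0, 9, 40),
            (4.5, 1, 8), (3.5, 4, 30), (1.5, 12, 55)]

# Hypothetical "true" rating behaviour, used only to generate example ratings.
TRUE_COEF = {"dev_a": [0.5, 0.9, -0.05, 0.0], "dev_b": [5.0, 0.1, 0.0, -0.07]}
known_ratings = {dev: [predict(c, s) for s in snippets] for dev, c in TRUE_COEF.items()}

# Step 1: one calibration model per developer with enough ratings.
models = {dev: fit_calibration(snippets, r) for dev, r in known_ratings.items()}

# Step 2: a new developer has rated only snippets 0, 2 and 4.
new_ratings = {0: 4.9, 2: 2.5, 4: 3.0}
chosen = nearest_developer(new_ratings, known_ratings)

# Calibrated prediction for a snippet the new developer has not rated.
calibrated = predict(models[chosen], snippets[5])
print(chosen, round(calibrated, 2))
```

Swapping the `distance` helper for a cosine-similarity function (and taking the maximum instead of the minimum) would give the other similarity measure the summary mentions. Selecting a single nearest developer is one possible reading of "select calibration models of other similar developers"; averaging the predictions of several similar developers would be another.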