Linking Code and Documentation Churn: Preliminary Analysis

Ani Hovhannisyan,Youmei Fan,Gema Rodriguez-Perez,Raula Gaikovina Kula
2024-10-08
Abstract:Code churn refers to the measure of the amount of code added, modified, or deleted in a project and is often used to assess codebase stability and maintainability. Program comprehension or how understandable the changes are, is equally important for maintainability. Documentation is crucial for knowledge transfer, especially when new maintainers take over abandoned code. We emphasize the need for corresponding documentation updates, as this reflects project health and trustworthiness as a third-party library. Therefore, we argue that every code change should prompt a documentation update (defined as documentation churn). Linking code churn changes with documentation updates is important for project sustainability, as it facilitates knowledge transfer and reduces the effort required for program comprehension. This study investigates the synchrony between code churn and documentation updates in three GitHub open-source projects. We will use qualitative analysis and repository mining to examine the alignment and correlation of code churn and documentation updates over time. We want to identify which code changes are likely synchronized with documentation and to what extent documentation can be auto-generated. Preliminary results indicate varying degrees of synchrony across projects, highlighting the importance of integrated concurrent documentation practices and providing insights into how recent technologies like AI, in the form of Large Language Models (i.e., LLMs), could be leveraged to keep code and documentation churn in sync. The novelty of this study lies in demonstrating how synchronizing code changes with documentation updates can improve the development lifecycle by enhancing diversity and efficiency.
Software Engineering
What problem does this paper attempt to address?