Knowledge Islands: Visualizing Developers Knowledge Concentration

Otávio Cury, Guilherme Avelino
2024-08-16
Abstract: Current software development is often a cooperative activity, where different situations can arise that put the existence of a project at risk. One common and extensively studied issue in the software engineering literature is the concentration of a significant portion of knowledge about the source code in a few developers on a team. In this scenario, the departure of one of these key developers could make it impossible to continue the project. This work presents Knowledge Islands, a tool that visualizes the concentration of knowledge in a software repository using a state-of-the-art knowledge model. Key features of Knowledge Islands include user authentication, cloning, and asynchronous analysis of user repositories, identification of the expertise of the team's developers, calculation of the Truck Factor for all folders and source code files, and identification of the main developers and repository files. This open-source tool enables practitioners to analyze GitHub projects, determine where knowledge is concentrated within the development team, and implement measures to maintain project health. The source code of Knowledge Islands is available in a public repository, and a video presentation of the tool is also available.
Software Engineering
What problem does this paper attempt to address?
This paper addresses knowledge concentration in software development: the situation in which a small number of developers on a team hold most of the key knowledge about the code base. This concentration carries a significant risk: if these key developers leave, the project may stall or even fail. Specifically, the work aims to:

1. **Identify knowledge concentration**: By analyzing GitHub projects, determine which developers are experts in which parts of the project, and which files or modules have their knowledge concentrated in the hands of a few people.
2. **Calculate the Truck Factor**: Use an established algorithm (such as Avelino's Truck Factor algorithm) to evaluate the Truck Factor at each level of the project (repositories, modules, and files). The Truck Factor is the minimum number of key developers whose departure would leave the project unable to continue.
3. **Provide a visualization tool**: The tool, named Knowledge Islands, calculates the indicators above and presents them through a visual interface, so that project managers can see how knowledge is distributed across the team and take measures to spread knowledge and reduce risk.

### Formula display

The Degree of Expertise (DOE) model mentioned in the paper quantifies a developer's level of knowledge of a specific file.
The formula is as follows:

\[
\text{DOE}(d, f(v)) = 5.28223 + 0.23173 \cdot \ln(1 + \text{Adds}_{d,f}(v)) + 0.36151 \cdot \text{FA}_f - 0.19421 \cdot \ln(1 + \text{NumDays}_{d,f}(v)) - 0.28761 \cdot \ln(\text{Size}_f(v))
\]

where:

- \(\text{Adds}_{d,f}(v)\): The number of lines of code added by developer \(d\) to file \(f\) in version \(v\);
- \(\text{FA}_f\): 1 if developer \(d\) is the creator of file \(f\), otherwise 0;
- \(\text{NumDays}_{d,f}(v)\): The number of days since developer \(d\) last committed to file \(f\);
- \(\text{Size}_f(v)\): The number of lines of code in file \(f\) in version \(v\).

### Summary

Through the above methods, the Knowledge Islands tool can help project managers better understand the knowledge distribution within the team, discover potential risk points in a timely manner, and take corresponding measures to ensure the continuous and healthy development of the project.
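To make the DOE formula and the Truck Factor definition above concrete, here is a minimal Python sketch. It is not the Knowledge Islands implementation. The `FileActivity` record, the function names, and the `orphan_ratio` parameter are hypothetical, introduced only for illustration. The `truck_factor` function is a simplified greedy heuristic in the spirit of Avelino's algorithm: it assumes the set of experts per file has already been derived from DOE scores (that step is not shown here), and it assumes the project counts as "unable to continue" once more than half of its files have no remaining expert.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FileActivity:
    """Hypothetical inputs to the DOE formula for one developer and one file."""
    adds: int                     # Adds: lines the developer added to the file
    is_creator: bool              # FA: True if the developer created the file
    days_since_last_commit: int   # NumDays: days since the developer's last commit
    file_size: int                # Size: lines of code in the file (must be > 0)


def doe(activity: FileActivity) -> float:
    """Evaluate the DOE formula with the coefficients quoted in the text."""
    if activity.file_size <= 0:
        raise ValueError("file_size must be positive (ln is undefined otherwise)")
    return (
        5.28223
        + 0.23173 * math.log(1 + activity.adds)
        + 0.36151 * (1 if activity.is_creator else 0)
        - 0.19421 * math.log(1 + activity.days_since_last_commit)
        - 0.28761 * math.log(activity.file_size)
    )


def truck_factor(file_experts: dict[str, set[str]], orphan_ratio: float = 0.5) -> int:
    """Greedy estimate: remove the developer covering the most files until
    more than `orphan_ratio` of the files have no expert left.

    `orphan_ratio=0.5` is an assumption of this sketch.
    """
    remaining = {path: set(devs) for path, devs in file_experts.items()}
    total = len(remaining)
    removed = 0
    while total:
        orphaned = sum(1 for devs in remaining.values() if not devs)
        if orphaned > orphan_ratio * total:
            break
        coverage = Counter(dev for devs in remaining.values() for dev in devs)
        if not coverage:
            break
        # Sort first so that ties are broken alphabetically (deterministic).
        top_dev, _ = max(sorted(coverage.items()), key=lambda item: item[1])
        for devs in remaining.values():
            devs.discard(top_dev)
        removed += 1
    return removed


if __name__ == "__main__":
    creator = FileActivity(adds=120, is_creator=True,
                           days_since_last_commit=10, file_size=300)
    print(f"DOE of the file creator: {doe(creator):.3f}")

    experts = {"a.py": {"alice"}, "b.py": {"alice"},
               "c.py": {"alice"}, "d.py": {"bob"}}
    print("Truck Factor estimate:", truck_factor(experts))
```

In this sketch a repository where one developer is the only expert for most files yields a Truck Factor of 1, which is the kind of concentration the tool is meant to surface. Running the same function on the files of a single folder would give a per-folder estimate, under the same assumptions.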