The Relationship Between Folder Use and the Number of Forks: A Case Study on Github Repositories

Jiaxin Zhu, Minghui Zhou
2014-01-01
Abstract: Every software development project uses folders to organize software artifacts. We would like to understand how folders are used and what ramifications different uses may have. In this paper we study the frequency of folders used by 140k Github projects and use regression analysis to model how folder use is related to the extent of forking. We find that the standard folders, such as document, test, and example, are not only among the most frequently used folders, but their presence in a project increases the chances that a project's code will be forked (i.e., used by others), and increases the number of such forks. This preliminary study of folder use suggests opportunities to quantify (and improve) file organization practices in large collections of repositories.
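The two measurements the abstract describes, how often each folder is used across projects and whether projects containing a given folder tend to be forked more, can be illustrated with a small sketch. This is not the authors' pipeline or their regression model. It assumes each project is reduced to its set of top-level folder names plus a fork count, and the project names, folders, and fork counts below are hypothetical toy values, not data from the study.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy data (NOT from the study): each project maps to
# (set of top-level folder names, number of forks).
projects = {
    "proj-a": ({"src", "doc", "test"}, 12),
    "proj-b": ({"src", "test", "example"}, 7),
    "proj-c": ({"lib"}, 0),
    "proj-d": ({"src", "doc"}, 3),
    "proj-e": ({"src"}, 0),
}


def folder_frequency(projects):
    """Count in how many projects each folder name appears."""
    counts = Counter()
    for folders, _forks in projects.values():
        counts.update(folders)  # each folder counted once per project
    return counts


def forks_with_and_without(projects, folder):
    """Summarize fork counts for projects with vs. without `folder`."""
    with_f = [forks for folders, forks in projects.values() if folder in folders]
    without_f = [forks for folders, forks in projects.values() if folder not in folders]

    def summary(group):
        if not group:
            return None
        return {
            "n": len(group),
            "mean_forks": mean(group),
            # share of projects in the group that have at least one fork
            "share_forked": sum(1 for f in group if f > 0) / len(group),
        }

    return summary(with_f), summary(without_f)


freq = folder_frequency(projects)
with_test, without_test = forks_with_and_without(projects, "test")
```

A raw with/without comparison like this ignores other factors, such as project size or age, that may also affect forking; a regression model of the kind the study uses can include such covariates alongside folder presence.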