A Large-Scale Empirical Study of Open Source License Usage: Practices and Challenges

Jiaqi Wu, Lingfeng Bao, Xiaohu Yang, Xin Xia, Xing Hu
DOI: https://doi.org/10.1145/3643991.3644900
2024-01-01
Abstract: The popularity of open source software (OSS) has led to a significant increase in the number of available licenses, each with its own set of terms and conditions. This proliferation of licenses has made it increasingly challenging for developers to select an appropriate license for their projects and to ensure that they are complying with the terms of those licenses. As a result, there is a need for empirical studies to identify current practices and challenges in license usage, both to help developers make informed decisions about license selection and to ensure that OSS is being used and distributed in a legal and ethical manner. Moreover, the development of new licenses might be required to better meet the needs of the open source community and address emerging legal issues. In this paper, we conduct a large-scale empirical study of license usage across five package management platforms, i.e., Maven, NPM, PyPI, RubyGems, and Cargo. Our objective is to examine the current trends and potential issues in license usage of the OSS community. In total, we analyze the licenses of 33,710,877 packages across the five selected platforms. We statistically analyze licenses in package management platforms from multiple perspectives, e.g., license usage, license incompatibility, license updates, and license evolution. Moreover, we conduct a comparative study of various aspects of core packages and common packages in these platforms. Our results reveal irregularities in license names and license incompatibilities that require attention. We observe both similarities and differences in license usage across the five platforms, with Cargo being the most standardized among them. Finally, we discuss some implications for actions based on our findings.