A Large-Scale Empirical Study on Java Library Migrations: Prevalence, Trends, and Rationales

Hao He,Runzhi He,Haiqiao Gu,Minghui Zhou
DOI: https://doi.org/10.1145/3468264.3468571
2021-01-01
Abstract:With the rise of open-source software and package hosting platforms, reusing 3rd-party libraries has become a common practice. Due to various failures during software evolution, a project may remove a used library and replace it with another library, which we call library migration. Despite substantial research on dependency management, the understanding of how and why library migrations occur is still lacking. Achieving this understanding may help practitioners optimize their library selection criteria, develop automated approaches to monitor dependencies, and provide migration suggestions for their libraries or software projects. In this paper, through a fine-grained commit-level analysis of 19,652 Java GitHub projects, we extract the largest migration dataset to-date (1,194 migration rules, 3,163 migration commits). We show that 8,065 (41.04%) projects having at least one library removal, 1,564 (7.96%, lower-bound) to 5,004 (25.46%, upper-bound) projects have at least one migration, and a median project with migrations has 2 to 4 migrations in total. We discover that library migrations are dominated by several domains (logging, JSON, testing and web service) presenting a long tail distribution. Also, migrations are highly unidirectional in that libraries are either mostly abandoned or mostly chosen in our project corpus. A thematic analysis on related commit messages, issues, and pull requests identifies 14 frequently mentioned migration reasons (e.g., lack of maintenance, usability, integration, etc), 7 of which are not discussed in previous work. Our findings can be operationalized into actionable insights for package hosting platforms, project maintainers, and library developers. We provide a replication package at https://doi.org/10.5281/zenodo.4816752.
What problem does this paper attempt to address?