Smartpip: A Smart Approach to Resolving Python Dependency Conflict Issues.

Chao Wang,Rongxin Wu,Haohao Song,Jiwu Shu,Guoqing Li
DOI: https://doi.org/10.1145/3551349.3560437
2022-01-01
Abstract:As one of the representative software ecosystems, PyPI, together with the Python package management tool pip, greatly facilitates Python developers to automatically manage the reuse of third-party libraries, thus saving development time and cost. Despite its great success in practice, a recent empirical study revealed the risks of dependency conflict (DC) issues and then summarized the characteristics of DC issues. However, the dependency resolving strategy, which is the foundation of the prior study, has evolved to a new one, namely the backtracking strategy. To understand how the evolution of this dependency resolving strategy affects the prior findings, we conducted an empirical study to revisit the characteristics of DC issues under the new strategy. Our study revealed that, of the two previously discovered DC issue manifestation patterns, one has significantly changed (Pattern A), while the other remained the same (Pattern B). We also observed, the resolving strategy for the DC issues of Pattern A suffers from the efficiency issue, while the one for the DC issues of Pattern B would lead to a waste of time and space. Based on our findings, we propose a tool smartPip to overcome the limitations of the resolving strategies. To resolve the DC issues of Pattern A, instead of iteratively verifying each candidate dependency library, we leverage a pre-built knowledge base of library dependencies to collect version constraints for concerned libraries, and then convert the version constraints into the SMT expressions for solving. To resolve the DC issues of Pattern B, we improve the existing virtual environment solution to reuse the local libraries as far as possible. Finally, we evaluated smartPip in three benchmark datasets of open source projects. The results showed that, smartPip can outperform the existing Python package management tools including pip with the new strategy and Conda in resolving DC issues of Pattern A, and achieve 1.19X - 1.60X speedups over the best baseline approach. Compared with the built-in Python virtual environment (venv), smartPip reduced 34.55% - 80.26% of storage space and achieved up to 2.26X - 6.53X speedups in resolving the DC issues of Pattern B.
What problem does this paper attempt to address?