Is ChatGPT a Good Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations

Jasmine Latendresse, SayedHassan Khatoonabadi, Ahmad Abdellatif, Emad Shihab
2024-08-09
Abstract: Software libraries play a critical role in the functionality, efficiency, and maintainability of software systems. As developers increasingly rely on Large Language Models (LLMs) to streamline their coding processes, the effectiveness of these models in recommending appropriate libraries becomes crucial yet remains largely unexplored. In this paper, we assess the effectiveness of ChatGPT as a software librarian and identify areas for improvement. We conducted an empirical study using GPT-3.5 Turbo to generate Python code for 10,000 Stack Overflow questions. Our findings show that ChatGPT uses third-party libraries nearly 10% more often than human developers, favoring widely adopted and well-established options. However, 14.2% of the recommended libraries had restrictive copyleft licenses, which were not explicitly communicated by ChatGPT. Additionally, 6.5% of the libraries did not work out of the box, leading to potential developer confusion and wasted time. While ChatGPT can be an effective software librarian, it should be improved by providing more explicit information on maintainability metrics and licensing. We recommend that developers implement rigorous dependency management practices and double-check library licenses before integrating LLM-generated code into their projects.
Software Engineering
What problem does this paper attempt to address?
This paper evaluates how effectively large language models (LLMs), especially ChatGPT, recommend software libraries, and identifies areas for improvement in practical use. Specifically, the researchers explored the following questions through an empirical study:

1. **What are the characteristics of the software libraries recommended by ChatGPT?**
   - The researchers analyzed the number, type, popularity, maintenance status, and license type of the third-party libraries used in the code generated by ChatGPT. The results show that ChatGPT is more inclined to use widely adopted and stable third-party libraries, but a certain proportion of the libraries have license issues or cannot be used directly.
2. **What challenges do developers encounter when using ChatGPT for library recommendation?**
   - The study found that some of the libraries recommended by ChatGPT are neither part of the Python standard library nor available in the PyPI repository, which may lead to import or installation failures and reduce development efficiency. In addition, some libraries have restrictive licenses that ChatGPT does not clearly communicate, which may expose developers to legal risks.

### Specific Problems and Solutions

- **Characteristics of Library Recommendation**:
  - **Frequency of Third-Party Library Use**: ChatGPT uses third-party libraries nearly 10% more often than human developers.
  - **Popularity and Stability of Libraries**: ChatGPT tends to select popular, mature libraries, such as `requests`, `pandas`, and `numpy`, which usually have fewer dependencies and are easy to integrate.
  - **License and Usability Issues**: 14.2% of the recommended libraries have restrictive copyleft licenses, and 6.5% of the libraries do not work out of the box, both of which may cause difficulties for developers.
- **Challenges Faced by Developers**:
  - **Library Availability**: Some of the recommended libraries are not in the PyPI repository, so they cannot be installed or imported directly, which increases development complexity.
  - **License Transparency**: ChatGPT does not clearly state the license types of some libraries, especially those with restrictive licenses, which may lead to legal problems.
  - **Library Quality and Maintenance**: Although most of the libraries recommended by ChatGPT are of high quality, a few are poorly maintained or cause version conflicts.

### Improvement Suggestions

To improve the practicality of ChatGPT as a "software librarian", the researchers suggest:

- **Provide More Detailed Library Information**: Include the library's maintenance status, license type, and similar details to help developers make better choices.
- **Strengthen Dependency Management**: Developers should manage dependencies rigorously and carefully check library licenses before integrating LLM-generated code.
- **Improve the Library Recommendation Algorithm**: Optimize ChatGPT's recommendation mechanism to reduce recommendations of unavailable or potentially problematic libraries.

With these improvements, LLM tools can be used more effectively to simplify the development process while keeping the code safe and legally compliant.