Abstract: With the advancements of Large Language Models (LLMs), an increasing number of open-source software projects are using LLMs as their core functional component. Although research and practice on LLMs are capturing considerable interest, no dedicated study has explored the challenges faced by practitioners of LLM open-source projects, the causes of these challenges, or potential solutions. To fill this research gap, we conducted an empirical study to understand the issues that practitioners encounter when developing and using LLM open-source software, the possible causes of these issues, and potential solutions. We collected all closed issues from 15 LLM open-source projects and labelled the issues that met our requirements. We then randomly selected 994 of the labelled issues as the sample for data extraction and analysis to understand the prevalent issues, their underlying causes, and potential solutions. Our study results show that (1) Model Issue is the most common issue faced by practitioners, (2) Model Problem, Configuration and Connection Problem, and Feature and Method Problem are the most frequent causes of the issues, and (3) Optimize Model is the predominant solution to the issues. Based on the study results, we provide implications for practitioners and researchers of LLM open-source projects.

What problem does this paper attempt to address?

This paper addresses the issues encountered in developing and using large language models (LLMs) in open-source projects, and explores the root causes of and potential solutions to these issues. Specifically:

1. **Research Background and Objectives**:
   - As large language models develop, an increasing number of open-source software projects incorporate LLMs as their core functional components. Despite significant progress in LLM research and practice, no study has yet focused specifically on the challenges faced in LLM open-source projects, their causes, and their solutions.
   - To fill this research gap, the authors conducted an empirical study to understand the problems developers encounter when developing and using LLM open-source software, identify the root causes of these problems, and propose possible solutions.
2. **Main Findings**:
   - The study results indicate that **model issues** are the most common type of problem, including runtime issues, architecture design issues, loading issues, and training issues.
   - The main causes of these problems are model problems, configuration and connection problems, and feature and method problems.
   - The most commonly used solution is to optimize the model.
3. **Research Contributions**:
   - By analyzing a sample of 994 closed issues drawn from 15 LLM open-source projects on GitHub, the authors provide a two-level taxonomy of these issues and categorize their causes and solutions.
   - The study also maps the identified issues to their causes and to their solutions.
4. **Research Methods**:
   - Data Collection: 15 LLM open-source projects meeting specific criteria were selected from GitHub, and all of their closed issues were collected.
   - Data Annotation and Sampling: The collected issues were annotated, and 994 of them were randomly selected for detailed analysis.
   - Data Extraction: A set of data items was defined to extract information about issues, causes, and solutions.
   - Data Analysis: The data were analyzed qualitatively using open coding and constant comparison.
5. **Research Results**:
   - Model issues are the problem category developers encounter most frequently, followed by component issues and parameter issues.
   - By categorizing the issues, the study reveals the most common problems in LLM open-source projects and their causes, and proposes corresponding solutions.

Through this study, the authors aim to give practitioners and researchers of LLM open-source projects insights that help them better understand and resolve the problems encountered in actual development.

Demystifying Issues, Causes and Solutions in LLM Open-Source Projects

A Large-Scale Empirical Study of Open Source License Usage: Practices and Challenges

An Empirical Study on Challenges for LLM Application Developers

Studying LLM Performance on Closed- and Open-source Data

Developer Challenges on Large Language Models: A Study of Stack Overflow and OpenAI Developer Forum Posts

LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and Perceptions

Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility

Breaking the Silence: the Threats of Using LLMs in Software Engineering

Demystifying Issues, Challenges, and Solutions for Multilingual Software Development (Artifact)

An Empirical Study on Usage and Perceptions of LLMs in a Software Engineering Project

"The teachers are confused as well": A Multiple-Stakeholder Ethics Discussion on Large Language Models in Computing Education

LLM360: Towards Fully Transparent Open-Source LLMs

An Empirical Study on Low Code Programming using Traditional vs Large Language Model Support

InternLM-Law: An Open Source Chinese Legal Large Language Model

Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas: A Survey

MarkLLM: An Open-Source Toolkit for LLM Watermarking

Navigating LLM Ethics: Advancements, Challenges, and Future Directions

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

Beyond the Comfort Zone: Emerging Solutions to Overcome Challenges in Integrating LLMs into Software Products

Large language models for automated Q&A involving legal documents: a survey on algorithms, frameworks and applications

I'm Spartacus, No, I'm Spartacus: Measuring and Understanding LLM Identity Confusion