Practitioners' Discussions on Building LLM-based Applications for Production

Alina Mailach, Sebastian Simon, Johannes Dorn, Norbert Siegmund
2024-11-13
Abstract: *Background*: Large language models (LLMs) have become of paramount interest to researchers and practitioners alike, yet a comprehensive overview of key considerations for those developing LLM-based systems is lacking. This study addresses this gap by collecting and mapping the topics practitioners discuss online, offering practical insights into where priorities lie in developing LLM-based applications. *Method*: We collected 189 videos from 2022 to 2024 from practitioners actively developing such systems and discussing various aspects they encounter during development and deployment of LLMs in production. We analyzed the transcripts using BERTopic, then manually sorted and merged the generated topics into themes, leading to a total of 20 topics in 8 themes. *Results*: The most prevalent topics fall within the theme Design & Architecture, with a strong focus on retrieval-augmented generation (RAG) systems. Other frequently discussed topics include model capabilities and enhancement techniques (e.g., fine-tuning, prompt engineering), infrastructure and tooling, and risks and ethical challenges. *Implications*: Our results highlight current discussions and challenges in deploying LLMs in production. In this way, we provide a systematic overview of key aspects practitioners should be aware of when developing LLM-based applications. We further mark out topics of interest for academics where further research is needed.
Software Engineering
What problem does this paper attempt to address?
The paper addresses the lack of a comprehensive overview of the key considerations and challenges practitioners face when developing and deploying large language models (LLMs) in practical applications. Although LLMs have been applied successfully to research tasks, how to build and deploy LLM applications in practice remains under-studied. By collecting and analyzing the topics of practitioners' online discussions, the paper maps the topics and challenges currently under discussion, offers guidance on key aspects for practitioners developing LLM applications, and points out areas where further academic research is needed.

### Main contributions of the paper:

1. **Topic map**: Provides a map of the topics practitioners discuss when building and deploying LLM applications.
2. **Topic co-occurrence analysis**: Analyzes which topics co-occur, highlighting related considerations and decisions.
3. **Reproduction package**: Provides a comprehensive reproduction package, including the mapping of videos to their topics.

### Methods:

- **Data collection**: Collected 189 videos from YouTube in which practitioners discuss building and deploying LLM applications, covering 2022 to 2024.
- **Data processing**: Used `whisper-large` for automatic transcription, then applied a semantic text segmenter to divide the transcripts into contextually coherent paragraphs.
- **Topic modeling**: Used BERTopic for topic modeling, which generated 42 initial topics.
- **Manual evaluation**: Manually evaluated and merged the topics, arriving at 20 topics grouped into 8 themes.

### Main results:

- **Architecture and design** (93 videos, 52.2%):
  - **RAG system** (72 videos, 40.4%): Discussed the implementation and optimization challenges of retrieval-augmented generation (RAG) systems.
  - **Agent system** (15 videos, 8.4%): Introduced agent systems that coordinate multiple calls, and the challenges they face.
  - **Memory management** (16 videos, 9.0%): Explored how to manage context information effectively in LLM applications.
  - **User interface** (23 videos, 12.9%): Discussed the design and evolution of user interfaces in LLM applications.
- **Model capabilities and techniques** (83 videos, 46.6%):
  - **Fine-tuning** (43 videos, 24.2%): Discussed how to adapt LLMs to specific domains through fine-tuning, along with the related data collection and quality assurance issues.
  - **Prompt engineering** (41 videos, 23.0%): Introduced how to steer the behavior of LLMs through carefully designed prompts.

### Conclusion:

Through systematic analysis, the paper identifies the key topics and challenges in building and deploying LLM applications, offers guidance for practitioners, and points out directions for future research.
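
The topic co-occurrence analysis listed among the contributions can be illustrated with a minimal sketch. It assumes a mapping from videos to their assigned topics, similar in spirit to the one the reproduction package is said to contain. The video IDs, the topic assignments, and the helper names `topic_cooccurrence` and `topic_share` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical video-to-topic mapping. The IDs and assignments are
# illustrative only and are NOT data from the paper.
video_topics = {
    "video_01": {"RAG", "Prompt engineering"},
    "video_02": {"RAG", "Fine-tuning"},
    "video_03": {"RAG", "Prompt engineering", "Memory management"},
    "video_04": {"Fine-tuning"},
}


def topic_cooccurrence(mapping):
    """Count how many videos mention each unordered pair of topics."""
    counts = Counter()
    for topics in mapping.values():
        # Sort first so every pair has a single canonical ordering.
        for pair in combinations(sorted(topics), 2):
            counts[pair] += 1
    return counts


def topic_share(mapping, topic):
    """Return the fraction of videos that mention the given topic."""
    mentions = sum(topic in topics for topics in mapping.values())
    return mentions / len(mapping)


cooccurrence = topic_cooccurrence(video_topics)
for pair, n in cooccurrence.most_common():
    print(pair, n)
print(f"RAG share: {topic_share(video_topics, 'RAG'):.1%}")
```

Counting per video rather than per transcript segment matches the way the results above are reported (videos per topic). Whether the authors counted co-occurrence at the video or the segment level is not stated here, so treat that choice as an assumption.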