Abstract: Large Language Models (LLMs) have recently become widely adopted. Research explores their use both as autonomous agents and as tools for software engineering. LLM-integrated applications, on the other hand, are software systems that leverage an LLM to perform tasks that would otherwise be impossible or require significant coding effort. While LLM-integrated application engineering is emerging as a new discipline, its terminology, concepts, and methods still need to be established. This study provides a taxonomy for LLM-integrated applications, offering a framework for analyzing and describing these systems. It also demonstrates various ways to utilize LLMs in applications, as well as options for implementing such integrations. Following established methods, we analyze a sample of recent LLM-integrated applications to identify relevant dimensions. We evaluate the taxonomy by applying it to additional cases. This review shows that applications integrate LLMs in numerous ways for various purposes. Frequently, they comprise multiple LLM integrations, which we term "LLM components". To gain a clear understanding of an application's architecture, we examine each LLM component separately. We identify thirteen dimensions along which to characterize an LLM component, including the LLM skills leveraged and the format of the output. LLM-integrated applications are described as combinations of their LLM components. We suggest a concise representation using feature vectors for visualization. The taxonomy is effective for describing LLM-integrated applications. It can contribute to theory building in the nascent field of LLM-integrated application engineering and aid in developing such systems. Researchers and practitioners explore numerous creative ways to leverage LLMs in applications. Though challenges persist, integrating LLMs may revolutionize the way software systems are built.

Towards standarized benchmarks of LLMs in software modeling tasks: a conceptual framework

Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence

Easy Problems That LLMs Get Wrong

LLM4VV: Exploring LLM-as-a-Judge for Validation and Verification Testsuites

Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks

Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks

A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks

Beyond Benchmarking: A New Paradigm for Evaluation and Assessment of Large Language Models

Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making

LalaEval: A Holistic Human Evaluation Framework for Domain-Specific Large Language Models

A Framework for Evaluating LLMs Under Task Indeterminacy

Beyond the Comfort Zone: Emerging Solutions to Overcome Challenges in Integrating LLMs into Software Products

Large Language Models as Software Components: A Taxonomy for LLM-Integrated Applications

Towards a Benchmark for Large Language Models for Business Process Management Tasks

Benchmarking the Communication Competence of Code Generation for LLMs and LLM Agent

Towards Evaluation Guidelines for Empirical Studies involving LLMs

On Evaluating LLMs' Capabilities as Functional Approximators: A Bayesian Perspective

Post Turing: Mapping the landscape of LLM Evaluation

From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future

Breaking the Silence: the Threats of Using LLMs in Software Engineering