Comparative Analysis of Prompt Strategies for Large Language Models: Single-Task vs. Multitask Prompts
Manuel Gozzi, Federico Di Maio
DOI: https://doi.org/10.3390/electronics13234712
IF: 2.9
2024-11-30
Electronics
Abstract: This study investigates the effectiveness of prompt engineering strategies for Large Language Models (LLMs), comparing single-task and multitask prompts. Specifically, we analyze whether a single prompt handling multiple tasks, such as named entity recognition (NER), sentiment analysis, and JSON output formatting, can achieve performance comparable to dedicated single-task prompts. To substantiate our findings, we employ statistical analyses, including paired Wilcoxon tests, McNemar tests, and Friedman tests, to validate claims of performance similarity or superiority. Experiments were conducted using five open-weight LLMs: Llama 3.1 8B, Qwen2 7B, Mistral 7B, Phi3 Medium, and Gemma2 9B. The results indicate that there is no definitive rule favoring single-task prompts over multitask prompts; rather, their relative performance is highly contingent on the specific model's data and architecture. This study highlights the nuanced interplay between prompt strategies and LLM characteristics, offering insights into optimizing their use for specific NLP tasks. Limitations and future directions, such as expanding task types, are also discussed.
Engineering, Electrical & Electronic; Computer Science, Information Systems; Physics, Applied
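
The abstract names McNemar tests among the statistical checks used to compare the two prompt strategies. As an illustration only, the following minimal sketch shows how such a paired comparison might be computed on per-example correctness flags. It is not the authors' code: the function name `mcnemar_exact`, the choice of the exact (binomial) variant of the test, and the sample data are all assumptions made for this example.

```python
from math import comb


def mcnemar_exact(single_correct, multi_correct):
    """Two-sided exact McNemar test on paired per-example correctness flags.

    Hypothetical helper: returns (b, c, p_value), where b counts examples
    only the single-task prompt got right and c counts examples only the
    multitask prompt got right.
    """
    if len(single_correct) != len(multi_correct):
        raise ValueError("paired inputs must have equal length")
    pairs = list(zip(single_correct, multi_correct))
    # Discordant pairs: exactly one of the two strategies was correct.
    b = sum(1 for s, m in pairs if s and not m)
    c = sum(1 for s, m in pairs if m and not s)
    n = b + c
    if n == 0:
        # No discordant pairs, so there is no evidence of a difference.
        return b, c, 1.0
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Made-up correctness flags for ten examples (not data from the study).
single = [True, True, False, True, True, False, True, True, True, False]
multi = [True, False, False, True, False, False, True, True, False, True]

b, c, p = mcnemar_exact(single, multi)
print(f"single-only correct: {b}, multi-only correct: {c}, p = {p:.3f}")
```

Only the discordant pairs enter the test, which is why it suits a paired design where both prompt strategies are run on the same inputs. With few discordant pairs the exact variant is usually preferred over the chi-square approximation.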