TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks

Shubhra Kanti Karmaker Santu, Dongji Feng
2023-10-25
Abstract: While LLMs have shown great success in understanding and generating text in traditional conversational settings, their potential for performing ill-defined complex tasks is largely under-studied. Indeed, we are yet to conduct comprehensive benchmarking studies with multiple LLMs that are exclusively focused on a complex task. However, conducting such benchmarking studies is challenging because of the large variations in LLMs' performance when different prompt types/styles are used and different degrees of detail are provided in the prompts. To address this issue, the paper proposes a general taxonomy that can be used to design prompts with specific properties in order to perform a wide range of complex tasks. This taxonomy will allow future benchmarking studies to report the specific categories of prompts used as part of the study, enabling meaningful comparisons across different studies. Also, by establishing a common standard through this taxonomy, researchers will be able to draw more accurate conclusions about LLMs' performance on a specific complex task.
Artificial Intelligence, Computation and Language, Information Retrieval, Machine Learning
What problem does this paper attempt to address?
The paper addresses the difficulty of benchmarking large language models (LLMs) on complex tasks: an LLM's performance varies widely with the type and style of prompt used and with the level of detail the prompt provides, so results from different studies are hard to compare. To address this, the paper proposes TELeR, a general prompt taxonomy for designing prompts with specific properties across a wide range of complex tasks. With this common standard, future studies can report the specific prompt categories they used, which enables meaningful comparisons between studies and lets researchers draw more accurate conclusions about LLMs' performance on a specific complex task. By giving research teams a unified way to describe prompts, the taxonomy is also intended to help them reach consensus on how capable LLMs are at such tasks.