TACOMORE: Leveraging the Potential of LLMs in Corpus-based Discourse Analysis with Prompt Engineering

Bingru Li, Han Wang
2024-12-13
Abstract: The capacity of LLMs to carry out automated qualitative analysis has been questioned by corpus linguists, and it has been argued that corpus-based discourse analysis incorporating LLMs is hindered by issues of unsatisfying performance, hallucination, and irreproducibility. Our proposed method, TACOMORE, aims to address these concerns by serving as an effective prompting framework in this domain. The framework consists of four principles, i.e., Task, Context, Model and Reproducibility, and specifies five fundamental elements of a good prompt, i.e., Role Description, Task Definition, Task Procedures, Contextual Information and Output Format. We conduct experiments on three LLMs, i.e., GPT-4o, Gemini-1.5-Pro and Gemini-1.5-Flash, and find that TACOMORE helps improve LLM performance in three representative discourse analysis tasks, i.e., the analysis of keywords, collocates and concordances, based on an open corpus of COVID-19 research articles. Our findings show the efficacy of the proposed prompting framework TACOMORE in corpus-based discourse analysis in terms of Accuracy, Ethicality, Reasoning, and Reproducibility, and provide novel insights into the application and evaluation of LLMs in automated qualitative studies.
Computation and Language
What problem does this paper attempt to address?
The paper addresses three obstacles that, according to corpus linguists, hinder the use of Large Language Models (LLMs) in corpus-based discourse analysis: unsatisfactory performance, hallucination (generating inaccurate or fabricated information), and irreproducibility. Specifically, the paper points out:

1. **Performance issues**: On complex corpus-based discourse analysis tasks, the accuracy and logical reasoning of LLMs have not yet reached the level of human experts.
2. **Hallucination issues**: LLMs sometimes generate information that is factually wrong or fabricated, a serious problem in academic research, which demands a high degree of accuracy.
3. **Irreproducibility**: Because LLM output may vary across environments, hardware, or operators, experimental results are difficult to repeat and verify.

To address these problems, the authors propose a prompting framework named TACOMORE. It aims to improve LLM performance in corpus-based discourse analysis through four principles (Task, Context, Model, Reproducibility) and five fundamental prompt elements (Role Description, Task Definition, Task Procedures, Contextual Information, Output Format).

### Specific improvement measures

- **Task refinement**: Decompose complex tasks into specific steps so that the LLM can work through them one at a time.
- **Provide context**: Give the LLM the contextual information it needs to understand the task background and the specific content.
- **Select an appropriate model**: Choose an LLM that can process a large amount of input data, according to the task requirements.
- **Ensure reproducibility**: Reduce the variability of LLM output and improve the stability and reproducibility of results through standardized prompt structures and evaluation methods.
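
The five prompt elements listed above can be pictured as a small template. The following is a minimal sketch, not the authors' actual prompt: the section titles come from the framework, while the class name `TacomorePrompt`, the `render` method, and all example wording are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TacomorePrompt:
    """Holds the five prompt elements named by the framework (hypothetical class)."""
    role_description: str
    task_definition: str
    task_procedures: list[str]  # ordered steps, reflecting the task-refinement idea
    contextual_information: str
    output_format: str

    def render(self) -> str:
        # Number the procedure steps so the model can follow them in order.
        steps = "\n".join(
            f"{i}. {step}" for i, step in enumerate(self.task_procedures, start=1)
        )
        sections = [
            ("Role Description", self.role_description),
            ("Task Definition", self.task_definition),
            ("Task Procedures", steps),
            ("Contextual Information", self.contextual_information),
            ("Output Format", self.output_format),
        ]
        # A fixed section order keeps the prompt structure identical across runs.
        return "\n\n".join(f"{title}:\n{body}" for title, body in sections)


# Illustrative wording only; the paper's real prompts are not reproduced here.
prompt = TacomorePrompt(
    role_description="You are a corpus linguist experienced in discourse analysis.",
    task_definition="Group the keywords below into thematic categories.",
    task_procedures=[
        "Read every keyword in the list.",
        "Assign each keyword to exactly one theme.",
        "Briefly justify each assignment.",
    ],
    contextual_information="The keywords come from a corpus of COVID-19 research articles.",
    output_format="A table with the columns: keyword, theme, justification.",
)
prompt_text = prompt.render()
print(prompt_text)
```

Keeping the structure in code rather than in free text means the same template can be reused across tasks and models, which is one plausible way to standardize prompts for reproducibility.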

### Experimental verification

The authors ran experiments on three representative discourse analysis tasks (keyword analysis, collocate analysis, and concordance analysis) based on an open corpus of COVID-19 research articles, using three LLMs: GPT-4o, Gemini-1.5-Pro, and Gemini-1.5-Flash. The results show that the TACOMORE framework improves LLM performance on these tasks in terms of Accuracy, Ethicality, Reasoning, and Reproducibility.

In conclusion, the paper addresses key obstacles to using LLMs in corpus-based discourse analysis by proposing the TACOMORE framework, and offers new insights into applying and evaluating LLMs in automated qualitative studies.
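
This summary does not say how Reproducibility is measured. As one possible illustration, and not the paper's evaluation method, the sketch below scores how consistent repeated runs of the same prompt are, using mean pairwise Jaccard overlap between the sets of theme labels each run returns. The function names and the example runs are hypothetical, hand-written data rather than experimental results.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two label sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0  # two empty outputs are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_agreement(runs: list[set[str]]) -> float:
    """Average Jaccard overlap over every pair of runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical theme labels from three runs of the same prompt.
runs = [
    {"transmission", "vaccination", "public health"},
    {"transmission", "vaccination", "public health"},
    {"transmission", "vaccination", "treatment"},
]
score = mean_pairwise_agreement(runs)
print(f"{score:.2f}")  # 0.67: two runs agree fully, the third differs in one label
```

A score of 1.0 would mean every run produced the same labels; lower values indicate that the output drifts between runs.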