Analyzing Large Language Models for Classroom Discussion Assessment

Nhat Tran, Benjamin Pierce, Diane Litman, Richard Correnti, Lindsay Clare Matsumura
2024-06-13
Abstract: Automatically assessing classroom discussion quality is becoming increasingly feasible with the help of new NLP advancements such as large language models (LLMs). In this work, we examine how the assessment performance of 2 LLMs interacts with 3 factors that may affect performance: task formulation, context length, and few-shot examples. We also explore the computational efficiency and predictive consistency of the 2 LLMs. Our results suggest that the 3 aforementioned factors do affect the performance of the tested LLMs and there is a relation between consistency and performance. We recommend an LLM-based assessment approach that has a good balance in terms of predictive performance, computational efficiency, and consistency.
Computation and Language
What problem does this paper attempt to address?
This paper examines how large language models (LLMs) can be used to automatically assess the quality of classroom discussions. The authors analyze three factors that may affect the assessment performance of two LLMs: task formulation, context length, and few-shot examples. They find that all three factors affect performance, and that predictive performance is related to predictive consistency. The paper recommends an LLM-based assessment approach that balances predictive performance, computational efficiency, and consistency. The study also looks at how well the LLMs handle long text inputs and how much few-shot examples improve their results. According to the experiments, the binary counting formulation outperforms the BERT baseline once few-shot examples are added. The paper further emphasizes that limiting context length and providing more challenging negative examples both help classification performance. Finally, the authors provide code to support the reproducibility of the research.
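The summary above does not define the binary counting formulation, so the following is only a minimal sketch of one plausible reading: ask a yes/no question about each turn, using a few-shot prompt and a limited context window, and count the "yes" answers. The names build_prompt, binary_count, and toy_classifier, the prompt wording, and the max_context parameter are all hypothetical and not taken from the paper; toy_classifier is a keyword rule standing in for a real LLM call.

```python
from typing import Callable, List, Tuple


def build_prompt(turn: str,
                 examples: List[Tuple[str, str]],
                 context: List[str],
                 max_context: int) -> str:
    """Assemble a few-shot yes/no prompt for a single turn.

    Only the last `max_context` preceding turns are kept, which is one
    simple way to limit context length (an assumption, not the paper's rule).
    """
    window = context[-max_context:] if max_context > 0 else []
    lines = ["Does the final turn show the target discussion move? "
             "Answer yes or no.", ""]
    # Few-shot examples: (turn text, "yes"/"no") pairs chosen by the user.
    for text, label in examples:
        lines.append(f"Turn: {text}")
        lines.append(f"Answer: {label}")
        lines.append("")
    if window:
        lines.append("Context:")
        lines.extend(window)
        lines.append("")
    lines.append(f"Turn: {turn}")
    lines.append("Answer:")
    return "\n".join(lines)


def binary_count(turns: List[str],
                 classify: Callable[[str], str],
                 examples: List[Tuple[str, str]],
                 max_context: int = 2) -> int:
    """Classify every turn as yes/no and return the number of 'yes' answers."""
    count = 0
    for i, turn in enumerate(turns):
        prompt = build_prompt(turn, examples, turns[:i], max_context)
        answer = classify(prompt).strip().lower()
        if answer.startswith("yes"):
            count += 1
    return count


def toy_classifier(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: 'yes' if the final turn gives a reason."""
    final_turn = prompt.rsplit("Turn: ", 1)[1]
    return "yes" if "because" in final_turn.lower() else "no"


if __name__ == "__main__":
    examples = [("She was right because the text says so.", "yes"),
                ("I liked the story.", "no")]
    turns = ["I think the character was brave because she stayed.",
             "Okay.",
             "I disagree because the text says she ran."]
    print(binary_count(turns, toy_classifier, examples, max_context=2))
```

In practice, classify would wrap a call to whichever LLM is being evaluated. Running the same transcript several times and comparing the resulting counts gives a rough view of predictive consistency.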