Facilitating Holistic Evaluations with LLMs: Insights from Scenario-Based Experiments

Toru Ishida, Tongxi Liu, Hailong Wang, William K. Cheung
2024-08-12
Abstract: Workshop courses designed to foster creativity are gaining popularity. However, even experienced faculty teams find it challenging to realize a holistic evaluation that accommodates diverse perspectives. Adequate deliberation is essential to integrate varied assessments, but faculty often lack the time for such exchanges. Deriving an average score without discussion undermines the purpose of a holistic evaluation. Therefore, this paper explores the use of a Large Language Model (LLM) as a facilitator to integrate diverse faculty assessments. Scenario-based experiments were conducted to determine if the LLM could integrate diverse evaluations and explain the underlying pedagogical theories to faculty. The results were noteworthy, showing that the LLM can effectively facilitate faculty discussions. Additionally, the LLM demonstrated the capability to create evaluation criteria by generalizing a single scenario-based experiment, leveraging its already acquired pedagogical domain knowledge.
Computers and Society, Artificial Intelligence, Human-Computer Interaction
What problem does this paper attempt to address?
The paper addresses the difficulty of achieving holistic evaluation in workshop courses. Even experienced teaching teams find it hard to integrate assessments made from different perspectives, and without sufficient discussion, simply averaging scores undermines the purpose of holistic evaluation. The paper therefore explores using large language models (LLMs) as facilitators that integrate diverse teacher assessments. Through scenario-based experiments, the researchers examine whether LLMs can integrate diverse evaluations, explain the underlying educational theories to teachers, and derive general evaluation criteria from specific cases. The paper explores these questions through four scenario experiments:
1. **Integrating Different Opinions**: When assessing students' reflection papers from a technical workshop, teachers evaluate from different perspectives, such as motivation, technical understanding, and report format. The LLM needs to synthesize these opinions and reach a balanced conclusion.
2. **Evaluating Student Growth**: This scenario asks whether achievement or growth should carry more weight when evaluating students. The LLM needs to balance the two views and give a reasonable evaluation.
3. **Handling Peer Evaluation**: This scenario explores how to assess students' performance in teams, especially when peer evaluation may affect the team atmosphere. The LLM needs to judge whether peer evaluations should be included in the final score.
4. **Considering Unique Contributions**: When evaluating students with unique talents and contributions, the LLM needs to take these factors into account and give an appropriate score.
Through these experiments, the paper demonstrates that LLMs can integrate diverse assessments, explain educational theories, and derive general evaluation criteria from specific cases. These capabilities indicate that LLMs can be strong partners for teaching teams, helping them conduct holistic evaluations more effectively. However, the paper also points out ethical issues that need to be considered when using LLMs for educational assessment, such as potential bias and lack of transparency.
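The point about averaging scores without discussion can be illustrated with a small sketch. It is not taken from the paper: the function name `summarize_scores`, the faculty labels, and the scores are all hypothetical, chosen only to show that two very different sets of assessments can produce the same mean.

```python
from statistics import mean, pstdev


def summarize_scores(scores):
    """Return the mean, spread, and range of one student's faculty scores."""
    values = list(scores.values())
    return {
        "mean": mean(values),
        "spread": pstdev(values),            # population standard deviation
        "range": max(values) - min(values),  # gap between highest and lowest score
    }


# Hypothetical scores on a 0-100 scale; these are NOT data from the paper.
consensus = {"faculty_a": 70, "faculty_b": 70, "faculty_c": 70}
divided = {"faculty_a": 95, "faculty_b": 70, "faculty_c": 45}

for label, scores in (("consensus", consensus), ("divided", divided)):
    print(label, summarize_scores(scores))
```

Both students receive the same average, but only the spread and range reveal that the second student's assessors disagree sharply. That disagreement is the kind of information a facilitated discussion is meant to surface and an unexamined average discards.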