Research on the Application of Large Language Models in Automatic Question Generation: A Case Study of ChatGLM in the Context of High School Information Technology Curriculum

Yanxin Chen, Ling He
2024-08-21
Abstract: This study investigates how effectively the large language model (LLM) ChatGLM can automatically generate exam questions for high school information technology. Through carefully designed prompt engineering strategies, the model is guided to generate diverse questions, which are then evaluated by domain experts. The evaluation dimensions are Hitting (the degree of alignment with teaching content), Fitting (the degree to which core competencies are embodied), Clarity (the explicitness of question descriptions), and Willing to use (the teacher's willingness to use the question in teaching). The results indicate that questions generated by ChatGLM outperform human-written questions in Clarity and Willing to use, with no significant difference in Hitting and Fitting. This finding suggests that ChatGLM has the potential to make question generation more efficient and to lighten teachers' workload, offering a new perspective for the future development of educational assessment systems. Future research could explore further optimizations to ChatGLM that maintain high Hitting and Fitting scores while improving the clarity of questions and teachers' willingness to use them.
Computers and Society
What problem does this paper attempt to address?
This paper attempts to address the following problem: **how well large language models (LLMs), especially ChatGLM, perform at automatic question generation for high school information technology courses, and whether their output is comparable to manually written questions**. Specifically, the research uses carefully designed prompt engineering techniques to guide ChatGLM to generate diverse examination questions, which domain experts then evaluate along the following dimensions:

1. **Hitting**: The degree of alignment between the questions and the teaching content.
2. **Fitting**: The extent to which the questions reflect the core competencies.
3. **Clarity**: Whether the description of the questions is clear and unambiguous.
4. **Willing to use**: Whether teachers are willing to use these questions in teaching.

By comparing questions generated by ChatGLM with manually written questions on each of these dimensions, the research aims to verify the practical potential of LLMs in simulated question generation. This research not only helps to make automatic question generation systems more capable, but also provides a new perspective and practical basis for the future development of educational technology.

### Research Background

High school information technology courses cover a wide range of knowledge points, which places a considerable teaching burden on information technology teachers. In recent years, large language models (LLMs), especially ChatGLM, have opened up new possibilities for easing this burden. Given carefully designed prompts, LLMs can generate questions that meet the assessment requirements of high school information technology, giving teachers a more efficient and convenient teaching tool.
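
To make the idea of a designed prompt concrete, here is a minimal sketch of how a question-generation prompt might be assembled from a few fields. The source does not publish the prompts that were used, so the `QuestionSpec` fields, the `build_prompt` helper, and the prompt wording are all hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class QuestionSpec:
    """Hypothetical description of the questions to request from the model."""
    topic: str          # knowledge point from the curriculum
    competency: str     # core competency the question should reflect
    question_type: str  # for example "multiple choice"
    count: int = 1      # number of questions to request


def build_prompt(spec: QuestionSpec) -> str:
    """Assemble a plain-text prompt; the wording is illustrative only."""
    lines = [
        "You are a high school information technology teacher writing exam questions.",
        f"Write {spec.count} {spec.question_type} question(s) on the topic: {spec.topic}.",
        f"Each question should assess this core competency: {spec.competency}.",
        "State each question clearly and unambiguously, and provide the answer.",
    ]
    return "\n".join(lines)


# Example values are assumptions chosen for illustration.
spec = QuestionSpec(
    topic="binary and hexadecimal conversion",
    competency="computational thinking",
    question_type="multiple choice",
    count=3,
)
prompt = build_prompt(spec)
print(prompt)
```

The resulting string would be sent to the model through whatever interface is available. The fields were chosen to mirror the Hitting, Fitting, and Clarity dimensions, so that each evaluation criterion corresponds to an explicit instruction in the prompt.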
### Research Objectives

This research explores in depth how LLMs, especially ChatGLM, perform at automatic question generation for high school information technology, and evaluates whether they can match human question writers. Using prompt engineering techniques, the research guides LLMs to generate examination questions, which domain experts then evaluate in detail along four dimensions: hitting, fitting, clarity, and willingness to use. By comparing these indicators, the research aims to verify the potential of LLMs in simulated question generation, and to provide a theoretical basis and practical reference for promoting educational informatization.

### Main Findings

The results show that ChatGLM does not differ significantly from manual question generation in hitting and fitting, but scores higher in clarity and in teachers' willingness to use the questions. This indicates that ChatGLM has the potential to make question generation more efficient, reduce teachers' workload, and offer new ideas for the development of future educational assessment systems.
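
The dimension-by-dimension comparison described above could be organized along the following lines. This is only a sketch: the 1 to 5 rating scale, the dictionary layout, and the helper names are assumptions, the ratings are made-up placeholders rather than data from the study, and a difference in means is not a significance test (the source does not say which test was used).

```python
from statistics import mean

# The four expert-rated dimensions named in the paper.
DIMENSIONS = ("hitting", "fitting", "clarity", "willing_to_use")


def mean_by_dimension(ratings):
    """Average each dimension over a list of per-question rating dicts."""
    return {d: mean(r[d] for r in ratings) for d in DIMENSIONS}


def mean_differences(model_ratings, human_ratings):
    """Return model mean minus human mean for each dimension."""
    model_means = mean_by_dimension(model_ratings)
    human_means = mean_by_dimension(human_ratings)
    return {d: model_means[d] - human_means[d] for d in DIMENSIONS}


# Placeholder ratings on an assumed 1-5 scale; NOT data from the study.
model = [
    {"hitting": 4, "fitting": 4, "clarity": 5, "willing_to_use": 5},
    {"hitting": 4, "fitting": 3, "clarity": 4, "willing_to_use": 4},
]
human = [
    {"hitting": 4, "fitting": 4, "clarity": 4, "willing_to_use": 3},
    {"hitting": 4, "fitting": 3, "clarity": 3, "willing_to_use": 4},
]

diffs = mean_differences(model, human)
for dimension, diff in diffs.items():
    print(f"{dimension}: {diff:+.2f}")
```

In a real analysis, each per-dimension difference would be followed by an appropriate statistical test on the full set of expert ratings before concluding that a difference is significant.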