Abstract: In the domain of education, the integration of technology has led to a transformative era, reshaping traditional learning paradigms. Central to this evolution is the automation of grading processes, particularly within the STEM domain encompassing Science, Technology, Engineering, and Mathematics. While efforts to automate grading have been made in subjects like Literature, the multifaceted nature of STEM assessments presents unique challenges, ranging from quantitative analysis to the interpretation of handwritten diagrams. To address these challenges, this research endeavors to develop efficient and reliable grading methods through the implementation of automated assessment techniques using Artificial Intelligence (AI). Our contributions lie in two key areas: firstly, the development of a robust system for evaluating textual answers in STEM, leveraging sample answers for precise comparison and grading, enabled by advanced algorithms and natural language processing techniques. Secondly, a focus on enhancing diagram evaluation, particularly flowcharts, within the STEM context, by transforming diagrams into textual representations for nuanced assessment using a Large Language Model (LLM). By bridging the gap between visual representation and semantic meaning, our approach ensures accurate evaluation while minimizing manual intervention. Through the integration of models such as CRAFT for text extraction and YOLOv5 for object detection, coupled with LLMs like Mistral-7B for textual evaluation, our methodology facilitates comprehensive assessment of multimodal answer sheets. This paper provides a detailed account of our methodology, challenges encountered, results, and implications, emphasizing the potential of AI-driven approaches in revolutionizing grading practices in STEM education.

What problem does this paper attempt to address?

This paper addresses the problem of automatically assessing multimodal answer sheets in the STEM (science, technology, engineering, and mathematics) fields. Specifically, it focuses on the following aspects:

1. **Efficient evaluation of text answers**: Traditional text answer evaluation methods fall short on complex STEM content, especially when quantitative analysis and complex mathematical concepts are involved. To address this, the paper proposes a system based on natural language processing (NLP) techniques and advanced algorithms that can accurately compare and score students' text answers.
2. **Evaluation of hand-drawn diagrams and flowcharts**: Another challenge in STEM assessment is the accurate identification and interpretation of hand-drawn diagrams and flowcharts. These graphics contain not only visual information but also logical structure that requires semantic understanding. By converting the graphics into text representations and using a large language model (LLM) for reasoning, the paper achieves a detailed evaluation of the graphics while reducing human intervention.

### Main contributions

1. **Text answer evaluation**:
   - Develops a robust system to evaluate text answers in the STEM fields.
   - Uses sample answers as reference points so that the system can identify and evaluate the nuances of students' responses.
   - Uses advanced algorithms and natural language processing techniques to achieve precise comparison and scoring.
2. **Flowchart evaluation**:
   - Emphasizes contextual understanding in flowchart evaluation.
   - Converts the flowchart into a text representation so that the system can analyze and interpret its logical flow and coherence.
   - Uses an LLM for reasoning to ensure accurate evaluation and reduce manual intervention.
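To make the flowchart-to-text idea concrete, here is a minimal Python sketch of how detected elements and arrows might be serialized into sentences that an LLM could read. It is an illustration only, not the paper's implementation: the `Node` and `Edge` structures, the shape names, and the sentence templates are all assumptions, and the detection and OCR steps that would produce these objects are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A detected flowchart element (hypothetical structure)."""
    node_id: int
    shape: str  # e.g. "terminator", "process", "decision"
    text: str   # text recognized inside the shape


@dataclass(frozen=True)
class Edge:
    """A detected arrow between two elements (hypothetical structure)."""
    source: int
    target: int
    label: str = ""  # e.g. "yes" / "no" on a decision branch


def flowchart_to_text(nodes: list[Node], edges: list[Edge]) -> str:
    """Serialize nodes and arrows into plain sentences, one per line."""
    by_id = {n.node_id: n for n in nodes}
    lines = [f"Node {n.node_id} is a {n.shape} containing '{n.text}'." for n in nodes]
    for e in edges:
        src, dst = by_id[e.source], by_id[e.target]
        branch = f" when the answer is '{e.label}'" if e.label else ""
        lines.append(f"'{src.text}' leads to '{dst.text}'{branch}.")
    return "\n".join(lines)


# Toy flowchart: Start -> decision -> (yes) process -> End, (no) -> End
nodes = [
    Node(1, "terminator", "Start"),
    Node(2, "decision", "x > 0?"),
    Node(3, "process", "print x"),
    Node(4, "terminator", "End"),
]
edges = [Edge(1, 2), Edge(2, 3, "yes"), Edge(2, 4, "no"), Edge(3, 4)]
description = flowchart_to_text(nodes, edges)
print(description)
```

A description like this could be placed in an LLM prompt together with a description of the reference flowchart. Comparing the two as text avoids penalizing differences in layout or drawing style that do not change the logic.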
### Methodology

The paper proposes a methodology consisting of two key stages: text evaluation and graphic evaluation.

- **Text evaluation**: Uses the CRAFT model for text region detection, combined with TrOCR for optical character recognition (OCR). The extracted text is stored in a mapped data structure.
- **Graphic evaluation**: Uses YOLOv5 for object detection to extract the elements of the graphic and the connections between them, then extracts the text within each block using Azure OCR or EasyOCR. Finally, the graphic is converted into a text representation and passed to the LLM for reasoning and scoring.

### Challenges and solutions

- **OCR challenges**: Extracting handwritten text is difficult, especially for cursive writing. Adopting TrOCR and combining it with line-segmentation techniques improves OCR accuracy.
- **Graphic differences**: Comparing graphics directly may lead to lower scores. Converting the graphics into text representations and passing them to the LLM addresses this problem and makes the evaluation more accurate.

### Results and discussion

Tests on answer sheets from students in multiple disciplines show that the system gives the expected scores in most cases. However, factors such as unclear handwriting or inaccurately drawn graphics may still affect scoring accuracy. Overall, the proposed automated evaluation system can improve the efficiency and accuracy of assessment in STEM education and offers new ideas and technical support for future work on educational assessment.
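The line-segmentation step and the mapped data structure mentioned in the methodology can be illustrated with a small Python sketch. It assumes that a text detector has already produced bounding boxes and that an OCR model has filled in their text. The `WordBox` structure, the `tolerance` parameter, and the clustering heuristic are hypothetical choices for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """A detected text region with its recognized text (hypothetical structure)."""
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    height: float  # box height, in pixels
    text: str


def group_into_lines(boxes: list[WordBox], tolerance: float = 0.5) -> dict[int, str]:
    """Map line index -> line text by clustering boxes with similar vertical centers."""
    lines: list[list[WordBox]] = []
    # Visit boxes from top to bottom so each box only needs comparing to the last line.
    for box in sorted(boxes, key=lambda b: b.y + b.height / 2):
        center = box.y + box.height / 2
        if lines:
            last = lines[-1]
            last_center = sum(b.y + b.height / 2 for b in last) / len(last)
            # Same line if the centers differ by less than a fraction of the box height.
            if abs(center - last_center) <= tolerance * box.height:
                last.append(box)
                continue
        lines.append([box])
    # Within each line, read boxes left to right.
    return {
        i: " ".join(b.text for b in sorted(line, key=lambda b: b.x))
        for i, line in enumerate(lines)
    }


boxes = [
    WordBox(120, 12, 20, "= ma"),
    WordBox(10, 10, 20, "F"),
    WordBox(10, 50, 20, "a ="),
    WordBox(60, 52, 20, "F / m"),
]
answer_lines = group_into_lines(boxes)
print(answer_lines)
```

Keying the result by line index keeps the reading order explicit, which is one simple way to realize a mapped structure. Real handwriting with slanted or overlapping lines would likely need a more robust segmentation method than this fixed-tolerance heuristic.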

Automated Assessment of Multimodal Answer Sheets in the STEM domain

Can AI Assistance Aid in the Grading of Handwritten Answer Sheets?

A Study of Automated Evaluation of Student’s Examination Paper using Machine Learning Techniques

Automated Content Grading Using Machine Learning

AI-assisted Automated Short Answer Grading of Handwritten University Level Mathematics Exams

Automatic short answer grading and feedback using text mining methods

Automatic assessment of text-based responses in post-secondary education: A systematic review

Grading Assistance for a Handwritten Thermodynamics Exam using Artificial Intelligence: An Exploratory Study

Beyond human subjectivity and error: a novel AI grading system

An automated essay scoring systems: a systematic literature review

An AI-Based System for Formative and Summative Assessment in Data Science Courses

Towards LLM-based Autograding for Short Textual Answers

Using AI Large Language Models for Grading in Education: A Hands-On Test for Physics

Get It Scored Using AutoSAS -- An Automated System for Scoring Short Answers

A New Roadmap for Evaluating Descriptive Handwritten Answer Type

VerAs: Verify then Assess STEM Lab Reports

An automated essay evaluation system using natural language processing and sentiment analysis

Unveiling Scoring Processes: Dissecting the Differences between LLMs and Human Graders in Automatic Scoring

Automatic Short Math Answer Grading via In-context Meta-learning

A Machine Learning Approach for Automated Evaluation of Short Answers Using Text Similarity Based on WordNet Graphs

"I understand why I got this grade": Automatic Short Answer Grading with Feedback