Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding

Jie Tian, Jixin Hou, Zihao Wu, Peng Shu, Zhengliang Liu, Yujie Xiang, Beikang Gu, Nicholas Filla, Yiwei Li, Ning Liu, Xianyan Chen, Keke Tang, Tianming Liu, Xianqiao Wang

2024-01-14

Abstract: This study is a pioneering endeavor to investigate the capabilities of Large Language Models (LLMs) in addressing conceptual questions within the domain of mechanical engineering, with a focus on mechanics. Our examination involves a manually crafted exam of 126 multiple-choice questions spanning various mechanics courses, including Fluid Mechanics, Mechanical Vibration, Engineering Statics and Dynamics, Mechanics of Materials, Theory of Elasticity, and Continuum Mechanics. Three LLMs, ChatGPT (GPT-3.5), ChatGPT (GPT-4), and Claude (Claude-2.1), were evaluated against engineering faculty and students with or without a mechanical engineering background. The findings reveal GPT-4's superior performance over the other two LLMs and the human cohorts in answering questions across the various mechanics topics, except for Continuum Mechanics. This points to room for future improvement in how GPT models handle symbolic calculations and tensor analyses. The performance of all three LLMs improved significantly when they were prompted to give explanations before their direct responses, underscoring the crucial role of prompt engineering. Interestingly, GPT-3.5 performs better with prompts covering a broader domain, while GPT-4 excels with prompts focusing on specific subjects. Finally, GPT-4 exhibits notable advancements in mitigating input bias, as evidenced by comparison with the guessing preferences of humans. This study unveils the substantial potential of LLMs as highly knowledgeable assistants in both mechanical pedagogy and scientific research.

Computation and Language, Artificial Intelligence, Physics Education

What problem does this paper attempt to address?

This paper explores the capability of large language models (LLMs) to answer conceptual questions in mechanical engineering, particularly those related to mechanics. The study used an exam of 126 multiple-choice questions covering several subfields of mechanics: fluid mechanics, mechanical vibration, engineering statics and dynamics, mechanics of materials, theory of elasticity, and continuum mechanics. The researchers evaluated three LLMs, ChatGPT (GPT-3.5), ChatGPT (GPT-4), and Claude (Claude-2.1), and compared their performance with that of engineering faculty and students with or without a mechanical engineering background.
