What does ChatGPT know about natural science and engineering?

Lukas Schulze Balhorn, Jana M. Weber, Stefan Buijsman, Julian R. Hildebrandt, Martina Ziefle, Artur M. Schweidtmann
2023-09-19
Abstract: ChatGPT is a powerful language model from OpenAI that is arguably able to comprehend and generate text. ChatGPT is expected to have a large impact on society, research, and education. An essential step to understand ChatGPT's expected impact is to study its domain-specific answering capabilities. Here, we perform a systematic empirical assessment of its abilities to answer questions across the natural science and engineering domains. We collected 594 questions from 198 faculty members across 5 faculties at Delft University of Technology. After collecting the answers from ChatGPT, the participants assessed the quality of the answers using a systematic scheme. Our results show that the answers from ChatGPT are on average perceived as "mostly correct". Two major trends are that the rating of the ChatGPT answers significantly decreases (i) as the complexity level of the question increases and (ii) as we evaluate skills beyond scientific knowledge, e.g., critical attitude.
Human-Computer Interaction
What problem does this paper attempt to address?
The paper aims to evaluate ChatGPT's question-answering capabilities in the natural sciences and engineering, and to explore its impact on education, research, and practical applications.

### Problems the paper attempts to address:

1. **Evaluating ChatGPT's question-answering capabilities**: The paper conducts a systematic empirical evaluation of the quality of ChatGPT's responses to questions of varying complexity (undergraduate, master's, doctoral). The results show that ChatGPT's answers are on average perceived as "mostly correct," but the ratings decrease significantly as the complexity of the questions increases.
2. **Exploring ChatGPT's performance in different disciplines**: The study collected 594 questions from 198 faculty members across five faculties at Delft University of Technology (Aerospace Engineering; Applied Sciences; Civil Engineering and Geosciences; Electrical Engineering, Mathematics and Computer Science; Mechanical, Materials and Maritime Engineering). The results indicate that ChatGPT scores higher on basic skills and scientific knowledge but lower on skills that go beyond scientific knowledge, such as critical attitude.
3. **Discussing educational and ethical impacts**: The study points out that ChatGPT may have a significant impact on higher education, especially in assisting students with their assignments. However, it lacks the capacity for critical reflection expected in student answers. Caution is also needed when applying ChatGPT's responses in practice, because some answers may be completely incorrect and could lead to serious consequences.

In summary, the paper focuses on ChatGPT's performance in the natural sciences and engineering and on its potential impacts.