Evaluating the Efficacy of ChatGPT-4 in Providing Scientific References across Diverse Disciplines

Zhi Cao
2023-05-23
Abstract: This work conducts a comprehensive exploration of how well OpenAI's ChatGPT-4 sources scientific references across a range of research disciplines. Our analysis covers Computer Science (CS), Mechanical Engineering (ME), Electrical Engineering (EE), Biomedical Engineering (BME), and Medicine, as well as more specialized sub-domains within them. Our empirical findings indicate significant variance in ChatGPT-4's performance across these disciplines. Notably, the validity rate of suggested articles exceeds 65% in CS, BME, and Medicine, whereas in ME and EE none of the suggested articles could be verified as valid. Further, when asked for articles on niche research topics, ChatGPT-4 tends to return references aligned with the broader thematic area rather than the narrowly defined topic of interest. This disparity in accuracy across research fields suggests that the model may need refinement before it can reliably support academic research. Our investigation offers insight into the current capabilities and limitations of AI-powered tools in scholarly research, and emphasizes the indispensable role of human oversight and rigorous validation when using such models for academic work.
Digital Libraries
What problem does this paper attempt to address?
This paper aims to evaluate how effectively ChatGPT-4 provides scientific references across multiple disciplines. The study covers computer science, mechanical engineering, electrical engineering, biomedical engineering, and medicine, along with their subfields. The authors found that more than 65% of the references ChatGPT-4 suggested in computer science, biomedical engineering, and medicine were valid, whereas none of the references it suggested in mechanical engineering and electrical engineering could be verified as valid. Furthermore, for more specialized research topics, ChatGPT-4 tends to provide references on the broader subject rather than literature that precisely matches the narrow topic. The paper suggests that these performance differences may stem from the nature and structure of the model's training data: some fields may have more easily accessible online resources, while others may have fewer or more complex ones. This highlights the need for domain-specific training and improved precision, and underscores that human supervision and verification remain indispensable in academic research. The authors conclude that although ChatGPT-4 can be a useful tool for preliminary research in certain fields, its ability to retrieve references on specific, specialized topics is limited, and further improvements are needed to make it more reliable and adaptable across subjects.