Abstract:
Background: A large language model is a type of artificial intelligence (AI) model that opens up great possibilities for health care practice, research, and education, although scholars have emphasized the need to proactively address the unvalidated and inaccurate information that can arise from its use. One of the best-known large language models is ChatGPT (OpenAI). It is expected to be of great help to medical research because it makes data set analysis, code generation, and literature review more efficient, allowing researchers to focus on experimental design and on drug discovery and development.
Objective: This study aims to explore the potential of ChatGPT as a real-time literature search tool for systematic reviews and clinical decision support systems, with the goal of enhancing their efficiency and accuracy in health care settings.
Methods: The search results of a published systematic review by human experts on the treatment of Peyronie disease were selected as a benchmark, and that review's literature search formula was applied to ChatGPT and Microsoft Bing AI for comparison with the human researchers. Peyronie disease typically presents with discomfort, curvature, or deformity of the penis in association with palpable plaques and erectile dysfunction. To evaluate the quality of individual studies derived from AI answers, we created a structured rating system based on the bibliographic information of the publications. If the title of a cited study existed, the answer was classified into 1 of 4 grades: A, B, C, or F. No grade was given for a fake title or no answer.
Results: Of the 1287 studies identified by ChatGPT, 7 (0.5%) were directly relevant, whereas 19 (40%) of the 48 studies identified by Bing AI were relevant, compared with the human benchmark of 24 studies. In the qualitative evaluation, ChatGPT had 7 grade A, 18 grade B, 167 grade C, and 211 grade F studies, and Bing AI had 19 grade A and 28 grade C studies.
Conclusions: This is the first study to compare AI and conventional human systematic review methods as a real-time literature collection tool for evidence-based medicine. The results suggest that ChatGPT is not yet accurate or feasible as a tool for real-time evidence generation, so researchers should be cautious about using such AI. The limitations of this study of the generative pre-trained transformer model are that the searches covered a narrow range of research topics and that the study could not prevent hallucination by the generative AI. Nevertheless, this study should serve as a standard for future studies by providing an index for verifying the reliability and consistency of generative AI from the user's point of view. If the reliability and consistency of AI literature search services are verified, these technologies could greatly help medical research.
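The percentages in the Results are simple ratios of the reported counts. The Python sketch below recomputes them and tallies how many studies per tool received a grade. The helper names (`relevance_rate`, `summarize`) are illustrative only, and treating every identified study without a grade as a fake title or non-answer is an assumption, since the abstract does not report that count directly.

```python
from collections import Counter

# Counts copied from the Results section of the abstract.
REPORTED = {
    "ChatGPT": {"identified": 1287, "relevant": 7,
                "grades": Counter({"A": 7, "B": 18, "C": 167, "F": 211})},
    "Bing AI": {"identified": 48, "relevant": 19,
                "grades": Counter({"A": 19, "C": 28})},
}


def relevance_rate(relevant: int, identified: int) -> float:
    """Percentage of AI-identified studies judged directly relevant."""
    if identified <= 0:
        raise ValueError("identified must be positive")
    return 100.0 * relevant / identified


def summarize(tool: dict) -> dict:
    """Relevance rate (1 decimal place) plus graded/ungraded split for one tool."""
    graded = sum(tool["grades"].values())
    return {
        "relevance_pct": round(relevance_rate(tool["relevant"], tool["identified"]), 1),
        "graded": graded,
        # Assumption: every identified study without a grade had a fake
        # title or no answer; the abstract does not state this number.
        "ungraded": tool["identified"] - graded,
    }


if __name__ == "__main__":
    for name, tool in REPORTED.items():
        print(name, summarize(tool))
```

Run as a script, it gives a relevance rate of 0.5% for ChatGPT and 39.6% for Bing AI (reported as 40% in the abstract), with 403 of 1287 and 47 of 48 identified studies receiving a grade, respectively.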

Evaluating the performance of ChatGPT and Perplexity AI in Business Reference

Generative AI for Business Decision-Making: A Case of ChatGPT

My AI students: Evaluating the proficiency of three AI chatbots in completeness and accuracy

Exploring the Impact of ChatGPT on Business School Education: Prospects, Boundaries, and Paradoxes

Evaluating human resources management literacy: A performance analysis of ChatGPT and bard

A Comparative Analysis of Generative Artificial Intelligence Tools for Natural Language Processing

Chatting about ChatGPT: how may AI and GPT impact academia and libraries?

Exploring the use of generative artificial intelligence in systematic searching: A comparative case study of a human librarian, ChatGPT-4 and ChatGPT-4 Turbo

An Executive Guide to AI, Machine Learning, and Generative AI—With Some Help From ChatGPT and Bard

The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation

Generative AI Chatbots - ChatGPT versus YouChat versus Chatsonic: Use Cases of Selected Areas of Applied English Language Studies

Role and Challenges of ChatGPT and Similar Generative Artificial Intelligence in Business Management

Dialogue You Can Trust: Human and AI Perspectives on Generated Conversations

Comparing and assessing four AI chatbots' competence in economics

Is ChatGPT Leading Generative AI? What is Beyond Expectations?

Generative AI Usage and Exam Performance

Generative AI and Marketing Education: What the Future Holds

Take It, Leave It, or Fix It: Measuring Productivity and Trust in Human-AI Collaboration

Evaluating the Artificial Intelligence Performance Growth in Ophthalmic Knowledge

How generative artificial intelligence portrays science: Interviewing ChatGPT from the perspective of different audience segments

Managing the emerging role of generative AI in next-generation business