Ethical review of clinical research with generative AI: Evaluating ChatGPT's accuracy and reproducibility

Yasuko Fukataki, Wakako Hayashi, Naoki Nishimoto, Yoichi M Ito
DOI: https://doi.org/10.1101/2024.11.19.24317555
2024-11-20
Abstract: This study evaluated the accuracy and reproducibility of ChatGPT models, specifically GPT-4 and GPT-4o, by reviewing Japanese-language clinical research protocols and informed consent forms using Japanese prompts. The integration of generative AI technologies into clinical research ethics reviews has the potential to enhance consistency, reduce human error, and decrease the manual effort required to assess complex documents. This study primarily aimed to assess and compare the ability of these models to accurately extract and summarize key elements such as research objectives, study design, and ethical considerations, which are critical for ethical review processes. We developed and optimized custom prompts to improve the performance of the models, focusing on the essential aspects of the protocol and informed consent review. The results showed that GPT-4o achieved an 80% accuracy rate in identifying research objectives and a 100% accuracy rate for research design, indicating superior consistency compared with GPT-4, which, despite being slightly less accurate, still showed significant potential for application in ethics reviews. Furthermore, a comparison between customized GPTs and standard prompts revealed that customized GPTs provided significantly higher reproducibility and accuracy, underscoring the value of fine-tuning and Retrieval-Augmented Generation techniques for enhancing AI-assisted review processes. Additionally, challenges in parsing complex PDF documents were identified, highlighting the importance of standardized document formatting to ensure accurate AI analysis. These findings demonstrate the potential of AI-driven systems to improve the efficiency, accuracy, and standardization of research ethics evaluations, potentially setting new standards for AI integration in clinical research practice.
What problem does this paper attempt to address?
This paper evaluates how accurately and reproducibly generative AI, specifically ChatGPT's GPT-4 and GPT-4o models, can review Japanese-language clinical research protocols and informed consent forms. The research aims to:

1. **Evaluate the accuracy of AI models**: Using customized Japanese prompts, measure how accurately the models extract and summarize key elements such as research objectives, study design, and ethical considerations. These elements are critical to the ethical review process.
2. **Compare the performance of different AI models**: Compare GPT-4 and GPT-4o on Japanese documents to determine which model better identifies research objectives and study designs.
3. **Optimize the AI-assisted review process**: Develop and optimize custom prompts that improve model performance on the essential aspects of protocol and informed consent review.
4. **Explore the importance of standardized document formats**: Identify the challenges the models encounter when parsing complex PDF documents, and show why standardized document formatting is needed for accurate AI analysis.
5. **Verify the application potential of AI technology**: Explore the potential of generative AI to improve the efficiency, accuracy, and standardization of ethical review, and provide baseline data for future research and applications.

Through these evaluations, the research seeks to reveal the potential advantages and limitations of generative AI in the ethical review of clinical research and to indicate directions for improving AI-assisted review tools.
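The two quantities the study reports, accuracy and reproducibility, can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the authors' scoring procedure: it assumes each model answer has already been reduced to a short label, that accuracy is the fraction of repeated runs matching a human-assigned reference label, and that reproducibility is the fraction of runs agreeing with the most frequent answer. The function names and the example labels are hypothetical.

```python
from collections import Counter


def accuracy(outputs, reference):
    """Fraction of repeated runs whose label equals the reference label.

    Assumption: a reviewer has already mapped each model answer to a label.
    """
    if not outputs:
        raise ValueError("outputs must be non-empty")
    return sum(1 for o in outputs if o == reference) / len(outputs)


def reproducibility(outputs):
    """Fraction of runs that agree with the most frequent label.

    Assumption: modal agreement is used as a simple stand-in for
    reproducibility; the paper may define it differently.
    """
    if not outputs:
        raise ValueError("outputs must be non-empty")
    modal_count = Counter(outputs).most_common(1)[0][1]
    return modal_count / len(outputs)


if __name__ == "__main__":
    # Made-up labels for four repeated runs on one protocol (not study data).
    runs = ["randomized", "randomized", "observational", "randomized"]
    print(accuracy(runs, "randomized"))
    print(reproducibility(runs))
```

Keeping the two measures separate matters because a model can be highly reproducible while consistently wrong: a set of runs that all return the same incorrect label scores 1.0 on reproducibility and 0.0 on accuracy.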