Art or Artifact: Evaluating the Accuracy, Appeal, and Educational Value of AI-Generated Imagery in DALL·E 3 for Illustrating Congenital Heart Diseases

Mohamad-Hani Temsah,Abdullah N. Alhuzaimi,Mohammed Almansour,Fadi Aljamaan,Khalid Alhasan,Munirah A. Batarfi,Ibraheem Altamimi,Amani Alharbi,Adel Abdulaziz Alsuhaibani,Leena Alwakeel,Abdulrahman Abdulkhaliq Alzahrani,Khaled B. Alsulaim,Amr Jamal,Afnan Khayat,Mohammed Hussien Alghamdi,Rabih Halwani,Muhammad Khurram Khan,Ayman Al-Eyadhy,Rakan Nazer
DOI: https://doi.org/10.1007/s10916-024-02072-0
IF: 4.92
2024-05-24
Journal of Medical Systems
Abstract:Artificial Intelligence (AI), particularly AI-Generated Imagery, has the potential to impact medical and patient education. This research explores the use of AI-generated imagery, from text-to-images, in medical education, focusing on congenital heart diseases (CHD). Utilizing ChatGPT's DALL·E 3, the research aims to assess the accuracy and educational value of AI-created images for 20 common CHDs. In this study, we utilized DALL·E 3 to generate a comprehensive set of 110 images, comprising ten images depicting the normal human heart and five images for each of the 20 common CHDs. The generated images were evaluated by a diverse group of 33 healthcare professionals. This cohort included cardiology experts, pediatricians, non-pediatric faculty members, trainees (medical students, interns, pediatric residents), and pediatric nurses. Utilizing a structured framework, these professionals assessed each image for anatomical accuracy, the usefulness of in-picture text, its appeal to medical professionals, and the image's potential applicability in medical presentations. Each item was assessed on a Likert scale of three. The assessments produced a total of 3630 images' assessments. Most AI-generated cardiac images were rated poorly as follows: 80.8% of images were rated as anatomically incorrect or fabricated, 85.2% rated to have incorrect text labels, 78.1% rated as not usable for medical education. The nurses and medical interns were found to have a more positive perception about the AI-generated cardiac images compared to the faculty members, pediatricians, and cardiology experts. Complex congenital anomalies were found to be significantly more predicted to anatomical fabrication compared to simple cardiac anomalies. There were significant challenges identified in image generation. Based on our findings, we recommend a vigilant approach towards the use of AI-generated imagery in medical education at present, underscoring the imperative for thorough validation and the importance of collaboration across disciplines. While we advise against its immediate integration until further validations are conducted, the study advocates for future AI-models to be fine-tuned with accurate medical data, enhancing their reliability and educational utility.
health care sciences & services,medical informatics
What problem does this paper attempt to address?