Abstract:
Purpose: To use artificial intelligence (AI) platforms to generate medical illustrations of refractive surgeries that help patients visualize and understand procedures such as laser-assisted in situ keratomileusis (LASIK), photorefractive keratectomy (PRK), and small incision lenticule extraction (SMILE). This study evaluates how accurately two OpenAI programs currently illustrate these common corneal refractive procedures.
Methods: We selected AI image generators based on their popularity, choosing DALL-E 3 for its leading position and Medical Illustration Master (MiM) for its high engagement. We developed six non-AI-generated prompts targeting specific outcomes related to LASIK, PRK, and SMILE procedures to assess medical accuracy. Each platform generated 18 images from these prompts, and we used only the final images produced after the sixth prompt for this study (three final images per AI platform). Human-created images (HCIs) of the same procedures were gathered for comparison. Four experts independently graded the images, and their scores were averaged. Each image was evaluated with our grading system on "Legibility," "Detail & Clarity," "Anatomical Realism & Accuracy," "Procedural Step Accuracy," and "Lack of Fictitious Anatomy," with each category scored from 0 to 3 for a maximum of 15 points. A score of 15 indicates a highly accurate medical illustration; a low score indicates a poor-quality illustration. Additionally, we submitted the images to Chat Generative Pre-Trained Transformer-4o (ChatGPT-4o) along with our grading system so that it could evaluate both the AI-generated images and the HCIs on the same rubric.
Results: In individual category scoring, HCIs significantly outperformed AI images in legibility, anatomical realism, procedural step accuracy, and lack of fictitious anatomy. There were no significant differences between DALL-E 3 and MiM in these categories (p>0.05). In procedure-specific comparisons, HCIs consistently scored higher than AI-generated images for LASIK, PRK, and SMILE. For LASIK, HCIs scored 14 ± 0.82 (93.3%), while DALL-E 3 scored 4.5 ± 0.58 (30%) and MiM scored 4.5 ± 1.91 (30%) (p<0.001). For PRK, HCIs scored 14.5 ± 0.58 (96.7%), compared to DALL-E 3's 5.25 ± 1.26 (35%) and MiM's 7 ± 3.56 (46.7%) (p<0.001). For SMILE, HCIs scored 14.5 ± 0.68 (96.7%), while DALL-E 3 scored 5 ± 0.82 (33.3%) and MiM scored 6 ± 2.71 (40%) (p<0.001). In overall accuracy, HCIs scored 14.33 ± 0.23 (95.6%), significantly higher than DALL-E 3 at 4.93 ± 0.69 (32.8%) and MiM at 5.83 ± 0.23 (38.9%) (p<0.001). ChatGPT-4o evaluations were consistent with human evaluations for HCIs (3 ± 0 vs 2.87 ± 0.23; p=0.121), but ChatGPT-4o rated AI images higher than human evaluators did (2 ± 0 vs 1.07 ± 0.73; p<0.001).
Conclusion: This study highlights the inaccuracy of AI-generated images in illustrating corneal refractive procedures such as LASIK, PRK, and SMILE. Although the OpenAI platform can create images recognizable as eyes, these images lack educational value. AI excels at quickly generating creative, vibrant images, but accurate medical illustration remains a significant challenge. While AI performs well on text-based tasks, its ability to produce precise medical images needs substantial improvement.
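The grading arithmetic described in the Methods (five categories, 0 to 3 points each, 15 points maximum, averaged across four graders) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the per-grader totals in the usage example are invented because individual grader scores are not reported, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

# The five rubric categories named in the Methods.
CATEGORIES = (
    "Legibility",
    "Detail & Clarity",
    "Anatomical Realism & Accuracy",
    "Procedural Step Accuracy",
    "Lack of Fictitious Anatomy",
)
MAX_PER_CATEGORY = 3
MAX_TOTAL = MAX_PER_CATEGORY * len(CATEGORIES)  # 15 points


def total_score(grades):
    """Sum one grader's category scores for one image (hypothetical helper)."""
    if set(grades) != set(CATEGORIES):
        raise ValueError("grades must cover exactly the five rubric categories")
    for category, value in grades.items():
        if not 0 <= value <= MAX_PER_CATEGORY:
            raise ValueError(f"{category}: score {value} is outside 0-{MAX_PER_CATEGORY}")
    return sum(grades.values())


def summarize(totals):
    """Return (mean, sample SD, percent of maximum) across graders' totals.

    Assumption: sample SD (n - 1 denominator); the abstract does not say
    which form of standard deviation was used.
    """
    avg = mean(totals)
    return avg, stdev(totals), 100 * avg / MAX_TOTAL


# Hypothetical totals from four graders for a single image.
avg, sd, pct = summarize([14, 15, 13, 14])
print(f"{avg:.2f} ± {sd:.2f} ({pct:.1f}%)")  # prints 14.00 ± 0.82 (93.3%)
```

Dividing the mean total by 15 is how the percentages in the Results relate to the raw scores, for example 14.5 / 15 ≈ 96.7%.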

Evaluating the Accuracy of Artificial Intelligence (AI)-Generated Illustrations for Laser-Assisted In Situ Keratomileusis (LASIK), Photorefractive Keratectomy (PRK), and Small Incision Lenticule Extraction (SMILE)
