Artificial intelligence and clinical guidance in male reproductive health: ChatGPT4.0's AUA/ASRM guideline compliance evaluation

Oya Gokmen, Tugba Gurbuz, Belgin Devranoglu, Muhammet Ihsan Karaman
DOI: https://doi.org/10.1111/andr.13693
2024-07-19
Andrology
Abstract: Background: The American Urological Association (AUA) and the American Society for Reproductive Medicine (ASRM) define male infertility as the inability of a male to achieve a pregnancy in a fertile female. Artificial intelligence, particularly language processing models such as ChatGPT4.0, offers new possibilities for supporting clinical decision‐making. This study assesses how effectively ChatGPT4.0 responds to clinical queries regarding male infertility in line with AUA/ASRM guidelines. Methods: This observational study evaluated the performance of ChatGPT4.0 across 1073 structured clinical queries categorized as true/false, multiple‐choice, or open‐ended. Two independent reviewers specializing in reproductive medicine assessed the responses using a six‐point Likert scale to evaluate accuracy, relevance, and guideline adherence. Results: In the true/false category, accuracy was initially 92% and increased to 94% by the end of the study period. For multiple‐choice questions, accuracy improved from 85% to 89%. The largest gains were seen in open‐ended questions, where accuracy rose from 78% to 86%. Initially, some responses did not fully align with the AUA/ASRM guidelines. However, by the end of the 60‐day period, these responses had become more comprehensive and clinically relevant, indicating an improvement in the model's ability to generate guideline‐conformant answers (p
What problem does this paper attempt to address?