(298) AUA vs AI: An Inquiry into Testosterone Guidelines

Y Katlowitz, J Khurgin, N Leelani
DOI: https://doi.org/10.1093/jsxmed/qdae001.284
2024-02-07
The Journal of Sexual Medicine
Abstract:

Introduction: ChatGPT is an artificial intelligence with an accessible user interface that allows for conversational input. It uses deep learning, a subset of machine learning based on artificial neural networks, to "speak" to the user. Patients and clinicians alike have been querying ChatGPT with medical questions, and recent publications have shown that the validity of its responses varies. Limited information is available on ChatGPT's capacity to accurately discuss testosterone deficiency and its management. The medical community needs to know which information sources are accurate and which are not. This project is designed to assess ChatGPT's clinical competency against the American Urological Association (AUA) guidelines on the evaluation and management of testosterone deficiency.

Objective: To assess the competency of ChatGPT to answer general and clinical questions related to testosterone deficiency and replacement compared to the AUA guidelines.

Methods: The AUA guidelines for the evaluation and management of testosterone deficiency include 31 main points. Each point was put to ChatGPT in the form of a question, accompanied by requests for validation of information, further information, or clinical next-step recommendations. The answers were then sorted into 3 categories relative to the AUA guidelines: accurate and complete (AC), accurate but incomplete (AI), and incorrect or misleading (IM).

Results: Of the 31 guideline-based questions queried, 22/31 (71%) were AC, 4/31 (13%) were AI, and 5/31 (16%) were IM. The highest number of AI answers pertained to treatment-related queries. The highest number of IM answers pertained to counseling, particularly on commonly contested topics such as the relationship between testosterone and prostate cancer or cardiovascular health.

Conclusions: When queried about the evaluation and management of testosterone deficiency, ChatGPT offered complete and accurate answers in 71% of cases. Accuracy reached as high as 100% on firmly established or more binary topics, such as diagnosis requirements or recommended adjunct testing. As such, patients and providers may consider using ChatGPT as an ancillary resource for uncomplicated or firmly established testosterone-related queries. However, it performed considerably worse when discussing more contested topics, such as testosterone's relationship to prostate cancer, or when approaching elements of medicine that require a more personalized human touch, such as counseling. While ChatGPT may be helpful in basic or binary matters, interpersonal and patient-specific matters are best handled by experts in the field.

Disclosure: No.
urology & nephrology
### What problem does this paper attempt to solve?

This paper evaluates how well the artificial intelligence tool ChatGPT answers general and clinical questions about testosterone deficiency and its replacement therapy, using the American Urological Association (AUA) guidelines as the reference standard. Specifically, the researchers examine whether ChatGPT provides accurate and complete information in the following respects:

1. **Accuracy**: Whether ChatGPT's answers agree with the AUA guidelines.
2. **Completeness**: Whether ChatGPT's answers cover all necessary information.
3. **Misleading content**: Whether ChatGPT provides incorrect or misleading information.

Through this study, the authors hope to give the medical community insight into the reliability and limitations of ChatGPT on testosterone-related management questions, helping patients and doctors judge when they can rely on AI tools and when they need the opinion of a professional.

### Research background

With the development of deep-learning technology, artificial intelligence systems such as ChatGPT have begun to be used to answer medical questions. However, the accuracy of their answers varies, and little research has verified their reliability in specific fields such as testosterone management. This study aims to fill that gap by evaluating ChatGPT's performance on the evaluation and management of testosterone deficiency.

### Research methods

The researchers turned each of the 31 main points in the AUA guidelines into a question for ChatGPT, accompanied by requests to validate information, provide further information, or recommend clinical next steps. Each answer was then compared with the AUA guidelines and placed in one of three categories:

- **Accurate and Complete (AC)**: Fully in line with the AUA guidelines.
- **Accurate but Incomplete (AI)**: Correct, but missing information.
- **Incorrect or Misleading (IM)**: Contains wrong or misleading information.

### Results

Among the 31 guideline-based questions:

- 71% (22/31) of the answers were accurate and complete (AC).
- 13% (4/31) of the answers were accurate but incomplete (AI), mainly in treatment-related queries.
- 16% (5/31) of the answers were incorrect or misleading (IM), mainly in counseling, especially on contested topics such as the relationship between testosterone and prostate cancer or cardiovascular health.

### Conclusion

When answering questions about the evaluation and management of testosterone deficiency, ChatGPT provided accurate and complete information in most cases (71%), and it performed especially well on firmly established topics such as diagnostic requirements or recommended adjunct testing. However, it performed poorly on more contested topics (such as the relationship between testosterone and prostate cancer) and in situations requiring personalized advice. Therefore, although ChatGPT can serve as an ancillary resource for basic or firmly established testosterone-related questions, matters involving interpersonal interaction and individual patient circumstances should still be handled by professional doctors.
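
As a quick check on the arithmetic in the Results, the reported percentages can be recomputed from the raw counts. The snippet below is a minimal sketch and not part of the study: the variable and function names are illustrative assumptions, and only the counts (22, 4, and 5 out of 31) come from the abstract.

```python
# Counts per category as reported in the abstract's Results.
# AC = accurate and complete, AI = accurate but incomplete,
# IM = incorrect or misleading.
counts = {"AC": 22, "AI": 4, "IM": 5}
total = sum(counts.values())  # number of guideline-based questions


def percentages(category_counts):
    """Return each category's share of the total, rounded to a whole percent."""
    n = sum(category_counts.values())
    return {label: round(100 * k / n) for label, k in category_counts.items()}


shares = percentages(counts)
for label, pct in shares.items():
    print(f"{label}: {counts[label]}/{total} ({pct}%)")
```

Rounding each share to a whole percent reproduces the figures given in the abstract (71%, 13%, and 16%), which together account for all 31 questions.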