Assessing the Capabilities of Generative Pretrained Transformer-4 in Addressing Open-Ended Inquiries of Oral Cancer

Kaiyuan Ji, Jing Han, Guangtao Zhai, Jiannan Liu
DOI: https://doi.org/10.1016/j.identj.2024.06.024
2024-08-03
Abstract: Introduction and aims: In the face of escalating oral cancer rates, the application of large language models like Generative Pretrained Transformer (GPT)-4 presents a novel pathway for enhancing public awareness about prevention and early detection. This research aims to explore the capabilities and possibilities of GPT-4 in addressing open-ended inquiries in the field of oral cancer. Methods: Using 60 questions accompanied by reference answers, covering concepts, causes, treatments, nutrition, and other aspects of oral cancer, evaluators from diverse backgrounds were selected to evaluate the capabilities of GPT-4 and a customized version. A P value under .05 was considered significant. Results: Analysis revealed that GPT-4 and its customized version performed well in answering open-ended questions, with the majority of responses receiving high scores. Although the median score for standard GPT-4 was marginally better, statistical tests showed no significant difference in capabilities between the two models (P > .05). Although scores differed significantly among evaluators from diverse backgrounds (P < .05), a post hoc test and comprehensive analysis showed that both editions of GPT-4 had equivalent capabilities in answering questions concerning oral cancer. Conclusions: GPT-4 has demonstrated its capability to furnish responses to open-ended inquiries concerning oral cancer. Utilizing this advanced technology to boost public awareness about oral cancer is viable and has much potential. When it is unable to locate pertinent information, it resorts to its inherent knowledge base or recommends consulting professionals after offering some basic information. Therefore, it cannot supplant the expertise and clinical judgment of surgical oncologists, but it could be used as an adjunctive evaluation tool.