Testing and Validation of a Custom Trained Large Language Model for HN Patients with Guardrails

L. Zhu, A. Anand, G. Gevorkyan, L.A. McGee, J.C. Rwigema, Y. Rong, S.H. Patel
DOI: https://doi.org/10.1016/j.ijrobp.2024.01.117
IF: 8.013
2024-04-01
International Journal of Radiation Oncology*Biology*Physics
Abstract:
Purpose/Objective(s): The goal is to custom-train an advanced large language model (LLM) chatbot, using data approved by qualified medical professionals (physicians and nurses), as a patient-focused platform supporting head and neck (HN) cancer survivorship and overall well-being.
Materials/Methods: Seventy unique question-and-answer sets concerning oropharyngeal cancer survivors were collected from institutional records (2021-2022). Additionally, frequently asked questions related to the 40 most common Grade 1+ HN toxicities observed within our practice were collected. All questions were redacted to protect patient privacy and were then re-entered within the framework of OpenAI's Turbo 4.0 platform. The model was trained on those collected questions and on other peer-reviewed literature relevant to the studied diagnosis, with an effort to establish guardrails and refine the model's responses. The questions chosen for training covered various subjects, including pre-operative preparations, post-radiation symptoms, medications, COVID vaccinations, relevant side effects, and pain management. The temperature (an LLM sampling hyperparameter) was set to 0.2 to reduce the randomness of the responses and make the output more focused and deterministic. The model was tested using an independent set of questions, including some outside the training scope. Model accuracy and relevance, as well as failure rates, were assessed by three experienced HN radiation oncologists.
Results: An interactive LLM-based chatbot was developed, complemented by an intuitive frontend interface. The mean response time was less than 2 seconds. The chatbot accurately addressed 10 specific questions related to radiation-induced toxicities. Scores for each response to the test questions remained acceptable, indicating that the chatbot understood questions posed with varied phrasing. However, for questions outside its training scope, further optimization was needed to reduce instances of model misinterpretation.
Conclusion: This customized chatbot is a substantial advance in addressing the challenge of accessing current, medically relevant information on HN cancer survivorship. The chatbot displays a broad understanding, effectively addressing varied phrasings of the same query. The preliminary validation of this model shows significant potential for offering health assistance to HN patients and professional staff. The model still requires further optimization to enforce stricter guardrails for questions that fall outside the scope of the training documents.
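The effect of the low temperature setting mentioned in the methods can be illustrated with a small sketch. It assumes the standard temperature-scaled softmax used when sampling tokens; the logit values and the function name `softmax_with_temperature` are hypothetical and chosen only for illustration, not taken from the study or from any OpenAI interface.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens (illustrative only).
logits = [2.0, 1.0, 0.5]

p_default = softmax_with_temperature(logits, 1.0)  # unscaled distribution
p_low = softmax_with_temperature(logits, 0.2)      # the setting reported in the abstract

print("T=1.0:", [round(p, 3) for p in p_default])
print("T=0.2:", [round(p, 3) for p in p_low])
```

At the lower temperature nearly all of the probability mass moves to the highest-scoring token, which is why sampled responses become more repeatable and less varied.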
oncology, radiology, nuclear medicine & medical imaging