Abstract

Introduction: Original research in radiology often involves handling large datasets, manipulating data, running statistical tests, and writing code. Recent studies show that large language models (LLMs) can solve bioinformatics tasks, suggesting their potential in radiology research. This study evaluates the ability of LLMs to provide statistical and deep learning solutions, and the corresponding code, for radiology research.

Materials and methods: We used the web-based chat interfaces of ChatGPT-4o, ChatGPT-3.5, and Google Gemini.

Experiment 1 (biostatistics and data visualization): We assessed each LLM's ability to suggest biostatistical tests and to generate R code for them, using a dataset from The Cancer Imaging Archive (TCIA). Prompts were based on the statistical analyses reported in a peer-reviewed manuscript. The generated code was tested in RStudio for correctness, runtime errors, and the ability to produce the requested visualization.

Experiment 2 (deep learning): We used the RSNA-STR Pneumonia Detection Challenge dataset to evaluate the ability of ChatGPT-4o and Gemini to generate Python code for a transformer-based image classification model (Vision Transformer, ViT-B/16). The generated code was tested in a Jupyter Notebook for functionality and runtime errors.

Results: Of the 8 statistical questions posed, ChatGPT-4o suggested the correct statistical approach for 7, ChatGPT-3.5 for 6, and Gemini for 5. The R code generated by ChatGPT-4o had fewer runtime errors than that of the other two models. Six of the 7 scripts it provided ran without runtime errors, compared with 5 of 7 for both ChatGPT-3.5 and Gemini. Both ChatGPT-4o and Gemini generated the requested visualizations, with a few runtime errors. Iteratively pasting the runtime error messages from ChatGPT-4o's code back into the chat helped resolve them. Gemini initially hallucinated during code generation but produced accurate code when the experiment was restarted. ChatGPT-4o and Gemini both successfully generated initial Python code for the deep learning tasks. Errors encountered during implementation were resolved iteratively through the chat interface. This demonstrates the utility of LLMs in providing baseline code for further refinement and for resolving runtime errors.

Conclusion: LLMs can assist with coding tasks in radiology research. They can provide initial code for data visualization, statistical tests, and deep learning models, which helps researchers who have foundational biostatistical knowledge. While LLMs can offer a useful starting point, users must refine and validate the code. Caution is necessary because of potential errors, the risk of hallucinations, and data privacy regulations.

Summary statement: LLMs can help with coding and statistical problems in radiology research, and can help primary authors troubleshoot the code that radiology research requires.
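The model named in Experiment 2 can be unpacked with a little arithmetic. In ViT-B/16, "B" denotes the Base configuration (hidden size 768, 12 layers, 12 attention heads). "/16" is the side length, in pixels, of the square patches the image is cut into. Each patch becomes one input token, and one learnable class token is added.

The sketch below assumes the 224 × 224 input resolution commonly used with ImageNet-pretrained ViT-B/16 weights. It also assumes a grayscale radiograph whose single channel is repeated three times. Neither detail is stated in the abstract, and both helpers are hypothetical illustrations, not part of any library.

```python
def vit_token_count(image_size: int = 224, patch_size: int = 16) -> int:
    """Tokens a ViT processes for a square image: one per patch plus a class token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 224 // 16 = 14
    return patches_per_side ** 2 + 1  # 14 * 14 patches + 1 [class] token


def flattened_patch_length(patch_size: int = 16, channels: int = 3) -> int:
    """Pixel values in one flattened patch before the linear patch projection."""
    return patch_size * patch_size * channels


# Assumption: radiographs resized to 224 x 224, grayscale channel repeated 3 times.
print(vit_token_count())         # 197
print(flattened_patch_length())  # 768
```

For ViT-B/16, the flattened patch length (768) happens to equal the Base hidden size, so the patch embedding is equivalent to a 768 × 768 linear map.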

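The iterative debugging described in the results amounts to a simple feedback loop: runtime errors from the generated code were copied back into the chat until the code ran. The Python sketch below automates that loop purely for illustration. The study itself used the web chat interfaces by hand. `ask_llm` is a hypothetical placeholder for whichever model is queried, and here it is a stub that returns a fixed answer. Code that runs without errors still needs to be checked for statistical and methodological correctness.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_snippet(code: str) -> Optional[str]:
    """Run a code string in a fresh namespace; return the traceback text if it fails."""
    try:
        exec(code, {})  # illustration only: untrusted generated code belongs in a sandbox
    except Exception:
        return traceback.format_exc()
    return None


def refine_until_runs(
    code: str,
    ask_llm: Callable[[str], str],  # hypothetical: sends a prompt, returns revised code
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Feed each runtime error back to the model until the code runs.

    Returns the final code and the number of error-feedback rounds used.
    Raises RuntimeError if the code still fails after max_rounds revisions.
    """
    for rounds in range(max_rounds + 1):
        error = run_snippet(code)
        if error is None:
            return code, rounds
        if rounds == max_rounds:
            break
        prompt = (
            "Running the code below raised this error:\n"
            f"{error}\n"
            "Please return a corrected version.\n\n"
            f"{code}"
        )
        code = ask_llm(prompt)
    raise RuntimeError(f"Code still fails after {max_rounds} rounds:\n{error}")


def fake_llm(prompt: str) -> str:
    """Hypothetical stub standing in for a chat model; it adds the missing import."""
    return "import statistics\nresult = statistics.mean([2, 4, 9])"


buggy = "result = statistics.mean([2, 4, 9])"  # NameError: 'statistics' is not defined
fixed, rounds = refine_until_runs(buggy, fake_llm)
print(rounds)  # 1
```

Capping the number of rounds mirrors the practical limit of the manual workflow. If an error persists after several revisions, the code likely needs human review rather than another prompt.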
Large language models can help with biostatistics and coding needed in radiology research
