Artificial Intelligence in Medical Imaging: Analyzing the Performance of ChatGPT and Microsoft Bing in Scoliosis Detection and Cobb Angle Assessment

Artur Fabijan, Agnieszka Zawadzka-Fabijan, Robert Fabijan, Krzysztof Zakrzewski, Emilia Nowosławska, Bartosz Polis
DOI: https://doi.org/10.3390/diagnostics14070773
IF: 3.6
2024-04-06
Diagnostics
Abstract: Open-source artificial intelligence models (OSAIM) find free applications in various industries, including information technology and medicine. Their clinical potential, especially in supporting diagnosis and therapy, is the subject of increasingly intensive research. Due to the growing interest in artificial intelligence (AI) for diagnostic purposes, we conducted a study evaluating the capabilities of AI models, including ChatGPT and Microsoft Bing, in the diagnosis of single-curve scoliosis based on posturographic radiological images. Two independent neurosurgeons assessed the degree of spinal deformation, selecting 23 cases of severe single-curve scoliosis. Each posturographic image was separately implemented onto each of the mentioned platforms using a set of formulated questions, starting from 'What do you see in the image?' and ending with a request to determine the Cobb angle. In the responses, we focused on how these AI models identify and interpret spinal deformations and how accurately they recognize the direction and type of scoliosis as well as vertebral rotation. The Intraclass Correlation Coefficient (ICC) with a 'two-way' model was used to assess the consistency of Cobb angle measurements, and its confidence intervals were determined using the F test. Differences in Cobb angle measurements between human assessments and the AI ChatGPT model were analyzed using metrics such as RMSEA, MSE, MPE, MAE, RMSLE, and MAPE, allowing for a comprehensive assessment of AI model performance from various statistical perspectives. The ChatGPT model achieved 100% effectiveness in detecting scoliosis in X-ray images, while the Bing model did not detect any scoliosis. However, ChatGPT had limited effectiveness (43.5%) in assessing Cobb angles, showing significant inaccuracy and discrepancy compared to human assessments. This model also had limited accuracy in determining the direction of spinal curvature, classifying the type of scoliosis, and detecting vertebral rotation. Overall, although ChatGPT demonstrated potential in detecting scoliosis, its abilities in assessing Cobb angles and other parameters were limited and inconsistent with expert assessments. These results underscore the need for comprehensive improvement of AI algorithms, including broader training with diverse X-ray images and advanced image processing techniques, before they can be considered as auxiliary in diagnosing scoliosis by specialists.
medicine, general & internal
What problem does this paper attempt to address?
This paper evaluates how well artificial intelligence models perform in medical imaging, specifically how ChatGPT and Microsoft Bing perform in detecting scoliosis and measuring the Cobb angle. The researchers set out to test the following hypothesis:

- **H1**: All selected AI models can accurately identify scoliosis based on radiological images.

To test this hypothesis, the researchers selected radiological images of 23 cases of severe single-curve scoliosis and submitted each image to the ChatGPT and Microsoft Bing platforms with a preset series of questions. The tests evaluated the AI models on five aspects:

1. **Identification of spinal deformity**: Can the AI model effectively identify spinal deformity?
2. **Judgment of scoliosis direction**: Can the AI model accurately identify the direction of scoliosis (left or right)?
3. **Classification of scoliosis type**: Can the AI model correctly classify the type of scoliosis (such as C-shaped)?
4. **Detection of vertebral rotation**: Can the AI model accurately detect the rotation of the vertebral body?
5. **Measurement of Cobb angle**: How accurate is the AI model in measuring the Cobb angle?

The results show that ChatGPT identified scoliosis in all 23 images but had limited effectiveness in measuring the Cobb angle: it attempted a measurement in only 43.5% of the images, and those measurements showed significant inaccuracies and differences from the human assessments. Microsoft Bing, in contrast, failed to identify scoliosis in any of the images. These results point to the potential of current AI algorithms in assisting the diagnosis of scoliosis, but also to areas that need further improvement, particularly the accuracy and consistency of the algorithms, which would require broader training data sets and advanced image processing techniques.
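
To make the error metrics named in the abstract more concrete, the following is a minimal sketch of how MSE, MAE, MPE, MAPE, and RMSLE could be computed between human and model Cobb angle readings. It is not the authors' code. The function name `error_metrics`, the sign convention for MPE (human minus model), and the sample angles are all assumptions made for illustration. The sketch reports a plain RMSE (the square root of MSE); whether that corresponds to the "RMSEA" listed in the abstract is also an assumption.

```python
import math

def error_metrics(human, model):
    """Compare model Cobb angle estimates with human reference values (degrees).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(human) != len(model) or not human:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(human)
    # Signed error, assumed convention: human reference minus model estimate.
    errors = [h - m for h, m in zip(human, model)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # Percentage errors are relative to the human reference, which must be non-zero.
    mpe = 100.0 * sum(e / h for e, h in zip(errors, human)) / n
    mape = 100.0 * sum(abs(e / h) for e, h in zip(errors, human)) / n
    # RMSLE compares log(1 + value), so every value must be greater than -1.
    squared_log_errors = [
        (math.log1p(h) - math.log1p(m)) ** 2 for h, m in zip(human, model)
    ]
    rmsle = math.sqrt(sum(squared_log_errors) / n)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mae,
        "MPE": mpe,
        "MAPE": mape,
        "RMSLE": rmsle,
    }

# Made-up angles for illustration only; these are NOT data from the study.
human_angles = [50.0, 60.0, 80.0]
model_angles = [40.0, 66.0, 80.0]
metrics = error_metrics(human_angles, model_angles)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under this sign convention, a positive MPE means the model tends to underestimate the angle relative to the human reading, while MAE and MAPE ignore the direction of the error. Reporting the signed and unsigned metrics together separates systematic bias from overall inaccuracy.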