MedYOLO: A Medical Image Object Detection Framework

Joseph Sobek, Jose R. Medina Inojosa, Betsy J. Medina Inojosa, S. M. Rassoulinejad-Mousavi, Gian Marco Conte, Francisco Lopez-Jimenez, Bradley J. Erickson
DOI: https://doi.org/10.1007/s10278-024-01138-2
2024-06-08
Abstract: Artificial intelligence-enhanced identification of organs, lesions, and other structures in medical imaging is typically done using convolutional neural networks (CNNs) designed to make voxel-accurate segmentations of the region of interest. However, the labels required to train these CNNs are time-consuming to generate and require attention from subject matter experts to ensure quality. For tasks where voxel-level precision is not required, object detection models offer a viable alternative that can reduce annotation effort. Despite this potential application, there are few options for general purpose object detection frameworks available for 3-D medical imaging. We report on MedYOLO, a 3-D object detection framework using the one-shot detection method of the YOLO family of models and designed for use with medical imaging. We tested this model on four different datasets: BRaTS, LIDC, an abdominal organ Computed Tomography (CT) dataset, and an ECG-gated heart CT dataset. We found our models achieve high performance on commonly present medium and large-sized structures such as the heart, liver, and pancreas even without hyperparameter tuning. However, the models struggle with very small or rarely present structures.
Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper develops MedYOLO, an object detection framework for 3D medical imaging, to address limitations of convolutional neural network (CNN)-based segmentation models when identifying structures such as organs and lesions. Specifically, it targets the following problems:

1. **Reducing Annotation Workload**: CNN-based segmentation methods require voxel-level annotations, which are time-consuming and costly to produce. Object detection models need only bounding-box labels, which reduces the annotation workload for tasks that do not require voxel-level precision.
2. **Improving Training Efficiency**: Segmentation models usually need high-quality labels to perform well, and such labels are difficult to obtain and prone to errors. Object detection models avoid much of this burden and can improve training efficiency to a certain extent.
3. **Developing an Object Detection Framework for 3D Medical Imaging**: 2D object detection models such as YOLO already exist, but they are poorly suited to 3D medical imaging: handling volumetric inputs and outputs requires complex conversion steps that can lose important three-dimensional spatial information. A framework built specifically for 3D medical imaging avoids this loss.

MedYOLO builds on YOLOv5, modified to accept 3D medical imaging data, and was tested on four medical imaging datasets: brain tumors (BRaTS), lung nodules (LIDC), abdominal organ CT, and ECG-gated cardiac CT. The experiments show that MedYOLO performs well on medium and large structures, even without hyperparameter tuning, but poorly on very small or rarely present structures. Compared with nnDetection, another popular medical imaging object detection framework, MedYOLO performs better on larger structures.
In summary, MedYOLO offers an object detection solution for 3D medical imaging that is accurate on medium and large structures and best suited to tasks that do not require voxel-level precision.
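To make the difference between a voxel-level label and a detection label concrete, the following is a minimal Python sketch that reduces a 3D segmentation mask to a single axis-aligned bounding box. The function name `mask_to_box` and the normalized center/size layout (modeled loosely on 2D YOLO labels) are assumptions made for illustration; they are not taken from the paper, and MedYOLO's actual label format may differ.

```python
import numpy as np


def mask_to_box(mask):
    """Return the tight axis-aligned box around the nonzero voxels of a 3-D mask.

    The result is (z_center, y_center, x_center, depth, height, width), each
    divided by the volume size along its axis. This layout is a hypothetical
    choice for illustration, not MedYOLO's documented label format.
    """
    mask = np.asarray(mask)
    if mask.ndim != 3:
        raise ValueError("expected a 3-D mask")
    coords = np.argwhere(mask)          # (N, 3) indices of nonzero voxels
    if coords.size == 0:
        return None                     # structure absent from this volume
    lo = coords.min(axis=0)             # inclusive lower corner
    hi = coords.max(axis=0) + 1         # exclusive upper corner
    shape = np.array(mask.shape, dtype=float)
    center = (lo + hi) / 2.0 / shape    # normalized box center
    size = (hi - lo) / shape            # normalized box extent
    return tuple(float(v) for v in np.concatenate([center, size]))


# Toy volume of 10 x 20 x 40 voxels containing one block-shaped "organ".
volume = np.zeros((10, 20, 40), dtype=bool)
volume[2:6, 5:15, 10:30] = True
box = mask_to_box(volume)
```

The mask in this toy example carries 800 labeled voxels, while the box carries six numbers. That gap is the annotation saving the summary describes: an annotator marking a box specifies only its extent along each axis instead of tracing the structure's boundary on every slice.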