A Lung Nodule Dataset with Histopathology-based Cancer Type Annotation

Muwei Jian, Hongyu Chen, Zaiyong Zhang, Nan Yang, Haorang Zhang, Lifu Ma, Wenjing Xu, Huixiang Zhi
2024-06-26
Abstract: Recently, Computer-Aided Diagnosis (CAD) systems have emerged as indispensable tools in clinical diagnostic workflows, significantly alleviating the burden on radiologists. Nevertheless, despite their integration into clinical settings, CAD systems have limitations. Specifically, while they can achieve high performance in detecting lung nodules, they struggle to accurately predict multiple cancer types. This limitation can be attributed to the scarcity of publicly available datasets annotated with expert-level cancer type information. This research aims to bridge the gap by providing publicly accessible datasets and reliable tools for medical diagnosis, enabling a finer categorization of lung diseases and, in turn, more precise treatment recommendations. To this end, we curated a diverse dataset of lung Computed Tomography (CT) images comprising 330 annotated nodules (labeled as bounding boxes) from 95 distinct patients. We evaluated the quality of the dataset with a variety of classical classification and detection models; the promising results indicate that the dataset is practically applicable and can further facilitate intelligent auxiliary diagnosis.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses a limitation of existing Computer-Aided Diagnosis (CAD) systems: although they detect lung nodules well, they cannot yet accurately predict the specific type of lung cancer. The authors attribute this primarily to the lack of publicly available datasets with expert-level cancer type annotations. To address it, they constructed a lung CT dataset with cancer type annotations and validated it experimentally, with the goal of supporting finer-grained lung disease classification and more precise treatment recommendations.

Specifically, the main objectives of the study are:

1. **Constructing a high-quality dataset**: The authors collected 330 annotated lung nodules from the CT images of 95 patients. The nodules were annotated by professional doctors using clinical, frozen section, and pathological diagnostic information.
2. **Improving the accuracy of small nodule detection**: The dataset includes many small and tiny nodules, which challenge the detection performance of existing CAD systems.
3. **Promoting the classification of lung cancer types**: The cancer type labels cover benign, adenocarcinoma, and squamous cell carcinoma, which supports the development of CAD systems that can distinguish between different types of lung cancer.

Through these efforts, the authors hope to provide a reliable tool that makes medical diagnosis more accurate and thereby enables more personalized treatment recommendations for patients.