A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset

Muwei Jian, Haoran Zhang, Mingju Shao, Hongyu Chen, Huihui Huang, Yanjie Zhong, Changlei Zhang, Bin Wang, Penghui Gao
2024-06-26
Abstract: Recently, intelligent analysis of lung nodules with the assistance of computer-aided detection (CAD) techniques has improved the accuracy of lung cancer diagnosis. However, existing CAD systems and pulmonary datasets mainly focus on Computed Tomography (CT) images from a single period, ignoring the cross spatio-temporal features associated with nodule progression that are contained in imaging data captured at different stages of lung cancer. Exploring the evolution patterns of nodules across the periods covered by a patient's CT sequences would play a crucial role in guiding the precise screening and identification of lung cancer. Therefore, a cross spatio-temporal lung nodule dataset with pathological information for nodule identification and diagnosis is constructed, which contains 328 CT sequences and 362 annotated nodules from 109 patients. This comprehensive database is intended to drive CAD research towards more practical and robust methods, and to contribute to further exploration in fields related to precision medicine. To ensure patient confidentiality, we have removed sensitive information from the dataset.
Image and Video Processing
What problem does this paper attempt to address?
The paper primarily aims to address the following issues:

### Research Background and Objectives

- **Challenges in Lung Cancer Diagnosis**: Current lung cancer screening and diagnosis face numerous challenges, including the time-consuming and subjective nature of manual analysis of large volumes of CT images by doctors, and the complexity and subtle differences in the morphological characteristics of malignant lung nodules that make them difficult for the human eye to recognize.
- **Limitations of Computer-Aided Detection (CAD) Systems**: Existing CAD systems and lung datasets mostly focus on CT images at a single time point, neglecting the spatiotemporal features that arise as lung nodules develop and change across images taken at different time points.

### Problems Addressed

- **Construction of a Spatiotemporal Lung Nodule Dataset**: The paper aims to construct a spatiotemporal lung nodule dataset containing pathological information to support the identification and diagnosis of lung nodules. This dataset includes 328 CT sequences and 362 annotated lung nodules from 109 patients, aiming to advance CAD systems towards more practical and robust methods and promote further exploration in the field of precision medicine.

### Dataset Characteristics

- **Spatiotemporal Dimension**: The dataset not only covers data from a single time point but also includes CT scans of patients at different time points, which helps observe the changes and development trends of lung nodules over time.
- **Integration of Pathological Information**: All nodules are precisely annotated based on pathological reports, ensuring the authenticity and accuracy of the dataset labels.
- **Data Integrity and Diversity**: The dataset covers lung nodules of different sizes and types (benign and malignant), as well as CT scans of different thicknesses (1.25 mm and 5 mm), providing a rich source of samples for research.

### Methods and Contributions

- **Data Collection and Annotation**: Data were collected through the hospital's electronic medical record system, pathology information system, and picture archiving and communication system, with expert doctors guiding the annotation of the location and contours of the nodules.
- **Dataset Structure**: The dataset is organized by temporal and spatial dimensions, supporting the observation of nodule changes from both time and space perspectives.
- **Technical Validation**: The paper evaluates the dataset using various classification and detection networks, demonstrating the effectiveness and applicability of the dataset.

In summary, this paper aims to overcome the limitations of existing CAD systems by constructing a comprehensive and high-quality spatiotemporal lung nodule dataset, providing strong support for the accurate diagnosis of lung nodules.
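
To make the temporal organization described under "Dataset Structure" concrete, here is a minimal Python sketch of how per-nodule timelines might be assembled from longitudinal scan records. It is not the dataset's actual file layout or API: the `NoduleScan` record, its field names (including `diameter_mm`), the helper functions, and the example values are all hypothetical and invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NoduleScan:
    """One annotated nodule in one CT sequence (hypothetical schema)."""
    patient_id: str
    nodule_id: str
    scan_date: date
    slice_thickness_mm: float  # the summary mentions 1.25 mm and 5 mm scans
    malignant: bool            # label assumed to come from the pathology report
    diameter_mm: float         # assumed size measurement, for illustration only


def group_by_nodule(scans):
    """Group records by (patient, nodule) and sort each group by scan date."""
    timelines = defaultdict(list)
    for scan in scans:
        timelines[(scan.patient_id, scan.nodule_id)].append(scan)
    for timeline in timelines.values():
        timeline.sort(key=lambda s: s.scan_date)
    return dict(timelines)


def growth_between_scans(timeline):
    """Return (days elapsed, diameter change in mm) for consecutive scans."""
    return [
        ((later.scan_date - earlier.scan_date).days,
         later.diameter_mm - earlier.diameter_mm)
        for earlier, later in zip(timeline, timeline[1:])
    ]


# Made-up example records; they do not come from the dataset.
example_scans = [
    NoduleScan("P001", "N1", date(2020, 10, 1), 1.25, True, 9.0),
    NoduleScan("P001", "N1", date(2020, 1, 1), 5.0, True, 6.0),
    NoduleScan("P001", "N1", date(2020, 4, 1), 1.25, True, 7.5),
    NoduleScan("P002", "N1", date(2021, 3, 15), 1.25, False, 4.0),
]

timelines = group_by_nodule(example_scans)
for key, timeline in timelines.items():
    print(key, growth_between_scans(timeline))
```

Keying the timelines by patient and nodule together keeps each nodule's history separate when a patient has several nodules, which the reported counts (362 nodules across 109 patients) suggest is common.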