Predicting Lung Nodule Malignancies by Combining Deep Convolutional Neural Network and Handcrafted Features

Shulong Li, Panpan Xu, Bin Li, Liyuan Chen, Zhiguo Zhou, Hongxia Hao, Yingying Duan, Michael Folkert, Jianhua Ma, Steve Jiang, Jing Wang
DOI: https://doi.org/10.1088/1361-6560/ab326a
2018-12-27
Abstract: To predict lung nodule malignancy with high sensitivity and specificity, we propose a fusion algorithm that combines handcrafted features (HF) with the features learned at the output layer of a 3D deep convolutional neural network (CNN). First, we extracted twenty-nine handcrafted features, including nine intensity features, eight geometric features, and twelve texture features based on the grey-level co-occurrence matrix (GLCM) averaged over thirteen directions. We then trained 3D CNNs modified from three state-of-the-art 2D CNN architectures (AlexNet, VGG-16 Net and Multi-crop Net) to extract the CNN features learned at the output layer. For each 3D CNN, the CNN features combined with the 29 handcrafted features were used as the input to a support vector machine (SVM) coupled with the sequential forward feature selection (SFS) method to select the optimal feature subset and construct the classifiers. The fusion algorithm takes full advantage of both the handcrafted features and the highest-level CNN features learned at the output layer. By incorporating the intrinsic CNN features, it overcomes the disadvantage that handcrafted features may not fully reflect the unique characteristics of a particular lesion. Meanwhile, the complementarity of the handcrafted features alleviates the CNNs' requirement for a large-scale annotated dataset. The patient cohort includes 431 malignant nodules and 795 benign nodules extracted from the LIDC/IDRI database. For each investigated CNN architecture, the proposed fusion algorithm achieved the highest AUC, accuracy, sensitivity, and specificity scores among all competing classification models.
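The abstract counts nine intensity and eight geometric features but does not list them. As a hedged illustration of what such handcrafted features typically look like, the sketch below computes first-order intensity statistics inside a binary nodule mask and a few shape descriptors (volume, voxel-face surface area, equivalent-sphere diameter, sphericity). The feature choices, the function names `intensity_features` and `geometric_features`, and the `spacing` parameter are assumptions for illustration, not the paper's definitions; NumPy is assumed to be available.

```python
import numpy as np


def intensity_features(volume, mask):
    """First-order statistics of the voxel intensities inside a binary nodule mask."""
    v = np.asarray(volume, dtype=float)[np.asarray(mask, dtype=bool)]
    mean, std = v.mean(), v.std()
    z = (v - mean) / std if std > 0 else np.zeros_like(v)
    return {
        "mean": float(mean),
        "std": float(std),
        "min": float(v.min()),
        "max": float(v.max()),
        "median": float(np.median(v)),
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4)),  # plain (non-excess) kurtosis
    }


def geometric_features(mask, spacing=(1.0, 1.0, 1.0)):
    """Shape descriptors of a non-empty binary mask; spacing is the voxel size (z, y, x) in mm."""
    m = np.asarray(mask, dtype=bool)
    volume = m.sum() * float(np.prod(spacing))
    # Surface area: count voxel faces separating the mask from the background.
    # This simple estimate overestimates the area of smooth, curved surfaces.
    padded = np.pad(m.astype(np.int8), 1)
    area = 0.0
    for axis in range(3):
        faces = np.count_nonzero(np.diff(padded, axis=axis))
        a, b = [s for k, s in enumerate(spacing) if k != axis]
        area += faces * a * b
    eq_diameter = (6.0 * volume / np.pi) ** (1.0 / 3.0)
    sphericity = np.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0) / area
    return {
        "volume_mm3": float(volume),
        "surface_mm2": float(area),
        "equivalent_diameter_mm": float(eq_diameter),
        "sphericity": float(sphericity),
    }


if __name__ == "__main__":
    cube = np.zeros((5, 5, 5), dtype=bool)
    cube[1:4, 1:4, 1:4] = True  # toy 3x3x3 "nodule"
    print(geometric_features(cube))
```

Sphericity is 1 for a perfect sphere and smaller for irregular shapes; for the toy cube it equals (π/6)^(1/3), about 0.806.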
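The texture features come from a GLCM averaged over thirteen directions. In 3D, these are the 13 unique offsets of the 26-connected neighborhood: each of the 26 neighbors pairs with its opposite, and a symmetric GLCM treats the two as the same direction. The sketch below, assuming NumPy and a linear quantization to `n_levels` grey levels (16 by default; the paper's binning is not stated here), builds a symmetric, normalized GLCM per direction and averages four common Haralick-style features over the directions. It covers only four of the twelve texture features, and the names `glcm_directions_3d`, `glcm_3d`, and `texture_features` are hypothetical.

```python
import itertools

import numpy as np


def glcm_directions_3d():
    """The 13 unique offsets (dz, dy, dx) of the 26-connected 3D neighborhood.

    d and -d describe the same direction for a symmetric GLCM, so keep only
    offsets whose first non-zero component is positive.
    """
    return [d for d in itertools.product((-1, 0, 1), repeat=3)
            if any(d) and next(c for c in d if c != 0) > 0]


def quantize(volume, n_levels):
    """Linearly map intensities onto integer grey levels 0 .. n_levels - 1."""
    vmin, vmax = float(volume.min()), float(volume.max())
    scaled = (volume - vmin) / max(vmax - vmin, 1e-12) * n_levels
    return np.clip(scaled.astype(int), 0, n_levels - 1)


def glcm_3d(levels, offset, n_levels):
    """Symmetric, normalized co-occurrence matrix of voxel pairs (v, v + offset)."""
    src, dst = [], []
    for d, n in zip(offset, levels.shape):
        src.append(slice(max(0, -d), n - max(0, d)))
        dst.append(slice(max(0, d), n - max(0, -d)))
    glcm = np.zeros((n_levels, n_levels))
    np.add.at(glcm, (levels[tuple(src)].ravel(), levels[tuple(dst)].ravel()), 1.0)
    glcm = glcm + glcm.T  # count each pair in both orders
    total = glcm.sum()
    return glcm / total if total > 0 else glcm


def texture_features(volume, n_levels=16):
    """Contrast, energy, homogeneity and entropy, each averaged over the 13 directions.

    Assumes every axis of the volume has at least 2 voxels, so each offset has pairs.
    """
    levels = quantize(np.asarray(volume, dtype=float), n_levels)
    i, j = np.indices((n_levels, n_levels))
    per_direction = []
    for offset in glcm_directions_3d():
        p = glcm_3d(levels, offset, n_levels)
        nz = p[p > 0]
        per_direction.append([
            np.sum(p * (i - j) ** 2),          # contrast
            np.sum(p ** 2),                    # energy (angular second moment)
            np.sum(p / (1.0 + (i - j) ** 2)),  # homogeneity (inverse difference moment)
            -np.sum(nz * np.log2(nz)),         # entropy in bits
        ])
    mean = np.mean(per_direction, axis=0)
    return dict(zip(("contrast", "energy", "homogeneity", "entropy"), map(float, mean)))


if __name__ == "__main__":
    patch = np.random.default_rng(0).normal(size=(16, 16, 16))  # stand-in CT patch
    print(texture_features(patch))
```

Averaging each feature over the directions, rather than concatenating per-direction values, keeps the descriptor fixed in length and less sensitive to nodule orientation, which appears to be the intent of averaging over thirteen directions.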
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper attempts to solve is how to predict the malignancy of pulmonary nodules more accurately in low-dose computed tomography (LDCT) lung cancer screening, improving both sensitivity and specificity. Specifically, the authors propose a fusion algorithm that combines handcrafted features (HF) with the features learned at the output layer of a 3D deep convolutional neural network (3D CNN), in order to overcome the limitations of either approach alone and reduce the dependence on large-scale labeled datasets. In this way, the algorithm aims to lower the false-positive rate and avoid unnecessary biopsies, ultimately reducing patient morbidity and medical costs.

### Background and Challenges

- **High False-Positive Rate**: Although LDCT screening has been shown to reduce lung cancer mortality, its false-positive rate is relatively high, leading to unnecessary follow-up diagnostic procedures and associated medical expenses.
- **Limited Dataset Scale**: Deep learning methods usually require large amounts of labeled data for training, but obtaining large-scale labeled datasets in medical imaging remains challenging.
- **Limitations of a Single Method**: Neither handcrafted features nor deep-learning features alone can fully capture the unique characteristics of a specific lesion.

### Solutions

- **Fusion Algorithm**: The paper proposes a fusion algorithm (SS-OLHF) that combines the 29 handcrafted features with the features learned at the output layer of a 3D CNN.
- **Feature Extraction**:
  - **Handcrafted Features**: 9 intensity features, 8 geometric features, and 12 texture features based on the gray-level co-occurrence matrix (GLCM).
  - **3D CNN Features**: Extracted from 3D CNNs modified from three 2D CNN architectures (AlexNet, VGG-16 Net, and Multi-crop Net).
- **Classifier**: A support vector machine (SVM) combined with the sequential forward feature selection (SFS) method selects the optimal feature subset and builds the classifier.

### Experimental Setup

- **Dataset**: 431 malignant nodules and 795 benign nodules from the LIDC/IDRI database.
- **Model Comparison**: The method is compared with a variety of alternatives, including the original 2D CNNs, other fusion strategies (such as S-FFL and S-FFLHF), a method using only handcrafted features (SS-HF), and the support tensor machine (STM).

### Main Contributions

- **Improved Classification Performance**: Combining handcrafted features with 3D CNN features improves classification accuracy, sensitivity, and specificity.
- **Reduced Data Requirements**: The complementarity of handcrafted features reduces the dependence on large-scale labeled datasets.
- **Novelty**: The authors present this as the first attempt to combine the features learned at the output layer of a 3D CNN with handcrafted features, achieving better classification results.

Through these methods, the paper provides an effective approach that is expected to improve the accuracy of pulmonary nodule malignancy prediction in clinical practice.
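The classifier step concatenates the CNN output-layer features with the 29 handcrafted features and lets SFS choose a subset, scoring each candidate subset with an SVM. The sketch below is a self-contained approximation under several assumptions: a linear soft-margin SVM trained by subgradient descent on the hinge loss stands in for the paper's SVM (whose kernel and hyperparameters are not given here), subsets are scored by 5-fold cross-validated accuracy, and selection stops as soon as no remaining feature improves that score. The demo data are synthetic, the two-unit output layer is hypothetical, all function names are illustrative, and NumPy is assumed.

```python
import numpy as np


def fuse_features(cnn_out, handcrafted):
    """Concatenate per-nodule CNN output-layer features with handcrafted features."""
    return np.hstack([np.asarray(cnn_out, dtype=float),
                      np.asarray(handcrafted, dtype=float)])


def train_linear_svm(X, y, lam=1e-2, lr=0.1, epochs=100):
    """Linear soft-margin SVM fitted by full-batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))), with y in {-1, +1}.
    """
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def cv_accuracy(X, y, n_folds=5, seed=0):
    """Mean k-fold accuracy; features are z-scored with training-fold statistics only."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    accs = []
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for m, f in enumerate(folds) if m != k])
        mu, sd = X[train_idx].mean(axis=0), X[train_idx].std(axis=0) + 1e-12
        w, b = train_linear_svm((X[train_idx] - mu) / sd, y[train_idx])
        pred = np.where(((X[test_idx] - mu) / sd) @ w + b >= 0, 1, -1)
        accs.append(np.mean(pred == y[test_idx]))
    return float(np.mean(accs))


def sequential_forward_selection(X, y, score=cv_accuracy):
    """Greedy SFS: repeatedly add the single feature that most improves the score."""
    selected, best = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        trial = {f: score(X[:, selected + [f]], y) for f in remaining}
        f_best = max(trial, key=trial.get)
        if trial[f_best] <= best:  # assumed stopping rule: no further improvement
            break
        selected.append(f_best)
        best = trial[f_best]
        remaining.remove(f_best)
    return selected, best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.choice([-1, 1], size=200)  # +1 = malignant, -1 = benign (synthetic)
    cnn_out = rng.normal(size=(200, 2)) + 1.5 * y[:, None]  # hypothetical 2-unit output
    handcrafted = rng.normal(size=(200, 29))  # placeholder for the 29 HF
    handcrafted[:, 3] += y  # make one handcrafted feature weakly informative
    X = fuse_features(cnn_out, handcrafted)  # shape (200, 31)
    subset, acc = sequential_forward_selection(X, y)
    print("selected columns:", subset, "cv accuracy:", round(acc, 3))
```

Cross-validated accuracy is only one possible SFS criterion; AUC or another metric could be passed through the `score` argument instead.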
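The models are compared by AUC, accuracy, sensitivity, and specificity. A minimal pure-Python sketch of these metrics follows, assuming malignant = 1 (the positive class) and benign = 0. The AUC here is the Mann-Whitney form: the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one, with ties counted as one half. The function names and the 0.5 threshold in the demo are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) with 1 = malignant (positive) and 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from hard 0/1 predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),  # true-positive rate
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),  # true-negative rate
    }


def roc_auc(y_true, scores):
    """ROC AUC via pairwise comparison of malignant vs. benign scores (O(P * N))."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 0]
    scores = [0.9, 0.4, 0.3, 0.2, 0.6]
    y_pred = [int(s >= 0.5) for s in scores]  # hypothetical 0.5 decision threshold
    print(classification_metrics(y_true, y_pred), roc_auc(y_true, scores))
```

On the toy scores this gives accuracy 0.6, sensitivity 0.5, specificity 2/3, and AUC 5/6. Because AUC is threshold-free while the other three depend on the decision threshold, reporting all four, as the paper does, gives a fuller picture for an imbalanced cohort such as 431 malignant versus 795 benign nodules.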