Application of data mining in computer-aided diagnosis of lung cancer

Hui Chen, Xiao-hua Wang
DOI: https://doi.org/10.3321/j.issn:1673-8225.2007.05.025
2007-01-01
Abstract: AIM: To analyze several data mining classification methods and compare their diagnostic performance when used in a computer-aided diagnosis system. METHODS: Two hundred cases of solitary pulmonary nodules, confirmed pathologically by surgical or needle biopsy at Beijing Friendship Hospital and Beijing Institute of Tuberculosis and Thoracic Tumor between June 1998 and December 2004, were collected, comprising 135 peripheral lung cancers and 65 benign nodules. Two clinical features (age and the presence or absence of blood-streaked sputum) and 5 thin-slice CT signs of each nodule were determined and quantified. The 200 valid samples were randomly divided into training and test samples at a ratio of 7:3. Diagnostic classifiers were built using the Fisher linear discriminant function, logistic regression, a decision tree and a neural network model, and were validated on the test samples. Sensitivity and specificity were used to evaluate the accuracy of the classifiers, and the area under the ROC curve was used to compare their diagnostic performance. RESULTS: ① In the diagnosis of the 60 test cases, the sensitivities of the four classifiers were 84.6%, 87.2%, 87.2% and 87.2%, and their specificities were 85.7%, 81.0%, 76.2% and 81.0%, respectively. ② The areas under the ROC curve for the four classifiers were 0.918, 0.918, 0.939 and 0.942; no significant difference was found in any pairwise comparison (P = 0.8982, 0.1576, 0.3495, 0.2857, 0.4319 and 0.9868). CONCLUSION: In terms of classification accuracy, interpretability and helpfulness to clinical diagnosis, logistic regression and the back-propagation (BP) neural network have higher diagnostic accuracy; discriminant analysis, logistic regression and the decision tree are more interpretable; and the BP neural network performs better in actual diagnostic decisions. All of these methods can be applied in a computer-aided diagnosis system.
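As an illustration of how the reported sensitivities and specificities relate to case counts, the following is a minimal sketch, not the authors' code. It assumes the 60 test cases contained 39 malignant and 21 benign nodules; the abstract does not state this split, but it is the composition under which whole-number counts reproduce every reported percentage. The per-classifier counts in `assumed_counts` and the helper `summarize` are likewise hypothetical, chosen only to match the published figures.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of malignant nodules correctly classified as malignant."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of benign nodules correctly classified as benign."""
    return tn / (tn + fp)


# Assumption (not stated in the abstract): composition of the 60 test cases.
N_MALIGNANT, N_BENIGN = 39, 21

# Hypothetical (true positives, true negatives) per classifier, in the order
# the abstract lists them; chosen to reproduce the reported percentages.
assumed_counts = {
    "Fisher discriminant": (33, 18),
    "Logistic regression": (34, 17),
    "Decision tree": (34, 16),
    "Neural network": (34, 17),
}


def summarize(counts):
    """Return {classifier: (sensitivity %, specificity %)} rounded to 0.1."""
    rows = {}
    for name, (tp, tn) in counts.items():
        sens = sensitivity(tp, N_MALIGNANT - tp)
        spec = specificity(tn, N_BENIGN - tn)
        rows[name] = (round(100 * sens, 1), round(100 * spec, 1))
    return rows


if __name__ == "__main__":
    for name, (sens, spec) in summarize(assumed_counts).items():
        print(f"{name}: sensitivity {sens}%, specificity {spec}%")
```

Under these assumed counts, a difference of one case shifts sensitivity by about 2.6 points and specificity by about 4.8 points, which helps put the small gaps between classifiers in context.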