Evaluation of a statistically derived decision tree for the cytodiagnosis of fine needle aspirates of the breast (FNAB)

S. Cross, A. Dube, J. Johnson, T. Mcculloch, C. Quincey, R. Harrison, Z. Ma
DOI: https://doi.org/10.1046/j.1365-2303.1998.00135.x
1998-05-01
Cytopathology
Abstract: A decision tree for the diagnosis of FNAB was derived from defined human observations using a rule induction method, C4.5 (a derivative of the ID3 algorithm). This algorithm is an implementation of the top-down induction method where the tree is determined iteratively by adding those nodes and branches which maximize the information gain at each step. The tree was derived from a training set of 200 FNAB with known outcome using 10 defined features (from one observer) and patient age. The tree contained a total of seven nodes (six observable features and patient age) with eight endpoints (four benign, four malignant). The tree was applied to a test set of 400 further FNAB with observations from the training observer and produced a sensitivity of 95%, specificity of 93% and a positive predictive value (PPV) of a malignant result of 89%. Four trainee pathologists were given a training session on the observable features and then used the tree to determine outcome in a further 50 FNAB. The observers were blind to clinical details apart from age and the endpoints were coded with letters and not labelled benign or malignant. The results from these observers produced ranges of sensitivity 80-96%, specificity 64-92%, PPV 73-92% and kappa statistics (with known outcome) 0.6-0.8. Reported difficulties in using the tree included estimation of nuclear size. These results were worse than the performance of the observers on a further 50 cases without using the decision tree (sensitivity 80-100%, specificity 72-100%, PPV 78-100%, kappa 0.72-0.92). The original 50-case test set was rerandomized and the four trainee observers made all 10 defined observations on each specimen without using the decision tree; these observations were then used to derive decisions from the tree. The performance from this method was similar to that using selected features from the tree, suggesting that observation of all features together does not improve the reliability of each specific observation. The poor performance of this tree suggests that this methodology may be unsuitable for producing decision support aids for diagnostic or training purposes in this domain.
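The two kinds of quantity the abstract relies on, the information gain used to choose each split and the sensitivity, specificity, PPV and kappa used to score the result, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the feature values and the case counts are hypothetical and were chosen only to make the arithmetic easy to follow. It computes plain information gain as the abstract describes; C4.5 as usually distributed normalizes this into a gain ratio, which is omitted here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(labels, feature_values):
    """Entropy reduction obtained by splitting `labels` on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and Cohen's kappa from a 2x2 table,
    treating 'malignant' as the positive class."""
    total = tp + fp + tn + fn
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical toy data: four cases, one feature that separates them perfectly
# and one that carries no information about the outcome.
outcomes = ["malignant", "malignant", "benign", "benign"]
nuclear_size = ["large", "large", "small", "small"]   # assumed feature values
uninformative = ["a", "b", "a", "b"]                  # assumed feature values

gain_good = information_gain(outcomes, nuclear_size)
gain_none = information_gain(outcomes, uninformative)

# Hypothetical 50-case result: 20 true positives, 5 false positives,
# 20 true negatives, 5 false negatives (not figures from the paper).
metrics = diagnostic_metrics(tp=20, fp=5, tn=20, fn=5)
```

A top-down induction procedure would evaluate `information_gain` for every candidate feature at a node and split on the highest-scoring one, then recurse on each subset. With the made-up counts above, sensitivity, specificity and PPV each come to 0.8 and kappa to about 0.6, which happens to sit at the lower end of the kappa range reported for the trainee observers.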