Objective.
The purpose of this article is to compare the diagnostic accuracy of induced decision trees with that of pruned neural networks, and to improve the accuracy and interpretation of breast cancer diagnosis from readings of fine-needle aspirate by identifying cases likely to be misclassified by induced decision rules.
Method.
Decision trees were induced from a randomly selected half of the 699 cases in an online database of suspected breast cancer cases and their corresponding readings of fine-needle aspirate. Accuracy was determined for the remaining cases in successive partitions. The pattern of errors in the multiple decision trees was examined. A smaller data set was then created in which the original benign and malignant classes were replaced by 2 classes: (1) correctly classified and (2) misclassified by a decision tree. From this data set, decision trees that describe the misclassified cases were induced.
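The random half split and the relabeling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function names `split_half` and `relabel_by_outcome` are hypothetical, the toy cases are invented, and `stub_tree` is a one-threshold stand-in for an induced decision tree, since the induction algorithm and the feature encoding are not specified here.

```python
import random


def split_half(cases, seed):
    """Randomly split cases into a training half and a held-out half."""
    rng = random.Random(seed)
    shuffled = cases[:]          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def relabel_by_outcome(cases, predict):
    """Replace each diagnosis with 'correct' or 'misclassified'.

    The result is a second data set from which a tree describing
    the errors of the first classifier could be induced.
    """
    relabeled = []
    for features, diagnosis in cases:
        outcome = "correct" if predict(features) == diagnosis else "misclassified"
        relabeled.append((features, outcome))
    return relabeled


def stub_tree(features):
    """Hypothetical stand-in for an induced tree: one threshold on feature 0."""
    return "malignant" if features[0] > 5 else "benign"


# Invented toy cases: (feature tuple, diagnosis). Not taken from the database.
toy_cases = [
    ((1, 2), "benign"),
    ((8, 7), "malignant"),
    ((3, 9), "malignant"),   # the stub misses this malignancy
    ((7, 1), "benign"),      # the stub raises a false alarm here
    ((2, 1), "benign"),
    ((9, 9), "malignant"),
]

train, test = split_half(toy_cases, seed=0)
error_data = relabel_by_outcome(toy_cases, stub_tree)
```

Repeating `split_half` with different seeds would give successive partitions. The relabeled set keeps the original features, so any tree induced from it describes the misclassified cases in terms of the same readings.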
Results.
Larger, less severely pruned decision trees were more accurate in breast cancer diagnosis for both training and test data. The accuracy of the induced decision trees exceeded that reported for the smaller pruned neural networks. Combining classifications from 2 trees was effective in identifying malignancies missed by a single tree. Induced decision trees were able to identify patterns associated with misclassified cases, but identifying errors inductively did not improve the overall error rate.
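One way that combining 2 trees can recover malignancies missed by a single tree is sketched below. The combination rule is an assumption: the text does not state how the classifications were combined, so this example flags a case as malignant when either tree does. The function names and the toy predictions are hypothetical.

```python
def combine_either(pred_a, pred_b):
    """Assumed rule: call a case malignant if either tree calls it malignant."""
    return [
        "malignant" if "malignant" in (a, b) else "benign"
        for a, b in zip(pred_a, pred_b)
    ]


def missed_malignancies(truth, predictions):
    """Count malignant cases that were not predicted as malignant."""
    return sum(
        1 for t, p in zip(truth, predictions)
        if t == "malignant" and p != "malignant"
    )


# Invented toy labels and predictions, for illustration only.
truth  = ["malignant", "malignant", "malignant", "benign", "benign"]
tree_a = ["malignant", "benign",    "malignant", "benign", "benign"]
tree_b = ["malignant", "malignant", "benign",    "benign", "malignant"]

combined = combine_either(tree_a, tree_b)

missed_a = missed_malignancies(truth, tree_a)
missed_b = missed_malignancies(truth, tree_b)
missed_combined = missed_malignancies(truth, combined)
```

Under this rule the combined classifier can never miss a malignancy that either tree catches, but it also inherits every false alarm from both trees, as the last toy case shows.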
Conclusion.
In this application, a model that is too compact identifies fewer cases of the minority class, malignancy. New methods that combine models and examine classification errors can improve diagnosis by identifying more malignancies and by describing ambiguous cases.