Identifying Diagnostic Errors with Induced Decision Trees

Abstract

Objective. The purpose of this article is to compare the diagnostic accuracy of induced decision trees with that of pruned neural networks and to improve the accuracy and interpretation of breast cancer diagnosis from readings of fine-needle aspirate by identifying cases likely to be misclassified by induced decision rules.

Method. Using an online database consisting of 699 cases of suspected breast cancer and their corresponding readings of fine-needle aspirate, decision trees were induced from half of the cases, randomly selected. Accuracy was determined for the remaining cases in successive partitions. The pattern of errors in the multiple decision trees was examined. A smaller data set was created with 2 classes: (1) correctly classified and (2) misclassified by a decision tree, rather than the original benign and malignant classes. From this data set, decision trees that describe the misclassified cases were induced.

Results. Larger, less severely pruned decision trees were more accurate in breast cancer diagnosis for both training and test data. The accuracy of the induced decision trees exceeded that reported for the smaller pruned neural networks. Combining classifications from 2 trees was effective in identifying malignancies missed by a single tree. Induced decision trees were able to identify patterns associated with misclassified cases, but the identification of errors inductively did not improve the overall error rate.

Conclusion. In this application, a model that is too compact identifies fewer cases of the minority class, malignancy. New methods that combine models and examine classification errors can improve diagnosis by identifying more malignancies and by describing ambiguous cases.
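The relabeling and combination steps described in the Method can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the two "trees" are hand-written stand-in rules rather than induced trees, the four cases are made-up toy values rather than records from the database, and the "malignant if either tree says so" rule is one possible way to combine 2 trees (the abstract does not say how the classifications were combined). All function and variable names are hypothetical.

```python
import random


def split_half(cases, seed=0):
    """Randomly assign half of the cases to training and the rest to testing."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def relabel_by_error(cases, classify):
    """Swap the benign/malignant label for correct/misclassified.

    The result is the second data set, from which a tree describing the
    misclassified cases could be induced.
    """
    relabeled = []
    for features, label in cases:
        outcome = "correct" if classify(features) == label else "misclassified"
        relabeled.append((features, outcome))
    return relabeled


def combine_either_malignant(tree_a, tree_b):
    """ASSUMED combination rule: malignant if either tree says malignant."""
    def combined(features):
        votes = (tree_a(features), tree_b(features))
        return "malignant" if "malignant" in votes else "benign"
    return combined


# Stand-in rules playing the role of two induced trees (hypothetical).
def tree_a(features):
    return "malignant" if features[0] > 5 else "benign"


def tree_b(features):
    return "malignant" if features[1] > 5 else "benign"


# Made-up toy cases: ((feature_1, feature_2), true class).
cases = [
    ((8, 2), "malignant"),
    ((2, 9), "malignant"),  # missed by tree_a, caught by tree_b
    ((1, 1), "benign"),
    ((3, 7), "benign"),     # false positive under the combined rule
]

errors_of_a = relabel_by_error(cases, tree_a)
combined = combine_either_malignant(tree_a, tree_b)
combined_calls = [combined(features) for features, _ in cases]
```

In this toy setting the combined rule recovers the malignancy that `tree_a` misses, at the cost of one extra false positive. That trade-off mirrors the abstract's finding that combining trees identifies more malignancies while error identification does not lower the overall error rate.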

Keywords

induced decision trees; artificial neural networks; breast cancer; computer-assisted diagnosis

