The Value of Bayes Theorem in the Interpretation of Subjective Diagnostic Findings: What Can We Learn from Agreement Studies?

Abstract

The Bayes theorem is advocated as the appropriate measure for the weight of evidence in medical decision making. It is based on the calculation of posttest probability as a function of the accuracy of the test and pretest probability. Nevertheless, for subjective diagnostic findings, there might be substantial variability in the accuracy among human observers, making the point estimate of posttest probability imprecise. Although there is limited evidence regarding the actual variability of accuracy among observers for the majority of diagnostic findings, classical observer agreement studies provide us with an indirect estimate of such variability. The aim of this work was to explicate the relationship between observer disagreement and variability of posttest probability. Using a random effects signal detection model with 3 stochastic components (between subject, between observer, and residual variations), the authors modeled diagnostic tests with various characteristics and calculated the expected between-observer disagreement and 95% interval of the observers' posttest probability. For the majority of simulated conditions, variation in posttest probability was surprisingly high, even in the presence of substantial agreement. Although the model is based on parametric assumptions, these results are a clue to a source of inaccuracy in the calculation of posttest probability. Practitioners should be aware of such variation in their clinical practice, and diagnostic studies need to develop strategies to address this uncertainty.
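To make the dependence of posttest probability on observer accuracy concrete, the following minimal sketch applies Bayes theorem to a positive finding reported by three observers. It is an illustration only, not the authors' random effects signal detection model: the function name `posttest_probability`, the observer labels, and every sensitivity, specificity, and pretest value are hypothetical and chosen purely for the arithmetic.

```python
def posttest_probability(pretest, sensitivity, specificity, positive=True):
    """Posttest probability of disease from Bayes theorem for a binary finding."""
    if positive:
        p_result_if_diseased = sensitivity        # true-positive rate
        p_result_if_healthy = 1.0 - specificity   # false-positive rate
    else:
        p_result_if_diseased = 1.0 - sensitivity  # false-negative rate
        p_result_if_healthy = specificity         # true-negative rate
    numerator = pretest * p_result_if_diseased
    return numerator / (numerator + (1.0 - pretest) * p_result_if_healthy)


# Hypothetical observers as (sensitivity, specificity); values are assumptions,
# not estimates taken from the article.
observers = {
    "A": (0.90, 0.90),
    "B": (0.80, 0.95),
    "C": (0.95, 0.75),
}
pretest = 0.20  # assumed pretest probability, shared by all observers

# Posttest probability after a positive finding, per observer.
posttests = {
    name: posttest_probability(pretest, se, sp)
    for name, (se, sp) in observers.items()
}
spread = max(posttests.values()) - min(posttests.values())

for name, value in posttests.items():
    print(f"Observer {name}: posttest probability = {value:.3f}")
print(f"Spread across observers = {spread:.3f}")
```

Under these assumed values, the same positive finding and the same pretest probability lead to posttest probabilities of roughly 0.49 to 0.80 depending on who reads the test, which is the kind of between-observer spread a single point estimate hides.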

Keywords

Bayes theorem; probability; decision making; pretest; posttest.
