Abstract
This paper describes a methodology for developing a new confidence metric to improve power grid operators' reliance on ML event classifiers. Unlike traditional confidence scores, which are generated by the ML model itself, this metric is generated by humans who have studied the performance boundaries of the classifier. We refer to it as an Expert Derived Confidence (EDC) score. As an initial test of our methodology, four participants (3 subject matter experts, 1 novice) learned the boundaries of an ML model's performance by studying a subset of events in its training data. The participants then rated their confidence in the model's ability to classify similar events. We found that all participants' EDC scores were correlated with the model's own uncertainty quantification score, and that, on average, EDC scores expressed greater confidence in the model's ability to correctly classify events than the model's own confidence scores did. In addition, the EDC score averaged across all participants was the strongest predictor of model performance and remained predictive even after controlling for the model's own confidence.
