Abstract
This paper describes a methodology for developing a new confidence metric to improve power grid operators' reliance on ML event classifiers. Unlike traditional confidence scores, which are generated by the ML model itself, this metric is generated by humans who have studied the performance boundaries of the classifier. We refer to it as an Expert Derived Confidence (EDC) score. As an initial test of our methodology, four participants (3 subject matter experts, 1 novice) learned the boundaries of an ML model's performance by studying a subset of events in its training data. The participants then rated their confidence in the model's ability to classify similar events. We found that all participants' EDC scores were correlated with the model's own uncertainty quantification score, and that, on average, EDC scores expressed greater confidence in the model's ability to correctly classify events than the model's own confidence scores did. In addition, the EDC score averaged across all participants was the strongest predictor of model performance and remained predictive even after controlling for the model's own confidence.
