Abstract
As artificial intelligence (AI), machine learning (ML), and other forms of advanced automation are increasingly considered for deployment in safety-critical industries, there is an urgent need for evaluation methods which reliably identify risks of deployment prior to people being harmed. In this narrative review, we discuss the benefits and drawbacks of 11 major methodological decisions underpinning evaluations of AI-infused technologies from the perspective of cognitive systems engineering (CSE) and naturalistic decision making (NDM). These methodological decisions are organized around four aspirations central to the perspective of CSE and NDM: evaluations of AI-infused technologies should be (1) integrated, (2) naturalistic, (3) grounded, and (4) pattern-centered. We use these aspirations to interpret common human-AI evaluation methods and discuss new evaluation challenges for emerging AI-infused technologies. This narrative review is meant to guide both current methods and future research toward safe and effective strategies for evaluating AI-infused technologies, especially in safety-critical settings.