Abstract
Ball-strike judgments in Major League Baseball (MLB) remain susceptible to human error, potentially affecting game outcomes and competitive fairness. This study develops an AI umpire using deep learning to assess the consistency of human umpire decisions. A deep neural network trained on 362,734 pitches from the 2019 MLB season achieved an accuracy of 93%. The AI umpire’s high-confidence predictions (probability >0.9 or <0.1) were compared against actual umpire calls to identify systematic discrepancies. Results indicate that these discrepancies were not randomly distributed but concentrated in specific game contexts, particularly early innings and first-pitch situations, suggesting situational influences on judgment. Moreover, some umpires exhibited spatial biases, consistently expanding or contracting the strike zone relative to the AI model’s zone. These findings suggest that umpire decisions are shaped by game dynamics rather than being solely rule-bound. This study proposes a novel AI-based framework for evaluating officiating consistency and illustrates the potential of AI to support assessments that are more objective and reliable than those of human umpires.
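The high-confidence comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: only the 0.9 and 0.1 thresholds come from the abstract, while the field names (`p_strike`, `call`, `inning`), the function names, and the toy pitches are hypothetical and exist only to show the filtering and grouping logic.

```python
from collections import defaultdict

# Confidence thresholds taken from the abstract.
HIGH, LOW = 0.9, 0.1


def high_confidence_discrepancies(pitches):
    """Return pitches where a confident model prediction disagrees with the umpire.

    Each pitch is a dict with hypothetical keys:
      'p_strike': model probability that the pitch is a strike
      'call':     umpire call, either 'strike' or 'ball'
    """
    flagged = []
    for pitch in pitches:
        p = pitch["p_strike"]
        if p > HIGH and pitch["call"] == "ball":
            flagged.append({**pitch, "kind": "confident_strike_called_ball"})
        elif p < LOW and pitch["call"] == "strike":
            flagged.append({**pitch, "kind": "confident_ball_called_strike"})
    return flagged


def discrepancy_rate_by(pitches, key):
    """Per group, the share of high-confidence pitches the umpire called differently."""
    confident = defaultdict(int)
    disagreed = defaultdict(int)
    for pitch in pitches:
        p = pitch["p_strike"]
        if LOW <= p <= HIGH:
            continue  # model is not confident enough; leave this pitch out
        confident[pitch[key]] += 1
        model_call = "strike" if p > HIGH else "ball"
        if model_call != pitch["call"]:
            disagreed[pitch[key]] += 1
    return {group: disagreed[group] / confident[group] for group in confident}


# Invented toy data, for illustration only.
toy_pitches = [
    {"p_strike": 0.97, "call": "ball", "inning": 1},
    {"p_strike": 0.95, "call": "strike", "inning": 1},
    {"p_strike": 0.03, "call": "strike", "inning": 7},
    {"p_strike": 0.02, "call": "ball", "inning": 7},
    {"p_strike": 0.55, "call": "ball", "inning": 7},  # uncertain, ignored
]

flagged = high_confidence_discrepancies(toy_pitches)
rates = discrepancy_rate_by(toy_pitches, "inning")
```

Grouping by a different key (for example a hypothetical `count` or `umpire` field) would give the same kind of rate for first-pitch situations or individual umpires, which is the sort of contextual breakdown the abstract reports.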
