
Abstract

Ball-strike judgments in Major League Baseball (MLB) remain susceptible to human error, potentially affecting game outcomes and competitive fairness. This study develops an AI umpire using deep learning to assess the consistency of human umpire decisions. A deep neural network trained on 362,734 pitches from the 2019 MLB season achieved an accuracy of 93%. The AI umpire’s high-confidence predictions (probability >0.9 or <0.1) were compared against actual umpire calls to identify systematic discrepancies. Results indicate that these discrepancies were not randomly distributed but concentrated in specific game contexts—particularly early innings and first-pitch situations—suggesting situational influences on judgment. Moreover, some umpires exhibited spatial biases by consistently expanding or contracting the strike zone relative to the AI model’s zone. These findings suggest that umpire decisions are shaped by game dynamics rather than being solely rule-bound. This study proposes a novel AI-based evaluation framework for officiating consistency, illustrating the potential of AI to support more objective and reliable assessments than human umpires.
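The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the field names (`p_strike`, `umpire_call`, `inning`), the helper functions, and the sample pitches are hypothetical. Only the 0.9 and 0.1 confidence thresholds come from the abstract. The sketch keeps pitches on which the model is confident, marks those where the umpire's call disagrees, and computes a disagreement rate per game context.

```python
from collections import Counter

def label_confident(pitches, hi=0.9, lo=0.1):
    """Keep only pitches with a high-confidence model prediction and
    mark whether the umpire's call disagrees with the model's call."""
    confident = []
    for p in pitches:
        if p["p_strike"] > hi:
            model_call = "strike"
        elif p["p_strike"] < lo:
            model_call = "ball"
        else:
            continue  # model is not confident; pitch is excluded
        confident.append({
            **p,
            "model_call": model_call,
            "disagrees": model_call != p["umpire_call"],
        })
    return confident

def disagreement_rate_by(confident, key):
    """Share of high-confidence pitches with a discrepant umpire call,
    grouped by a context field such as inning."""
    totals = Counter(p[key] for p in confident)
    misses = Counter(p[key] for p in confident if p["disagrees"])
    return {k: misses[k] / totals[k] for k in totals}

# Hypothetical sample data, for illustration only.
pitches = [
    {"p_strike": 0.97, "umpire_call": "ball", "inning": 1},
    {"p_strike": 0.95, "umpire_call": "strike", "inning": 1},
    {"p_strike": 0.50, "umpire_call": "ball", "inning": 5},
    {"p_strike": 0.03, "umpire_call": "strike", "inning": 5},
    {"p_strike": 0.02, "umpire_call": "ball", "inning": 9},
]

confident = label_confident(pitches)
rates = disagreement_rate_by(confident, "inning")
```

Grouping by a different key (for example a pitch-count or umpire identifier field, if one were available) would follow the same pattern. Whether the rates differ across contexts by more than chance would still need a statistical test, which this sketch does not attempt.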

Keywords

AI umpire, MLB officiating, strike zone evaluation, sports analytics



AI-assisted umpire evaluation in baseball: A data-driven approach to assessing ball-strike decisions
