Abstract
This investigation examined intervention accuracy and perception of fatigue associated with umpiring state-level Australian Rules football (AF). Thirty-nine field umpires (Age: 25.2 ± 6.8 y, Body Mass: 74.6 ± 7.9 kg, Height: 178.9 ± 7.4 cm) employed by the Western Australian Football Commission were recruited. Intervention accuracy (whether awarded free kicks were correct) was analysed post-event by the lead investigator using match footage, with two umpire coaches used to assess reliability. Perception of fatigue was assessed pre- and post-match using a questionnaire. Data was collected in 22 West Australian Football League matches in the 2023 season. A total of 384 free kicks were recorded, with 343 correct and 41 incorrect decisions. Each umpire awarded 9.8 ± 3.8 free kicks per match, of which 8.8 ± 3.2 were correct and 1.1 ± 1.1 were incorrect. Intervention accuracy was 89.3 ± 10.4% correct. A significant decline in intervention accuracy between the first and second half was found (W = .931, p = .021). No significant declines in intervention accuracy between quarters were observed. Perception of fatigue did not change significantly from pre- to post-match (p = .074). Our results demonstrate that umpires are generally accurate when adjudicating free kicks, but that this accuracy declines across a match. The results of this study provide important insight into free-kick accuracy and perception of fatigue in state-level AF umpires, and may help regulatory bodies design strategies to improve umpiring outcomes.
Introduction
Australian football (AF) is the most popular spectator sport in Australia, with the elite competition being the Australian Football League (AFL).1,2 Besides the AFL, there are three state-based semi-professional competitions: the South Australian National Football League (SANFL), Victorian Football League (VFL), and West Australian Football League (WAFL). These state competitions often serve as development leagues, whereby future AFL talent is identified and nurtured. AF is typically played on a grass oval, encompassing field dimensions ranging from 135 m to 185 m in length and 110 m to 155 m in width. A normal game features 22 players per team, with 18 on ground and four on the interchange bench, enabling rolling substitutions during match proceedings.3
Given AF is an invasion-based ball-sport, its officiation demands diverse expertise from umpires to effectively manage and regulate matches.4,5 In a standard semi-professional AF game, the officiating team comprises three field, three boundary, and two goal umpires, along with an emergency umpire stationed on the interchange bench. Field umpires play a pivotal role in overseeing and influencing the game by adjudicating free-kicks and initiating play restarts.6 Among the umpiring disciplines in AF, field umpires bear the greatest responsibility and are arguably the most crucial, given their role in ensuring player safety on the field.7,8
Just as AF is a physically demanding sport, so too is the umpiring of matches. The associated workload has been previously researched, with field umpires typically found to cover a total distance (TD) of between 10,000 and 15,000 m.8–10 Previous research has also observed that AF umpiring elicits high markers of internal load, such as blood lactate (BLa),11 heart rate (HR),12 and rating of perceived exertion (RPE),9 whilst association football refereeing research has shown similar responses, with an increase in RPE during the second half of matches.13 In addition to the physiological requirements of umpiring AF, there are significant cognitive demands, such as understanding the laws/interpretations14 and applying them in the correct context. It is also important for umpires to maintain composure and concentration to deliver high-quality field umpiring.11 Scenarios where the field umpire intervenes in the contest can be termed umpire interventions, which can be likened to “foul judgements” in other sports. Given the large potential impact of umpire interventions on the flow of a match and match outcome, it is crucial to assess their accuracy. Prior research on field umpire decision-making and cognitive demands in AF has explored accuracy among elite5–7,12,15,16 and amateur AF8,9,11,17 umpires in varying levels of detail, as well as referees in other football codes like association football18,19 and other ball-sports.20
Few have, however, examined match decision-making.5,6,12 Much of the previously mentioned research has had certain limitations, including methodologies where field umpires analysed match footage themselves rather than an independent assessor making video-based decisions, reliance on match simulation instead of real-world play,8 and small sample sizes.8,11,17 Gaps in the literature regarding field umpires at the state level of AF are evident as, to the best of our knowledge, very little research on intervention accuracy and perception of fatigue has been conducted at the semi-professional level. This gap is important because semi-professional umpires form the talent pool from which the elite level recruits. Semi-professional umpires are generally less experienced than professional umpires. Therefore, research that identifies differences between the groups may help raise the standard of semi-professional umpires. Whilst zonal decision-making accuracy research does exist,12 it remains heavily understudied, in particular the difference between attacking and defending end zones. Better understanding these areas will help inform current state-level, semi-professional umpires and coaches as to whether the current coaching system and skill-based training regime are effective. Moreover, this study will investigate whether the current umpiring deployment is adequate for state-level, semi-professional AF, and whether there are any possible alterations to improve umpiring. This research is important as information surrounding real-world decision-making is lacking, with only a handful of studies having quantified this area.11,12,15,17,21 Real-world decision-making gives researchers the clearest indication of the accuracy of umpires in the field and is likely to be of greatest assistance to coaches and umpires themselves.
As high-quality umpiring is one of the foundations of AF, it is important to comprehend the demands associated with semi-professional and professional umpiring, including the cognitive and psychological traits essential for officiating AF. State-level competition is crucial in developing the umpires who emerge from the grassroots levels of junior and community AF. Hence, this study aimed to evaluate the accuracy of umpiring interventions within a state-based AF setting.
With this in mind, we hypothesised that an umpire's intervention accuracy (measured by post-match analysis of decisions to determine correct and incorrect free kicks) would decrease as a match progresses during sub-elite Australian Rules football matches, and that intervention accuracy would differ between end zones and mid zones of the field. We also hypothesised that perceived fatigue in umpires would be significantly greater post-match than pre-match in sub-elite Australian Rules football.
Methods
Participants
This research engaged 39 WAFL field umpires (Male: 37, Female: 2, Age: 25.2 ± 6.8 y, Body Mass: 74.6 ± 7.9 kg, Height: 178.9 ± 7.4 cm) employed by the Western Australian Football Commission (WAFC), aged between 18 and 50 years. Each participant had a minimum of two years of umpiring experience within the WAFL (Experience: 6.8 ± 4.0 y). Each participant was examined only once. Before data collection, details regarding each participant's level of experience (WAFL Reserves, WAFL League, or the Australian Football League) were documented (a minimum level of WAFL Reserves was required). Prior to commencing the study, participants were briefed on the study's procedures and gave written consent during a pre-testing session (outlined below). Additionally, participants were made aware of their right to withdraw from the study at any stage without facing consequences. The Human Research Ethics Committee of Edith Cowan University granted approval for this prospective cohort study (2022-03920-WILSON).
Overview
For each participant, the study involved one information session, which consisted of an explanation of the study, and one experimental session, where data was collected. Consent was acquired during the information session. After the study, participants received a gift card of 40 AUD for a sporting goods store for their contribution. Match starting times were subject to ground availabilities and AFL fixturing. The WAFL season (excluding finals) started on April 9th and finished on August 26th of 2023. Data was collected during competitive AF games in the WAFL league and reserves competitions. Free kicks were coded live during matches, with the lead researcher also recording the timing of each participant's free kicks during match play. Video footage was collected continuously throughout the matches from an elevated position in the grandstand and was then used to identify umpire interventions and their accuracy. Perception of fatigue was measured with an altered format of the Visual Analogue Scale Questionnaire for Fatigue;22 the questionnaire was altered to only include questions that related to perception of fatigue.
Intervention accuracy and perception of fatigue
Participant intervention accuracy was assessed using broadcast footage of the matches provided by the WAFC media team in accordance with the WAFC's Media Policies.23 Match footage was broken down into clips of free kicks using coding software (Sportscode, Hudl, Lincoln, Nebraska, USA). Interventions were categorised as either correct or incorrect. The zone of field (mid-zone, offensive zone, or defensive zone) in which the intervention occurred was also recorded for analysis. Zone of field was determined relative to the free-kick recipient's scoring end (e.g., if Team A received a free-kick in their offensive/scoring end, the free-kick was recorded as occurring in the offensive zone). When reviewing the footage, the free-kick type (high-contact, holding the ball, etc.), the time and quarter of the infringement, the team that was awarded the free-kick, and the field umpire who made the decision were recorded. The lead researcher completed the initial analysis of all the clips collected. Perception of fatigue, as described above, was measured pre- and post-match. The questionnaire (Appendix 1) was given to the participants before and after the matches, with participants placing an “X” on the line at the point that reflected how they felt at the time of completion. The questionnaire included two subscales of questions, relating to either “Fatigue” or “Energy”, with mean scores analysed for both subscales. For the analysis process (detailed below), total questionnaire scores and the two subscales were each analysed.
Inter and intra reliability
To ensure the lead investigator was accurate and reliable, both inter- and intra-rater reliability were assessed for each variable (i.e., intervention accuracy, zone of field, and free-kick type). Inter-rater reliability was determined using two independent umpire coaches after the initial coding process had been performed. The coaches had both umpired at least 100 games of AF at a senior level. A sample of at least ten percent of the total clips was given to the umpire coaches to analyse.24,25 These clips were randomly selected, with each coach having no prior indication of the match situation before reviewing the clip. The coaches then analysed the clips in full, which included determining the receiving team, type of free-kick, and zone of ground. Intra-rater reliability assessed the consistency of the lead investigator on two separate, blind occasions. Coding of the footage was conducted by the lead researcher, who undertook a 10-day test-retest protocol after the initial data collection.26 The 10-day retest protocol was performed blind (i.e., without being able to see previous analyses during processing) on a randomly selected sample of games (approximately 10%). Both intervention accuracy reliability measures used the Yule's Q statistic, as the data was categorical. Due to the data type, the Yule's Q statistic had high applicability to the decision-making process of performance analysis.24,26,27 For both testing measures, a Yule's Q statistic of >0.70 was deemed acceptable.27 Inter- and intra-rater reliability for zone of field and type of free kick were analysed using Cronbach's alpha, with a score of 0.80 or higher deemed acceptable.27
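For readers unfamiliar with the reliability statistic used here, Yule's Q (Yules-Q) for a 2 × 2 agreement table with cells a, b, c, and d is Q = (ad − bc)/(ad + bc). The following is a minimal sketch only: the function name `yules_q` and the example counts are hypothetical, are not the study's data, and do not necessarily reflect how the statistic was computed in this study.

```python
def yules_q(a: int, b: int, c: int, d: int) -> float:
    """Yule's Q for a 2x2 agreement table.

    a: both raters coded the free kick as correct
    b: rater 1 coded correct, rater 2 coded incorrect
    c: rater 1 coded incorrect, rater 2 coded correct
    d: both raters coded the free kick as incorrect
    """
    concordant = a * d
    discordant = b * c
    if concordant + discordant == 0:
        raise ValueError("Yule's Q is undefined when ad + bc = 0")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts for illustration only (not taken from the study).
q = yules_q(30, 3, 2, 5)
print(round(q, 3))  # 0.923
```

Under the threshold described above, a value such as this hypothetical 0.923 would exceed 0.70 and be deemed acceptable.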
Data analysis
The data for this study was presented as mean (±SD). Where applicable, data was initially analysed for normality and homogeneity of variances using the Shapiro-Wilk statistic and Levene's test, respectively.8 As the assumption of normality was not met, differences in intervention accuracy between zones and between quarters were both determined using Friedman's non-parametric ANOVA.28 Effect size for the Friedman's ANOVA was presented as Kendall's W. The Holm-Bonferroni sequential correction was applied to post-hoc comparisons of non-normally distributed data. Total match accuracy was explored using descriptive statistics, which determined the percentage of correct free kicks across matches. Free-kick distribution between quarters was explored using the chi-square goodness of fit test. Perceptual fatigue was analysed using Wilcoxon's signed rank test, with the total data and subscale data being split into pre- and post-match. The data analysis was performed using jamovi (version 2.3.12; The jamovi project, NSW, Australia).
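To illustrate the sequential correction named above, the following is a minimal sketch of the standard Holm-Bonferroni adjustment. It is an assumption-laden illustration rather than the study's procedure: the analysis was run in jamovi, whose internal implementation is not described here, and the function name `holm_bonferroni` and the raw p-values are hypothetical.

```python
def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is multiplied by (m - k + 1),
        # and the result is capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
print(holm_bonferroni(raw))
```

Because the smallest p-value is multiplied by the full number of comparisons and later ones by progressively fewer, the procedure is less conservative than a plain Bonferroni correction while still controlling the family-wise error rate.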
Results
Inter and intra reliability
Inter-rater reliability between the lead researcher and the two coaches was deemed acceptable, with a Yule's Q statistic of >0.70 in both cases: the Yule's Q between the lead researcher and Coach One was 0.73, and between the lead researcher and Coach Two was 0.80. Inter-rater reliability between the lead researcher and the coaches was also analysed for ‘zone of ground’ and ‘free-kick type’: across the 20 free-kick types (Appendix 2), Cronbach's alpha was α = 0.92, and across the three field zones, Cronbach's alpha was α = 1.0. The Yule's Q for the intra-rater reliability of intervention accuracy was 0.94, with Cronbach's alpha for zone of field and free-kick type being α = 0.95 and α = 0.93, respectively.
Intervention accuracy
A total of 384 free kicks were recorded during data collection, with participating umpires making 343 correct and 41 incorrect decisions. Each umpire awarded 9.8 ± 3.8 free-kicks per match, of which 8.8 ± 3.2 were correct and 1.1 ± 1.1 were incorrect. The intervention accuracy was 89.3 ± 10.4% correct. A Friedman's non-parametric ANOVA revealed no significant difference in intervention accuracy between quarters (χ2 = 5.23, p = .156, df = 3, W = .07). Table 1 displays the quarter-by-quarter intervention accuracy, with post-hoc testing displaying no significant decline in intervention accuracy between Q1 (90.6 ± 14.4%) and Q2 (92.3 ± 12.7%, p = 1.000), Q1 and Q3 (88.4 ± 20.4%, p = .384), Q1 and Q4 (86.5 ± 21.1%, p = .792), Q2 and Q3 (p = .384), Q2 and Q4 (p = .792), or Q3 and Q4 (p = 1.000). As for first half versus second half intervention accuracy, a paired-samples t-test with a Wilcoxon rank coefficient adjustment displayed a significant decline from the first (Mdn = 1.000) to the second (Mdn = .889) half (W = .931, p = .021). Regarding the free-kick distribution between quarters, the chi-square goodness of fit test had an expected free-kick count of 96.0 free-kicks/quarter across the entire sample (n = 384), and the observed free-kick distribution differed significantly from this (χ2 = 14.7, p = .002, df = 3), as shown in Figure 1. A Friedman's non-parametric ANOVA also revealed a significant difference in free-kick distribution between quarters (χ2 = 13.0, p = .005, df = 3, W = .17), with post-hoc testing displaying significant increases in free-kick distribution between Q1 and Q2 (64 ± 1.1 vs. 104 ± 1.6, p = .006), Q1 and Q3 (64 ± 1.1 vs. 112 ± 1.9, p = < .020), and Q1 and Q4 (64 ± 1.1 vs. 104 ± 1.8, p = .028).
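The chi-square goodness of fit result for quarters can be reproduced from the figures reported above. The sketch below assumes that the values preceding "±" in the post-hoc comparisons (64, 104, 112, and 104, which sum to 384) are the quarter totals, and that the expected distribution is uniform across quarters. The function names are hypothetical, and the p-value uses the closed-form upper tail of the chi-square distribution for 3 degrees of freedom rather than any particular statistics package.

```python
import math


def chi_square_gof(observed):
    """Chi-square goodness-of-fit statistic against a uniform expectation."""
    expected = sum(observed) / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, expected


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    t = x / 2.0
    return math.erfc(math.sqrt(t)) + 2.0 * math.sqrt(t / math.pi) * math.exp(-t)


# Quarter totals as read from the text (assumed to be Q1-Q4 free-kick counts).
quarters = [64, 104, 112, 104]
stat, expected = chi_square_gof(quarters)
p = chi2_sf_df3(stat)
print(expected, round(stat, 1), round(p, 3))
```

Under these assumptions the statistic and p-value agree with the reported χ2 = 14.7 and p = .002, with the shortfall in Q1 contributing most of the statistic.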

Figure 1. Correct and incorrect free-kick distributions throughout match progression of competitive WAFL games.
Table 1. Quarter by quarter and zone of field data (mean ± SD) on correct and incorrect free kicks, with intervention accuracy, in AF field umpires (n = 384).
Note. a: Significant difference compared with Q1. b: Significant difference compared with Q2. c: Significant difference compared with Q3. d: Significant difference compared with Q4. y: Significant difference compared with Offensive. z: Significant difference compared with Defensive.
As displayed in Table 1, there were 238 mid-zone, 76 offensive zone, and 70 defensive zone free kicks. Table 1 further shows that intervention accuracy for the mid, offensive, and defensive zones was 91.4 ± 17.0%, 88.5 ± 17.3%, and 93.5 ± 14.8%, respectively. The chi-square goodness of fit test had an expected free-kick count of 128 free-kicks/zone across the entire sample (n = 384), and the observed free-kick distribution between zones differed significantly from this (χ2 = 142, p < .001, df = 2). A Friedman's non-parametric ANOVA revealed no significant differences in intervention accuracy between zones, however (χ2 = 4.07, p = .131, df = 2, W = .05). Post-hoc comparisons were likewise non-significant between offensive and defensive zones (p = .141), offensive and mid zones (p = .464), and defensive and mid zones (p = .404).
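As a worked check of the zone statistic, the chi-square value can be recomputed by hand from the three reported zone counts, assuming a uniform expected count across the three zones (the symbols O_i and E are introduced here for illustration):

```latex
\[
E = \frac{384}{3} = 128, \qquad
\chi^2 = \sum_{i=1}^{3} \frac{(O_i - E)^2}{E}
       = \frac{(238-128)^2}{128} + \frac{(76-128)^2}{128} + \frac{(70-128)^2}{128}
       \approx 94.53 + 21.13 + 26.28 \approx 141.9
\]
```

This rounds to the reported χ2 = 142, and roughly two thirds of the statistic comes from the excess of mid-zone free kicks.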
Perception of fatigue
Total questionnaire score for pre- and post-match was 45.4 ± 13.2 mm and 42.4 ± 17.1 mm, respectively. There were no statistically significant differences in total mean scores pre- and post-match (W = 127.0, p = .074, rrb = 0.49) (Figure 2). Fatigue subscale scores pre- and post-match were 42.1 ± 13.1 mm and 41.2 ± 18.1 mm, respectively, whilst energy subscales were 53.9 ± 9.9 mm and 45.7 ± 15.5 mm, respectively. The fatigue and energy subscale questions both had no significant differences (W = 78.0, p = .991, rrb = 0.71, and W = 5.0, p = .781, rrb = −0.33, respectively) between pre- and post-match analysis (Figure 2).

Figure 2. Pre- and post-match results (mean ± SD) of the visual analogue scale for perceptual fatigue during competitive WAFL games.
Discussion
The aim of this study was to quantify the intervention accuracy of state-level, semi-professional AF umpires during match play. This study also investigated the fatigue perceived by umpires during these matches. At the semi-professional, state level of AF, there is a paucity of research investigating the intervention accuracy of field umpires, with previous studies reporting on amateur AF8,11 and the AFL.5,6,12 This study also makes a unique contribution to the current literature as, to the best of our knowledge, it is one of very few to analyse the zone of field where free kicks occur12 and determine the zone relative to the receiving team (offensive or defensive end), as well as to investigate the perception of fatigue by AF umpires pre- and post-match. The main findings of this study were that intervention accuracy was generally favourable (overall 89.3 ± 10.4% correct) and did not differ significantly between quarters of a match or zones of field, despite the overall distribution of free-kicks between quarters and zones being significantly non-uniform, and that participants reported no difference in perception of fatigue or energy pre- and post-match.
The intervention accuracy (89.3 ± 10.4%) reported for the entire dataset is comparable to that of previous studies investigating in-match decision-making in AF (87%, 79.2%, 84%; albeit not statistically tested),5,6,12 as well as decision-making accuracy research completed in other team ball-sports.18–20 The somewhat lower values in the previous AF studies could be due to that research including missed and unwarranted free kicks when determining umpire accuracy. As these studies utilised missed and unwarranted decisions, it could be postulated that their results do not accurately reflect the accuracy of awarded free kicks, whereas our data is more closely related to decisional accuracy. This could, in turn, be a limitation of the current study. For quarter-by-quarter intervention accuracy, the current study found no significant reductions between quarters (Table 1), despite chi-square analysis revealing significant differences in free-kick distribution between Q1 and Q2, Q3, and Q4. These comparisons of umpire accuracy between quarters differ from previous research, which suggests that accuracy improves as match time progresses,11,18 but align with research which found that accuracy does not change between quarters.12 However, it is important to note that our study utilised only correct and incorrect decisions (as discussed above) and did not include missed and/or non-given decisions. This may inflate the accuracy of the umpires in our study; however, a missed free-kick will generally not interfere with the flow of play to the same extent as an incorrectly given free-kick.
Despite these non-significant results, a significant difference between first and second half intervention accuracy was discovered, which aligns with previous research in association football. Mallo et al. (2012) reported that association football referees made the greatest number of incorrect decisions during the final 15 min of match play, and attributed this to the accumulation of physical and mental fatigue throughout match progression. Although our study did not show the same relationship between perception of fatigue and intervention accuracy, it is possible that physical fatigue may affect AF field umpires’ decision-making, since the officiating of both sports requires the officials to perform a large amount of aerobic and anaerobic movement over an extended period of time and a large playing area. The results of Paradis et al. (2016), however, are similar to our results, as their study did not find any significant relationship between fatigue and decision-making accuracy. Paradis et al. utilised 10 × 300 m sprints as a measure of overall fatigue for sub-elite field umpires and video footage of free-kick decisions.8 In spite of this, the lack of findings could be due to the extensive anaerobic fatigue induced by repeated sprints not being a clear indicator of actual match performance, as such sprints are not a specific game movement performed during match play (in this case, AF field umpiring).29 Likewise, it is well understood that training effects are heavily limited to the muscular involvement patterns elicited by the given conditioning exercises.30 These results could, therefore, be an inaccurate indicator of in-game accuracy, as the training method may not properly elicit the fatigue experienced during a match of AF.
Indeed, we can postulate that all factors and elements of managing a game of AF, from physiological to cognitive load, are contributors to the significant reduction of intervention accuracy between halves. Further investigation is required to determine whether specific match demands (internal and external load) are the cause of the reduction of intervention accuracy.
Zone of field results showed that there was a large disparity between the mid and end zones of the ground regarding intervention distribution (Mid = 238, Offensive = 76, Defensive = 70). This is in agreement with previous research regarding the distribution of free kicks across the ground.6,12 The nonuniform distribution of free kicks throughout the three zones of the field is due to the mid zone of the ground being the most populated area, with transition plays and the resumption of play occurring in the mid zone after a goal has been scored. Previous research has shown that midfielders cover the furthest distance out of any AF player position, and as they predominately play in the mid zone, this may partially explain why there is a nonuniform distribution of umpire interventions. Regardless of the distribution of free kicks, our study, surprisingly, showed no significant difference in intervention accuracy between the zones despite the accuracy between halves of the match being different. This has been previously reported in AF umpire free-kick accuracy research, however. 12 The non-uniform distribution of free-kicks between zones could possibly be explained by the match specific actions performed by the players. The pressure/contact applied by defenders to forwards (or vice versa) in the end zones can possibly be difficult to adjudicate, coupled with an increased match intensity and congestion when a team is trying to score and/or defend. Although our study did not reveal significant differences between zonal accuracy, previous research has concluded that umpires perceive/interpret illegal match actions using different cognitive processes, 31 which may plausibly explain a lack of continuity in contentious decision-making accuracy and difference in free-kick distribution between zones. 
More extensive research into the difference of intervention accuracy between the zones of an AF ground should be completed, with future research investigating the specific match actions of each field zone and how they could possibly affect an umpire's accuracy.
Regarding the perception of fatigue that the umpires accumulated throughout match play, analysis showed no statistically significant difference between any questionnaire results pre- and post-match. We hypothesised that the fatigue subscale scores would increase from pre- to post-match, whilst the energy subscale scores would decrease. Considering the decrease in intervention accuracy as the matches went on, combined with previous findings on overall fatigue during umpiring,11,12 the lack of findings in perception of fatigue is perplexing. It is plausible that either the umpires themselves are not perceiving fatigue accurately, or that the questionnaire may not be a suitable tool in this context. Nonetheless, this area has been heavily researched, with the aim of identifying fatigue and its effects on athletes when performing at the elite level. For instance, Mallo et al. (2012) investigated whether positioning on the field affected decision-making accuracy in elite-level association football referees and assistant referees.19 It was suggested that not only does field positioning affect the decision-making accuracy of referees, but that mental fatigue (specifically) and match progression also affect accuracy. Although the results of our study show that perceptual fatigue does not change pre- and post-match, our results do identify a significant decline in intervention accuracy from the first to the second half of a match. Whether this is caused by mental fatigue or physiological fatigue is yet to be determined in AF at any level, let alone semi-professional, state-level AF.
When interpreting the results of the present study, there are certain factors that must be considered. It is important to note that this data was collected at state level, so it is unclear how these results will apply to different competitive levels. Notably, unlike previous studies in both AF and association football,6,12,18,19 the research team did not analyse missed free-kicks and non-decisions when determining intervention accuracy, due to feasibility issues in collecting that data. However, while it would be ideal to analyse missed free kicks, it is certainly important to understand the accuracy of officials when they directly intervene in the flow of play. The free kicks that are given are the only interventions that affect match play and increase congestion, although missed/non-given decisions are also heavily debated. Future research on the match-specific intervention accuracy of field umpires should focus on identifying and analysing these missed and non-given decisions, given their significant role in determining match outcomes. It is also important to consider that the highest level of AF (the AFL) adopted a fourth field umpire as of the 2023 season, aiming to reduce umpire error rates. Having four field umpires officiate state-league matches may reduce the number of incorrect free-kicks per match. However, given there are minimal studies on using four field umpires, this is something that should be investigated by future research.
Conclusion
To summarise, this study is one of very few to explore the intervention accuracy of umpires during competitive state-level AF matches. To the best of our knowledge, it is also the first to measure and record the perception of fatigue of the umpires pre- and post-match in AF. The main effects observed indicate an apparent trend between match progression and free-kick distribution, with intervention accuracy significantly decreasing from the first to the second half of a match. Furthermore, field umpires did not report a significant change in perceived fatigue after officiating a match of AF, instead perceiving themselves to feel the same pre- and post-match. Future studies in the area of AF umpiring should examine the relationship between intervention accuracy and physiological load throughout AFL matches. Whether specific match actions from players can skew an umpire's judgement is also an area that should be researched.
Footnotes
Acknowledgements
There was no outside funding for this project. The authors would like to thank the West Australian Football League field umpiring group, for their extensive help with the project, as well as Justin Orr, David Yole, Dean Margetts, and the members of the WANFLUA. Additionally, Cody Hoffmeister and Aidan Dallimore must be thanked for their invaluable help during the data collection period. There were no conflicts of interest for this project.
Data availability
The data that support the findings of this study are available upon reasonable request from the corresponding author, CW. The data are not publicly available due to the conditions of ethical approval for this study.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
APPENDICES
Visual analogue scale for fatigue for field umpires. Note: The below was adapted for an online survey
