Abstract
Short-term forecasting of performance in football is crucial in week-to-week decision making. The current study presented novel contributions regarding the considerations that should be accounted for in the prediction of match actions performed in competitive matches. First, the study examined whether the quantity and recency of training data used to build a prediction model significantly influenced predictive accuracy. Three prediction models were built with the exponential moving weighted average (EMWA) method, each differing in the quantity of training data used (three, five, and seven preceding match days). Next, the study examined if contextual constraints, such as type of match action being predicted, playing position, or player age, significantly influenced predictive accuracy. Match action data from players in the top five European leagues were collected from the 2014/2015 to the 2019/2020 seasons. The model trained using less but more recent data (three preceding match days) demonstrated the greatest accuracy. Next, within the offensive and defensive phases, match actions differed significantly in predictive accuracy. Lastly, significant differences were found in prediction accuracy between playing positions, whereby actions associated with the primary task of the playing position were more accurately predicted. These findings suggest that in the forecasting of individual match actions, practitioners should seek to train the prediction model using more recent data, instead of including as much data as possible. Furthermore, contextual constraints such as the type of action and playing position of the player must be keenly considered.
Introduction
The ability to predict the future performance of players is inherently valuable to professional football teams, as it is crucial in substantiating organizational decisions on player recruitment, development, and selection. 1 Specifically, the capacity to predict the performance contributions of players in one's own team and that of the opposition in the short term (i.e. week to week) is crucial toward informing and altering team tactical strategies. 2 Presently, teams heavily utilize such a game-to-game approach in performance analysis and prediction, in order to prepare strategies that exploit the weaknesses of the opponent and constrain their strengths. 3
The widespread usage of performance prediction in the short term may be attributable to the high degree of “noise” that limits predictive accuracy when attempting to forecast performance far into the future. 4 Specifically, the frequency of match actions performed differs significantly as players face teams of different strengths, 5 and even when they play at home or away venues. 6 However, despite the widespread usage of this short-term performance prediction, at present, there is limited understanding of the underlying mechanisms that influence the predictability of match actions. Particularly, there is a gap in current knowledge regarding the breadth, depth, and type of data necessary to train a prediction model that allows for greater accuracy in forecasting individual performance, as the inclusion of more data points may not necessarily relate to more accurate prediction models. 7 Critically, earlier works have highlighted that building a prediction model using all available data (risking the inclusion of potentially irrelevant data) has an adverse effect on the predictive accuracy of football performance.7,8 This has been corroborated by recent work highlighting that greater predictive accuracy of match results in a particular competition can be garnered by utilizing less, but more recent, data (e.g. the seeding pots from which the teams are drawn in that same season) as opposed to using a larger, but potentially more dated, dataset (e.g. club ranking coefficients built from 5-year historical data). 9
At present, despite a limited understanding of how the unique characteristics of individual players affect the predictability of their match actions, professional clubs often make decisions on player selection and recruitment based on these characteristics. For example, players are valued less as they approach the age of 30, based on the assumption that they are past their physical prime and cannot contribute as much as their younger counterparts. 10 However, little is known about whether the age of a player significantly influences the predictability of match actions (e.g. is dribble success frequency more predictable for younger players?). Similarly, there is a limited understanding of how on-field characteristics such as playing position affect the predictability of match actions (e.g. is the predictability of offensive match action frequency similar for forward players compared to central-midfielders?). Without further examining these underlying mechanisms, the applicability and reliability of data analytics in informing the decision-making processes at professional clubs remain uncertain. 11
The current study therefore addresses three research questions. Specifically, in exploring the mechanisms of short-term performance prediction in football, we examine (1) whether utilizing more preceding data points provides a more accurate prediction of match action frequency and (2) whether different match actions in football differ in their predictability. We also explore the influence of individual player characteristics on performance prediction, specifically (3) whether the age or playing position of the individual significantly influences the predictability of match actions. By addressing these research questions, the current study puts forth a more comprehensive understanding of the critical factors that influence the prediction of future match performance and that therefore need to be carefully considered in the construction of performance prediction models in the future.
Method
Data sources
Match action data of professional football players in the English Premier League, French Ligue 1, German Bundesliga, Italian Serie A, and Spanish La Liga were collected from the football data website Whoscored (www.whoscored.com). The data were collected for all players who participated in the 2014/2015 to 2019/2020 seasons, resulting in a dataset of 4607 unique players.
Match actions
Four offensive match actions (shots, key passes, crosses, and dribbles) were selected as key offensive performance indicators as they have been highlighted as significant contributors toward team performance. 12 Conversely, tackles, interceptions, clearances, and aerial duels won were selected as key defensive performance indicators due to their significant contributory role toward successful defensive outcomes. 13 The operational definitions of the match actions used in the current study are precisely defined by Opta Sports, the company that collected the dataset (for further details, see Liu et al.). 14 Although the frequency with which these key match actions are performed by individual players may not wholly encapsulate the performance contributions of the player, the ability to perform these actions in varying match contexts is strongly associated with team performance outcomes. 15 Therefore, being able to forecast these actions more accurately will significantly inform the processes of opposition analysis, player selection, and player recruitment.
Data filtering
To ensure a valid representation of match action frequency, we utilized a filtering process built on earlier work.16,17 First, a player's match action data for a given match was excluded if the player (1) was not part of the starting lineup or (2) did not play at least 60 minutes in that match. Subsequently, the player's data was included in the analysis only if they had played at least 1080 minutes in each of the six seasons after the first two exclusion criteria had been applied. The individual characteristics of the 187 players that remained after filtering can be found in Table 1.
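The filtering steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the record keys (`player`, `season`, `started`, `minutes`) and the function name `filter_players` are hypothetical, and the 1080-minute threshold is assumed to be evaluated on the minutes that survive the two match-level exclusions.

```python
from collections import defaultdict

MIN_MATCH_MINUTES = 60      # match-level exclusion criterion (2) in the text
MIN_SEASON_MINUTES = 1080   # per-season inclusion threshold in the text


def filter_players(appearances, seasons):
    """Return (eligible_players, retained_appearances).

    `appearances` is a list of dicts with the hypothetical keys
    'player', 'season', 'started' (bool) and 'minutes' (int).
    `seasons` lists every season in which the threshold must be met.
    """
    # Step 1: keep only starts in which the player played at least 60 minutes.
    kept = [a for a in appearances
            if a["started"] and a["minutes"] >= MIN_MATCH_MINUTES]

    # Step 2: total the retained minutes per (player, season).
    minutes = defaultdict(int)
    for a in kept:
        minutes[(a["player"], a["season"])] += a["minutes"]

    # Step 3: a player is eligible only if every season reaches 1080 minutes.
    players = {a["player"] for a in kept}
    eligible = {p for p in players
                if all(minutes[(p, s)] >= MIN_SEASON_MINUTES for s in seasons)}

    return eligible, [a for a in kept if a["player"] in eligible]
```

Applying the season threshold after the match-level exclusions means that substitute appearances and short starts do not count toward the 1080 minutes, which is how the text is read here.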
Table 1. Characteristics of individual players.
Note: Age categories were determined based on the player's age at the beginning of the 2014/2015 season.
Data normalization
For each match played, match action data for each player was normalized by the total minutes played to account for differences in playtime, and analyzed as per-90-minute values.
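As a small worked example of this normalization, the sketch below scales a raw action count to a per-90-minute value. The function name `per_90` is hypothetical, and the simple linear scaling is an assumption about how the normalization was carried out.

```python
def per_90(action_count, minutes_played):
    """Scale a raw match action count to a per-90-minute value."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return action_count * 90.0 / minutes_played


# Two shots in 60 minutes correspond to three shots per 90 minutes.
shots_per_90 = per_90(2, 60)
```

Because only appearances of at least 60 minutes are retained, the scaling factor never exceeds 1.5 in the filtered dataset.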
Mechanisms of short-term performance prediction
Developing models of match action frequency prediction
Based on its demonstrated success in predicting football performance,18,19 match action frequency was predicted using the exponential moving weighted average (EMWA) method. The EMWA algorithm was selected as it offers greater sensitivity to the volatility in each week-to-week match sample (e.g. variable match conditions, nonperformance) when compared to a simple moving-average model. 20 In another earlier work, the EMWA model also outperformed other models in modeling datasets with frequent occurrences of zero values, wherein zero was a natural lower bound for the variables. 21
We evaluated three models (EMWA3, EMWA5, and EMWA7), the difference between these models being the number of preceding match days that were used as training data for predicting match action frequency in the next match instance. Specifically, to predict match action frequency in the next match day (PFk), the EMWA3 model utilized the actual match action frequency data from the past three available match days (AFk−1, AFk−2, AFk−3). Conversely, the EMWA7 utilized match action frequency data from the past seven match days (AFk−1 … AFk−7). The calculation of the smoothing factor for the EMWA, α, was defined using the span (e.g. 3, 5, or 7) and is detailed in equation (1). The EMWA models were computed using the ewm method of the pandas library. 22
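The forecasting step can be sketched as follows. This is a dependency-free illustration under stated assumptions rather than the authors' implementation. It assumes the pandas convention for a span, α = 2 / (span + 1), which is presumably what equation (1) expresses. It also assumes that only the `span` most recent match days enter each forecast, and that the weights are normalized to sum to one, as pandas does with `adjust=True`. The function names `smoothing_factor` and `emwa_forecast` are hypothetical.

```python
def smoothing_factor(span):
    """Smoothing factor alpha under the pandas span convention."""
    return 2.0 / (span + 1.0)


def emwa_forecast(actual, span):
    """Forecast each match from the `span` preceding matches.

    `actual` is the chronological list of per-90 frequencies (AF).
    The returned list holds the forecast PF_k at index k, or None where
    fewer than `span` preceding matches are available.
    """
    alpha = smoothing_factor(span)
    # weights[0] applies to AF_{k-1}, weights[1] to AF_{k-2}, and so on.
    weights = [(1.0 - alpha) ** i for i in range(span)]
    total = sum(weights)

    forecasts = [None] * len(actual)
    for k in range(span, len(actual)):
        recent_first = actual[k - span:k][::-1]
        forecasts[k] = sum(w * x for w, x in zip(weights, recent_first)) / total
    return forecasts
```

With a span of 3, α is 0.5, so the most recent match day receives four times the weight of the match day three matches back. With a span of 7, α is 0.25 and the weights decay more slowly, which is why older match days influence the EMWA7 forecast more strongly.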
Statistical analysis
To compare the accuracy of the four prediction models (the three EMWA models and the linear regression-based baseline [BL] model), a predicted frequency was generated for each of the eight match actions using data from preceding matches. The accuracy in prediction (PA) was computed as the root mean square error (RMSE) between the predicted frequency and actual performed frequency of the match action in that particular match (e.g. RMSE between PFk and AFk of shots taken). For all three EMWA models and the BL model, match action frequency was predicted from the eighth match instance (eighth match played by the player) to the last match instance. This was because the EMWA7 model requires match action data from seven previous match instances before match action frequency can be predicted. A 4 (prediction model) × 4 (match actions) two-way analysis of covariance (ANCOVA) was performed twice (once for offensive match actions, and once for defensive match actions), to assess any statistically significant differences in predictive accuracy across the prediction models and across match actions. To control for the variation in raw frequency of the actions performed, the raw frequency of actions performed normalized to per 90 minute values was added as a covariate.
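The accuracy measure can be illustrated with a short sketch. It assumes that the RMSE is computed per player and per match action over all matches from the eighth match instance onward, so that every model is scored on the same matches. The function name `rmse_from_eighth_match` and the constant `FIRST_PREDICTED_INDEX` are hypothetical.

```python
import math

FIRST_PREDICTED_INDEX = 7  # the eighth match instance, using zero-based indexing


def rmse_from_eighth_match(predicted, actual, start=FIRST_PREDICTED_INDEX):
    """RMSE between predicted (PF) and actual (AF) per-90 frequencies.

    Both lists are chronological and aligned by match. Entries before
    `start` are ignored, since EMWA7 cannot produce a forecast there.
    """
    pairs = list(zip(predicted[start:], actual[start:]))
    if not pairs:
        raise ValueError("no matches available from the start index onward")
    squared_errors = [(p - a) ** 2 for p, a in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Starting every model at the same match keeps the comparison fair: EMWA3 could forecast from the fourth match, but scoring it on additional early matches would mix model differences with sample differences.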
Influence of player characteristics
Classifying age categories and playing position
Players were categorized into three age categories based on their age when they played their first match in the dataset. The birthdates of the players were collected from the Whoscored website. The players were categorized into the (1) Development group if they were 21 years or younger in the 2014 season; the (2) Prime group if they were 22 to 26 years old in the 2014 season; and the (3) Veteran group if they were 27 years or older in the 2014 season. These age bounds were selected with reference to previous studies that have divided and compared players into different age groups.1,23 However, slight modifications were made to the age bounds utilized in earlier studies as the current study observed performance on a longitudinal timeframe instead of a cross-sectional one.
In addition, the players were classified into five major playing positions based on the playing position they most frequently played in, as listed in the Whoscored dataset. These major playing positions were (1) central-defender (defender-central); (2) wide-defender (defender-left, defender-right); (3) central-midfielder (midfielder-central, defensive-midfielder, attacking-midfielder); (4) wide-midfielder (midfielder-left, midfielder-right, defensive-midfielder-left, defensive-midfielder-right, attacking-midfielder-left, attacking-midfielder-right); and (5) forward (forward, forward-left, forward-right).
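The two classification rules above can be written down compactly. The sketch below is illustrative only and makes several assumptions: age is supplied in whole years as of the 2014 season, the position labels are spelled exactly as listed in the text, and the "most frequently played" position is determined after mapping each match label to its major group. The names `age_category`, `POSITION_GROUPS`, and `major_position` are hypothetical.

```python
from collections import Counter


def age_category(age_in_2014):
    """Assign the age group from the player's age (whole years) in the 2014 season."""
    if age_in_2014 <= 21:
        return "Development"
    if age_in_2014 <= 26:
        return "Prime"
    return "Veteran"


# Major playing positions and the dataset labels grouped under each.
POSITION_GROUPS = {
    "central-defender": ["defender-central"],
    "wide-defender": ["defender-left", "defender-right"],
    "central-midfielder": ["midfielder-central", "defensive-midfielder",
                           "attacking-midfielder"],
    "wide-midfielder": ["midfielder-left", "midfielder-right",
                        "defensive-midfielder-left", "defensive-midfielder-right",
                        "attacking-midfielder-left", "attacking-midfielder-right"],
    "forward": ["forward", "forward-left", "forward-right"],
}

# Reverse lookup from a dataset label to its major playing position.
LABEL_TO_GROUP = {label: group
                  for group, labels in POSITION_GROUPS.items()
                  for label in labels}


def major_position(match_labels):
    """Return the major position a player occupied most often.

    Ties are resolved in favor of the group encountered first.
    """
    groups = Counter(LABEL_TO_GROUP[label] for label in match_labels)
    return groups.most_common(1)[0][0]
```

Counting at the level of major groups rather than raw labels means that a player who alternates between, for example, forward-left and forward-right is still classified as a forward.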
Statistical analysis
To assess the influence of player characteristics on the PA of match actions, a 3 (age category) × 5 (playing position) two-way ANCOVA was performed twice (once for offensive match actions and once for defensive match actions). To limit the influence of predictive error caused by the type of prediction model used, PA data from the best performing prediction model highlighted in RQ1 was used for the statistical analysis. To control for the variation in raw frequency of the actions performed, the raw frequency of actions performed normalized to per 90 minute values was added as a covariate.
Results
Mechanisms of short-term performance prediction
The two-way ANCOVA performed on the predictive accuracy of offensive match actions while adjusting for raw frequency of actions performed showed a significant main effect of prediction model used (F(3, 2975) = 180.971, p < 0.001) and match action being predicted (F(3, 2975) = 161.006, p < 0.001). The two-way interaction of prediction model and match action being predicted was also significant (F(3, 2975) = 17.446, p < 0.01). Similarly, the two-way ANCOVA performed on the predictive accuracy of defensive match actions while adjusting for raw frequency of actions performed showed a significant main effect of prediction model used (F(3, 2975) = 526.604, p < 0.001) and match action being predicted (F(3, 2975) = 112.096, p < 0.001). There was a significant two-way interaction between prediction model and match action being predicted (F(3, 2975) = 12.499, p < 0.001). As detailed in the previous section, predictive accuracy is reported as the marginal means of error in predicted versus actual frequency of the match action performed (RMSE) when normalized to 90-minute intervals (controlling for raw frequency of the actions performed normalized to 90 minutes).
Tukey's post hoc tests revealed significantly greater predictive accuracy of the EMWA3 model compared to the BL model for offensive match actions (p < 0.001). The EMWA3 model also demonstrated significantly greater accuracy in comparison to both the EMWA5 and EMWA7 models (both p < 0.001). With regard to the match actions, the predictive accuracy of crossing frequency was significantly worse compared to that of dribbles, key passes, and shots (all p < 0.001). The predictive accuracy of shot frequency was significantly greater compared to that of dribbles and key passes (both p < 0.001). Lastly, the predictive accuracy of key pass frequency was significantly greater than that of dribbles (p < 0.05). The accuracy of the different prediction models in forecasting offensive and defensive actions can be found in Table 2. Additionally, the accuracy of the best performing model identified in RQ1 (EMWA3) in forecasting offensive and defensive actions can be found in Table 3.
Table 2. Accuracy of models in predicting offensive and defensive actions.
EMWA, exponential moving weighted average; RMSE, root mean square error.
Table 3. Accuracy of EMWA3 model in predicting offensive and defensive actions.
EMWA, exponential moving weighted average; RMSE, root mean square error.
For defensive match actions, Tukey's post hoc tests also revealed significantly greater predictive accuracy of the EMWA3 model compared to the BL model (p < 0.001). The EMWA3 model also demonstrated significantly greater accuracy in comparison to both the EMWA5 and EMWA7 models (both p < 0.001). With regard to the match actions, the accuracy in predicting frequency of aerial duels won was significantly worse compared to that of interceptions and tackles (both p < 0.001) but significantly greater than that of clearances (p < 0.001). The predictive accuracy of clearance frequency was significantly worse than that of interceptions and tackles (both p < 0.001). Lastly, the predictive accuracy of tackle frequency was significantly greater compared to that of interceptions (p < 0.01).
Influence of player characteristics
The two-way ANCOVA performed on the predictive accuracy of offensive match actions using data from the EMWA3 model showed a significant main effect of playing position (F(4, 732) = 16.454, p < 0.001). There was no significant main effect of age category, and no significant two-way interaction effect of playing position and age category. Tukey's post hoc tests revealed a significantly greater prediction accuracy of offensive match actions performed by forward players (0.484, 95% CI [0.431, 0.538]) compared to central-midfielders (0.653, 95% CI [0.612, 0.694]), central-defenders (0.818, 95% CI [0.760, 0.877]), and wide-defenders (0.696, 95% CI [0.636, 0.756]) (all p < 0.001). The prediction of offensive match action frequency for central-defenders was significantly less accurate compared to that of wide-defenders (p < 0.05), central-midfielders (p < 0.001), and wide-midfielders (0.626, 95% CI [0.530, 0.721]) (p < 0.05). The raw frequency of the match actions and error in prediction generated by the EMWA3 model are detailed in Tables 4 and 5.
Table 4. Root mean square error (RMSE) and raw frequency of offensive match actions.
Table 5. Root mean square error (RMSE) and raw frequency of defensive match actions.
Next, the two-way ANCOVA performed on the predictive accuracy of defensive match actions using data from the EMWA3 model showed a significant main effect of playing position (F(4, 732) = 8.163, p < 0.001). There was no significant main effect of age category, and no significant two-way interaction effect of playing position and age category. Tukey's post hoc tests revealed a significantly greater prediction accuracy of defensive match actions for central-defenders (0.809, 95% CI [0.765, 0.852]) compared to forward players (0.950, 95% CI [0.908, 0.991]) (p < 0.001). The accuracy in predicting frequency of defensive match actions performed by forward players was significantly worse compared to that of central-midfielders (0.838, 95% CI [0.807, 0.868]) and wide-defenders (0.782, 95% CI [0.737, 0.826]) (p < 0.001).
Discussion
The objective of the present study was to examine several mechanisms that are crucial in the short-term prediction of key match action frequency performed by players in competitive matches. Specifically, we examined if prediction accuracy was significantly influenced by the amount of preceding data used to train the prediction model, and the type of match action being predicted. Lastly, we also assessed if individual characteristics of the player, such as playing position and age, significantly influenced the predictability of match actions.
Mechanisms of short-term performance prediction
The results of the current study suggest that an EMWA model utilizing data from three preceding match days outperforms those constructed using data from five or seven preceding match days, as well as a linear regression-based baseline model. To the best of our knowledge, this is the first study that examines the predictability of football match action frequency in the immediate future using data from preceding matches. Our results suggest that in predicting the frequency of crucial match actions performed in the immediate future (i.e. in the next match day), the player's form, or most recent trend of performance, may be a crucial predictor. Conversely, including the performance data of the player from a month, or 2 months ago (approximately five and seven match days prior), diminishes the accuracy of the prediction model.
These results are in line with earlier work proposing that when constructing a performance prediction model in football, it is crucial to build the model using data that effectively optimizes predictive accuracy for the given context, rather than using all the data that is available. 7 In that respect, training a model with more data does not always increase predictive accuracy, which is reflected in the results of the current study.
This may be particularly applicable when using time-series data, as a larger dataset introduces more random variables that may influence performance. For example, unpredictable variables such as accumulated fatigue due to congested schedules, 24 or changes in tactical playing style in response to certain situations, 5 can significantly influence the match actions performed by players. Particularly, professional teams engage in a cyclical process of adjusting their tactics to exploit the weaknesses of upcoming opposition, and to capitalize on their own strengths based on information gained from previous weeks.2,25 Therefore, greater predictive accuracy demonstrated by the EMWA model built using a smaller, but more recent dataset, may be attributed to a decreased degree of potential randomness being introduced to the prediction model. Furthermore, given the significant association between player confidence and player performance highlighted in earlier work, 26 it is understandable that performance in the immediate future (i.e. subsequent week) is more strongly associated with the most recent level of performance displayed.
In a practical sense, these findings may be consequential to player selection strategies of international football team managers. Specifically, given the congested schedule of modern football, managers do not have many opportunities to trial different players before the actual tournament. Consequently, it may be a viable strategy to select players that have been “in-form” in recent domestic matches before the actual international tournament, as the current study suggests that the performance of players in recent matches is predictive of their performance in the immediate future. Likewise, our results suggest that practitioners involved in player scouting and recruitment processes should be cognizant that the likely future performance of a player is more closely related to recent performance. Therefore, such practitioners should be cautious when making recommendations or decisions informed by player performance data that spans an extensive period.
The results of the current study also suggest that within the offensive and defensive phases, there are significant differences in the predictive accuracy of match action frequency (i.e. accuracy in prediction differs significantly across offensive match actions and, similarly, across defensive match actions). These results are expected, as the frequency and SD of these match actions performed by players in matches are nonhomogeneous, even when comparing between actions performed in the same phase of the match (Tables 4 and 5), with the general trend being that match actions occurring more frequently have greater SDs. Furthermore, we observe a positive association between the raw frequency of the match action performed and the error in prediction generated by the best performing EMWA3 model (Figure 1). These findings are in line with an earlier work highlighting that greater variability in sport performance was significantly associated with poorer predictability of eventual performance outcomes. 27

Figure 1. Relationship between the raw frequency and error in predicted frequency of match actions (for the EMWA3 model) normalized to 90-minute values.
However, the results of the current study indicate that for certain actions, there is greater consistency in the frequency with which they are performed. It may be possible that the actions that are most accurately predicted (e.g. shots, key passes, or tackles) are more resistant to the novel constraints acting on the players from week to week, such as the opposition or match location. In other words, the nature (or perhaps, importance) of these actions may drive players to find solutions to the task and environmental constraints presented in the match environment, in order to successfully execute these actions. For example, when playing against a highly defensive team, a forward may transition toward shooting from outside the box or having shots on goal from headers to increase their shot frequency. Conversely, the match actions that are more contingent on (i.e. less resistant to) the match conditions that vary week to week may be less reliably predicted. For instance, the frequency of aerial duels is highly contingent on the number of long aerial passes played by the opposition, which may explain why the predictive accuracy of aerial duel frequency is comparatively low.
Influence of player characteristics
The results of the current study suggest that the individual's playing position significantly influences the prediction accuracy of week-to-week frequency in match actions performed. In general, our results indicate that the frequency of defensive actions performed by primarily defensive players can be more accurately predicted compared to players in primarily offensive roles (e.g. central-defenders and wide-defenders compared to forwards). Conversely, the frequency of offensive actions is more accurately predicted for players in offensive roles compared to those in defensive ones (e.g. forwards compared to wide-defenders and central-defenders).
The direction of our findings, while preliminary, suggests that greater accuracy in prediction can be attained in the forecasting of match actions closely associated with the primary tasks of the individual player. Conversely, match actions associated with secondary roles (e.g. defensive duties demanded of forward players), may be highly contingent on the varying task and environmental constraints across matches, such as quality of opposition, match location, or consequently, match status. 28 The diminished accuracy in predicting match actions that are secondary to the individual's primary playing position possibly suggests that practitioners involved in player recruitment or player selection should be cautious about placing too much emphasis on the potential offensive contributions of defensive players and vice versa.
The results of our current study also highlight that there was no significant effect of age on the predictive accuracy of defensive and offensive match action frequencies. This suggests that across different age groups, there were no significant differences in the consistency of match actions performed by professional players from week to week. The results of the current study are interesting as existing studies examining the relationship between age and performance in elite professional football players have highlighted that physical performance generally decreases with age.23,29 The nonsignificant difference in predictability of defensive and offensive match action frequency across age groups highlighted in the present study raises the possibility that professional players in different age categories may be able to compensate with different competencies (e.g. development players with greater physicality, veteran players with greater match experience) to maintain consistency in their performances. However, it is important to bear in mind that the players included in the analysis were active players competing regularly at the highest level of competition; therefore, the findings may not extrapolate to different populations (e.g. different leagues or tiers of competition).
Limitations
As highlighted in the Introduction section, there are several external factors (e.g. strength of opposition, playing style of opposition, or match location) that may significantly influence the frequency, and consequently, the predictability of match actions performed. Furthermore, considering that the match schedules of players largely differ depending on which competitions their teams compete and progress in (e.g. domestic cups, national team competitions, or European competitions such as the Champions League and Europa League), it is likely that the distribution of match days throughout the season is not uniform across the players analyzed in the study. Therefore, the generalizability of the results of the current study is subject to the limitation that not all potentially influential predictor factors were accounted for in the construction and assessment of our prediction models. However, the predictor factors analyzed in the current study were consciously limited to avoid the possibility of overfitting our prediction models (by including what is possibly an inexhaustible list of influential factors), 30 and to derive results that are more generalizable and more pragmatic to practitioners.
Conclusion
The results of the current study put forward several novel contributions toward existing literature and practice in forecasting individual match actions in football. First, our results suggest that in predicting performance in the subsequent week (specifically match action frequency), prediction accuracy is enhanced with an EMWA prediction model that includes less, but more recent, data. We also highlight that within the offensive and defensive phases, there are significant differences in the predictive accuracy of match actions, which suggests that actions cannot be forecasted equally. Furthermore, the prediction accuracy of match action frequency also differs between players in different playing positions. Particularly, the frequency of match actions that are closely associated with the primary task of the player's playing position can be more accurately predicted.
Footnotes
Acknowledgements
The authors would like to thank Whoscored.com for their kind cooperation in allowing us to collect and utilize data from their website. The authors’ responsibilities were as follows—QH, JK, and KYH designed the research and wrote the manuscript; QH and JK collected and analyzed the data; and QH, JK, and KYH edited the manuscript. All authors read and approved the final manuscript submitted.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
