Abstract

We propose a Machine Learning model to predict handball games and derive insightful information for sport coaches. Our model, augmented with statistical features, outperforms state-of-the-art models with an accuracy beyond 80%. In this work, we show how we construct the data set to train Machine Learning models on past female club matches. We compare different models, evaluate them to assess their predictive capabilities and show that our statistical variables, estimating the strengths of the teams, appear as the most important features to the selected model. Finally, explainability methods allow us to change the scope of our tool from a purely predictive solution to a highly insightful analytical tool. This can become a valuable asset for handball teams’ coaches by providing statistical and predictive insights to prepare future competitions.

Keywords

handball predictions machine learning feature extraction explainable AI

Introduction

Handball is a popular sport in Europe with growing interest in Northern Africa and South America. As a fast-paced sport, it is gaining interest both among the public and in the scientific literature, yet predictive models are rarely discussed (Saavedra, 2018). In this paper, we intend to fill this gap.

The history of handball

With its primitive form going back to ancient Greece, modern handball is considered to have been created by German sports teachers (outdoors, with 11-a-side teams) around 1890, while Scandinavian countries (Denmark and Sweden) introduced a 7-a-side version around the same period (Hahn et al., 2013). Its original Danish name “Haandbold” was first used in 1898, and the first official competition was organized in 1917, when the term “handball” was also officially used for the first time. It became an Olympic discipline at the 1972 Olympics in Munich for men and at the 1976 Olympics in Montreal for women (Olympics, 2023).

Literature review and related work

Sports prediction is an active field of research, mostly focusing on sports such as football and basketball due to the larger amount of publicly available data. To predict the outcome of a match, several algorithms have been considered (Ley and Dominicy, 2023), such as linear regressions (Miljkovic et al., 2010; Rodriguez-Ruiz et al., 2011), support vector machines (Cai et al., 2019), random forests (Groll et al., 2019), XGBoost (Lampis et al., 2023) or neural networks (McCabe and Trevathan, 2008; Huang and Chang, 2010). In the field of football predictions, Groll et al. (2019) use a so-called “Hybrid Random Forest” to predict the outcome of football matches. They augment their data set by adding a feature corresponding to the strength of a team, estimated by maximum likelihood. This value is obtained by modeling scored goals with a bivariate Poisson distribution. Denoting by $Y_{ijm}$ the random variable “number of goals scored by team $i$ against team $j$ ($i, j \in \{1, \dots, n\}$) in match $m$ ($m \in \{1, \dots, M\}$)”, they define the joint distribution of scored goals for home and away teams as

$$\begin{aligned} P(Y_{ijm} = z, Y_{jim} = y) = {} & \frac{λ_{ijm}^{z} λ_{jim}^{y}}{z! \, y!} e^{-(λ_{ijm} + λ_{jim} + λ_{C})} \\ & \cdot \sum_{k=0}^{\min(z, y)} \binom{z}{k} \binom{y}{k} k! \left(\frac{λ_{C}}{λ_{ijm} λ_{jim}}\right)^{k}. \end{aligned}$$

(1)

They assume that the form of the estimated parameter is $\log(λ_{ijm}) = β_{0} + (r_{i} - r_{j}) + h \cdot \mathbf{1}_{\{\text{team } i \text{ playing at home}\}}$. The parameter $β_{0} \in \mathbb{R}$ corresponds to a common intercept and $h \in \mathbb{R}$ multiplied by the indicator function corresponds to the effect of playing at home. The values of interest are then $r_{i} \in \mathbb{R}$ and $r_{j} \in \mathbb{R}$ as the strength parameters of the home team $i$ and away team $j$. All presented parameters are estimated via Maximum Likelihood based on the past $M$ matches using a time decaying parameter to weigh the recency of the matches. Consequently, each team $i \in \{1, \dots, n\}$ will be assigned such an estimated strength parameter, also known as “ability”, and this new feature will be added to the data set to enhance the future learning algorithm.
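To make Eq. (1) concrete, the following is a minimal Python sketch that evaluates the bivariate Poisson probability and the log-linear rate described above. The function and argument names (`bivariate_poisson_pmf`, `expected_rate`, `lam_home`, `lam_away`, `lam_c`) are illustrative assumptions and are not taken from Groll et al. (2019); the estimation step (maximum likelihood with time decay) is not shown.

```python
import math


def bivariate_poisson_pmf(z, y, lam_home, lam_away, lam_c):
    """Joint probability P(Y_ijm = z, Y_jim = y) as written in Eq. (1)."""
    # Product of the two Poisson-like kernels and the common exponential term.
    base = (lam_home ** z) * (lam_away ** y) / (math.factorial(z) * math.factorial(y))
    base *= math.exp(-(lam_home + lam_away + lam_c))
    # Finite sum that carries the dependence between both scores (via lam_c).
    ratio = lam_c / (lam_home * lam_away)
    series = sum(
        math.comb(z, k) * math.comb(y, k) * math.factorial(k) * ratio ** k
        for k in range(min(z, y) + 1)
    )
    return base * series


def expected_rate(beta0, r_i, r_j, h, at_home):
    """Rate from log(lambda_ijm) = beta0 + (r_i - r_j) + h * 1{team i at home}."""
    return math.exp(beta0 + (r_i - r_j) + (h if at_home else 0.0))
```

When `lam_c` is zero, the sum reduces to its first term and the joint probability factorizes into two independent Poisson probabilities, which is a convenient sanity check.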

Only scarce literature covers the field of handball analytics (Saavedra, 2018). Most of the existing research works are medical analyses looking at body fatigue and injury (Akyüz et al., 2019; Seil et al., 1998; Camacho-Cardenosa et al., 2018) with a particular interest in young players (Madsen et al., 2019; Grabara, 2018; Fonseca et al., 2019). Wagner et al. (2014) propose a review of performance for handball players and teams, highlighting the importance of factors such as experience level, age or playing positions. Pic (2018) showed that the home effect can play a role in critical moments of a game. In particular, when the score is level, the home team is more likely to win the game. As highlighted in Pic (2018), this advantage should be interpreted with care as it can either come from the effect of playing at home (with more supporters cheering) or from the fatigue of the away team due to travel. Indeed, unlike sports such as football where players mostly travel by plane or train, most handball clubs travel by bus, which can have an important impact on the players’ fatigue during the competition. Therefore, the difference in distance traveled between both teams can help explain the level of players’ freshness.

With a focus on prediction of handball outcomes, Groll et al. (2020) compared different regression approaches to model international handball games. Given the level of under-dispersion (the variance is lower than the mean), they discarded the regular Poisson distribution and opted for a Gaussian distribution with low variance. In a similar spirit, Felice (2024) proposed to model the number of goals scored by a team with a Conway-Maxwell-Poisson distribution (Sellers, 2022) and to derive a strength parameter from the parameters of the fitted distribution.

Contribution of this paper

In this paper, we build upon the latter idea and integrate these strength parameters into the features that we use for handball match prediction. In Section ‘Materials and methods’, we describe the construction of our training data set based on publicly available data and define the evaluation metrics for classification and regression models. Next, we present the results of the trained models in Section ‘Results’ and show that adding statistical features helps improve the predictive performance of the different ML models we explore. This section also presents how we can extract informative sports insights from an ML model via an explainable ML framework. Section ‘Discussion’ discusses the potential extensions of this work, which can also serve as a user-friendly tool for team coaches preparing upcoming games and competitions. Finally, we conclude in Section ‘Conclusions’.

Materials and methods

In this section we present the data used to train our machine learning models with the associated features.

Data set

Our data set consists of two data sources that include historical games and team squads information. We extract information of past games using the SportScore API from the service RapidAPI. The extracted data include information such as game location, time, competition and score. The second source used to complete the data set contains information about teams and players. The data are extracted from the website www.handball-base.com and will help us generate team and player specific variables.

Target response

We consider the prediction of match outcomes from two perspectives. On the one hand, we treat it as a two-target regression problem, where the objective is to predict the number of goals scored by each team by the end of a match, from which we can deduce the outcome. On the other hand, we treat it as a classification problem, where we aim to directly predict the outcome, namely home win, draw or away win.

Features

Our data set is composed of features which bring different levels of information about the two competing teams. The exhaustive list of features with abbreviations is available in Appendix A.1.

Game information

These features aim to carry information about the game and its importance. It can help encapsulate information such as potential stress for players and their state of mind (for instance, how seriously they may take a game).

Day of week: encoded day of the week for the start time of the game.

Hour: hour of the start time of the game. We can expect that games starting early in the day (e.g., morning) can be less important or players may lack time for preparation.

Importance: carries the importance of the competition from the lowest (friendly games with value 3) to the highest importance (Champions League with value 1).

Days until final: counts the number of days until the competition’s final or end of season. Combined with the variable Importance, this should account for the intensity of the game (last day of competition for the championship, e.g. Champions League’s final, will be more important than the first day of the competition).
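As a minimal sketch of how the time-related game features above could be derived from a match start time, assuming an integer encoding of the day of the week (Monday = 0 to Sunday = 6; the encoding is not specified in the text) and hypothetical names `game_time_features` and `final_day`:

```python
from datetime import date, datetime


def game_time_features(start: datetime, final_day: date) -> dict:
    """Derive Day of week, Hour and Days until final for one match."""
    return {
        # Assumed encoding: Monday = 0, ..., Sunday = 6.
        "day_of_week": start.weekday(),
        # Hour of the start time (0-23).
        "hour": start.hour,
        # Calendar days between the match and the competition's final
        # (or end of season), which is collected separately.
        "days_until_final": (final_day - start.date()).days,
    }


# Illustrative values only: the start hour and final date are made up.
example = game_time_features(datetime(2023, 5, 17, 20, 0), date(2023, 5, 20))
```

The Importance feature is not derived here since it comes from a manually curated mapping of competitions.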

Team’s structure

We also consider features allowing us to capture information from the team’s physical abilities and experience, and we incorporate them as differences between the home and away teams. There is one feature for each attribute (height, weight, age) and position on the field (wings, back players, line players/pivots and goalkeepers). They all act as proxy variables for players’ experience (age) or physical superiority (height and weight).

Height: difference of the average height of players per position between home and away teams. This aims to measure the difference in physical characteristics between players (e.g., taller back players for one team may result in an advantage both in attack and in defense).

Weight: difference of the average weight of players per position between home and away teams. Similar to Height, this aims to capture the differences in physical abilities between players.

Age: difference of the average age of players per position between home and away teams. This aims to capture the difference in experience/maturity between teams. Positive values will tend to indicate more experienced players in the home team who could better handle the pressure from games with high importance and have a positive impact on their team’s result.

Other features give us information about the team’s structure such as the distance to travel or the team’s composition.

Travel distance: distance in kilometers (as the crow flies) to travel for the away team between the club’s location and the address of the home team. This aims to capture the potential fatigue caused by the travelling distance.

Nationalities: ratio between the number of distinct nationalities among a team’s players and the total number of players. This aims to capture the affinity between players as well as potential language barriers.

International: ratio of players selected in their national team.
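The team-structure features above can be sketched in plain Python as follows. This is only an illustration: each player is assumed to be a dictionary with hypothetical keys (`position`, `height`, `weight`, `age`, `nationality`, `international`), and the function names are not taken from the original pipeline.

```python
from statistics import mean


def position_mean(players, position, attribute):
    """Average of one attribute over the players of a given position."""
    values = [
        p[attribute]
        for p in players
        if p["position"] == position and p[attribute] is not None
    ]
    return mean(values) if values else None


def attribute_difference(home, away, position, attribute):
    """Home minus away average, e.g. Height for back players."""
    h = position_mean(home, position, attribute)
    a = position_mean(away, position, attribute)
    return None if h is None or a is None else h - a


def nationalities_ratio(players):
    """Number of distinct nationalities divided by the squad size."""
    return len({p["nationality"] for p in players}) / len(players)


def international_ratio(players):
    """Share of players selected in their national team."""
    return sum(1 for p in players if p["international"]) / len(players)
```

A positive `attribute_difference` means the home team has the larger average for that position, which matches the sign convention used for Height, Weight and Age.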

Teams’ strengths

We finally add features that correspond to the teams’ strengths, as described in more detail in Section ‘Estimating team strengths’.

Attack strength: estimated strength in attack via Maximum Likelihood for home and away teams.

Defense strength: estimated strength in defense via Maximum Likelihood for home and away teams.

Data preparation

To prepare the data for training our models, some preprocessing steps were required.

Pulling data from different sources

In addition to the main data extracted from the sources presented in Section ‘Data set’, we manually collected a few more pieces of information. The dates of competitions that generate the feature Days until final were fetched manually from official IHF and EHF websites. The feature Importance was created manually (see Appendix A.2) using knowledge from professional handball experts.

Correction of errors

From the collected data, we performed several manual data investigations to remove errors. The main sources of errors are typos made by humans when reporting values. These were mostly found in the birth dates and heights reported on players’ profiles. We thus identified impossible values (such as players born in 2995 or players taller than 2.5 m), which were removed and flagged as missing.

Filling missing data

The data to be used in the model, extracted from the sources presented in Section ‘Data set’, are most often complete but may sometimes present missing values. Therefore, any missing data points (either after removing errors or missing from the source) had to be manually added. A long process of manual searches and data collection was performed to update players’ profiles. This represented approximately 400 players out of a total of 12,000 available in the database.

Computing transformed variables

The last step of the data preparation is to compute the custom features from the raw information we collected and cleaned. Features such as Height, Weight and Age were computed within a SQL query to compute the difference of the average values per position between the home and away teams. Similarly, features such as Nationalities and International were also computed within the SQL query by counting the number of countries (or national players) divided by the total number of players in the team. Finally, some more advanced features are computed in Python. The variable Travel distance uses the haversine formula to compute the distance to travel for each team to the match location (home team coordinate for club matches). The feature Days until final counts the number of days between the day of the match and the final match of the tournament (or the season for clubs). The variables Attack strength and Defense strength use the methodology presented in Section ‘Estimating team strengths’.
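For reference, the haversine computation behind Travel distance can be sketched as below. The Earth radius value and the function name `haversine_km` are assumptions made for this illustration; the original implementation may use a different radius or a library routine.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between both points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

For a club match, the function would be called with the coordinates of the away club and those of the home team's address, which gives the distance as the crow flies rather than the actual road distance.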

Estimating team strengths

The strength of a team is an undeniably important factor of a handball match but it is not directly measurable and only remains an abstract concept. We can palliate this shortcoming by devising a statistical model that incorporates parameters which are meant to represent the attacking and defensive strengths of each team, and then estimate these parameters on the basis of preceding matches. To this end, we consider the recent history (from the past 6 months) of each team’s matches and fit the distribution of scored goals with an appropriate probability law. Felice (2024) explained that the Conway-Maxwell-Poisson distribution (Sellers, 2022) is a very good choice for this purpose as it not only satisfies the discrete nature of goal counts but also handles the problem of over- and under-dispersion one may have to deal with. Hence it is a better choice than the often used Normal (not discrete), Poisson (assumes equi-dispersion) and Negative Binomial (cannot handle under-dispersion) distributions, for instance.

The Conway-Maxwell-Poisson distribution possesses two parameters, $λ > 0$ and $ν \geq 0$ (note that $ν = 0$ implies that $λ \in (0, 1)$ ) and its probability mass function (pmf) is defined as

$$P(X = x \mid λ, ν) = \frac{λ^{x}}{(x!)^{ν}} \frac{1}{\sum_{j=0}^{\infty} \frac{λ^{j}}{(j!)^{ν}}}.$$

(2)

The parameter $λ$ can be assimilated with the empirical mean (depending on the values of $ν$). For instance, when $ν = 1$ we retrieve the Poisson distribution for which $λ$ corresponds to both the mean and variance. The parameter $ν$ corresponds to the level of dispersion. The distribution can therefore handle both over- (when $ν < 1$) and under-dispersion (when $ν > 1$) scenarios. An illustration of the fitted distribution on historical data is provided in Figure 1 using the history of Metz Handball games over the course of the season 2022/2023.

Figure 1.

Histogram of goals scored by Metz Handball during the season 2022/2023 versus fitted Conway-Maxwell-Poisson (CMP) distribution. The value of the Akaike Information Criterion (AIC) is 258.72, which is the lowest among all tested distributions (CMP, Gaussian, Negative Binomial) as reported in Felice (2024).
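The pmf in Eq. (2) can be evaluated numerically with a short sketch such as the one below. It assumes that truncating the infinite normalizing sum at a hypothetical `max_count` is accurate enough, which holds for goal counts when $ν > 0$ and the tail beyond `max_count` is negligible; the function names are illustrative. Weights are computed in log-space to avoid overflow for large $λ$.

```python
import math


def cmp_pmf(lam, nu, max_count=200):
    """Truncated Conway-Maxwell-Poisson pmf for x = 0..max_count (Eq. 2)."""
    # log of lam^x / (x!)^nu, using lgamma(x + 1) = log(x!).
    logs = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(max_count + 1)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)  # truncated normalizing constant (rescaled)
    return [w / total for w in weights]


def mean_and_variance(pmf):
    """Mean and variance of a pmf given as a list indexed by the count x."""
    m = sum(x * p for x, p in enumerate(pmf))
    v = sum((x - m) ** 2 * p for x, p in enumerate(pmf))
    return m, v
```

With $ν = 1$ the sketch reproduces the Poisson distribution, while $ν > 1$ yields a variance below the mean (under-dispersion) and $ν < 1$ a variance above the mean (over-dispersion).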

Based on the nature of the two parameters, we use the work in Felice (2024) to define the strength of a team. Thus, we consider that the distribution of goals scored follows a CMP distribution with parameters $λ_{a}$ and $ν_{a}$ ( $a$ denoting attack), and the distribution of conceded goals follows a CMP distribution with parameters $λ_{d}$ and $ν_{d}$ ( $d$ denoting defence). The parameters are estimated via maximum likelihood and $λ_{a}, ν_{a}$ represent the attacking parameters of a team, and $λ_{d}, ν_{d}$ the defensive parameters of the same team. We define the strengths of a team as

$$s_{a} = \left(\frac{\log(λ_{a})}{ν_{a}}\right)^{ω} \quad \text{and} \quad s_{d} = \left(\frac{ν_{d}}{\log(λ_{d})}\right)^{ω},$$

(3)

where the parameter $ω = \frac{1}{n} \sum_{i=1}^{n} w_{i} \in [0, 1]$ corresponds to the average difficulty of the $n \in \mathbb{N}$ matches played over a fixed period of time. The number of matches $n$ can therefore slightly differ from one team to another. The difficulty of a match is based on the European Handball Federation’s (EHF) place distribution of leagues and ranges from 0 (the least competitive) to 1 (EHF Champions League).

The rationale behind these choices of $s_{a}$ and $s_{d}$ is as follows. A strong attack demonstrates a high average number of scored goals ( $λ_{a}$ ) and a balanced dispersion ( $ν_{a}$ , stable with occasional spikes). On the other hand, a strong defense corresponds to a low average of conceded goals ( $λ_{d}$ ) and a stable defense over matches (low variance translating into under-dispersion with high value for $ν_{d}$ ). These are penalized by the average difficulty, such that a team mostly playing in a poorly competitive league should see its parameters discounted compared to a similarly performing team which plays against strong competitors (e.g. in the champions league).

We thus model historical matches with the Conway-Maxwell-Poisson distribution. We estimate, by means of maximum likelihood estimation for each team, the defense parameters ( $λ_{d}$ and $ν_{d}$ ) as well as the attack parameters ( $λ_{a}$ and $ν_{a}$ ). For each match in our data set, we estimate the parameters as of the week before the match of interest, and for a time window of 6 months.¹ This consequently gives us the estimated strength parameters $s_{a}$ and $s_{d}$ that will constitute the statistical variables presented in Section ‘Features’.
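Once the CMP parameters have been fitted, Eq. (3) reduces to a few arithmetic operations. The sketch below assumes already estimated parameters with $λ > 1$ (so that the logarithm is positive) and $ν > 0$; the function names and the numeric values in the example are purely illustrative and are not estimates from the data set.

```python
import math


def average_difficulty(weights):
    """omega: mean of the per-match difficulty weights w_i in [0, 1]."""
    return sum(weights) / len(weights)


def attack_strength(lam_a, nu_a, omega):
    """s_a = (log(lambda_a) / nu_a) ** omega, as in Eq. (3)."""
    return (math.log(lam_a) / nu_a) ** omega


def defense_strength(lam_d, nu_d, omega):
    """s_d = (nu_d / log(lambda_d)) ** omega, as in Eq. (3)."""
    return (nu_d / math.log(lam_d)) ** omega


# Illustrative values only (not fitted on real matches).
omega = average_difficulty([1.0, 0.5, 0.75])
s_a = attack_strength(lam_a=30.0, nu_a=1.2, omega=omega)
s_d = defense_strength(lam_d=25.0, nu_d=1.5, omega=omega)
```

In a full pipeline these functions would be called once per team and per match, with parameters fitted on the 6-month window ending the week before the match.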

Prediction models

To model the outcome of a handball match, we consider both classification and regression models to either predict the winner of the game or the scores of the competing teams. The results of the experimented models for classification and regression are discussed in Section ‘Results’.

Classification models

To predict the outcome of a match (as win, draw or loss), we train different ML classification algorithms. A first approach is based on Random Forests (Breiman, 2001). Another model is based on the popular XGBoost algorithm (Chen and Guestrin, 2016). We also use an improved version of the boosting model, CatBoost (Prokhorenkova et al., 2018), which is specialized in handling categorical data. Finally, we train a Multi-Layered Perceptron (Rosenblatt, 1958).

Regression models

As an insightful alternative, we also consider regression models to predict the score of each team during a match, from which we can of course derive the predicted outcome. To that end, we use multi-target variants of the aforementioned models which will predict the final score of the home and away teams.

We note that, by default, a Random Forest uses a univariate splitting criterion (Breiman, 2001). There is however no theoretical obstacle to extending it to a multi-dimensional impurity in the case of multi-target regressions (Alakus et al., 2023). The constraints instead come from implementations that might not be available in Python. We thus use Python’s modules to implement these models.² The module is a wrapper class of the model that, under the hood, simply trains two models in parallel: one for the home team and one for the away team. Similarly, Gradient Boosted models do not have any theoretical restrictions for handling scoring functions for multiple outputs (Iosipoi and Vakhrushev, 2024), but practical implementations may not offer such options. This is the case for the implementation of XGBoost, which does not fully support multi-target settings. Therefore, we use the same Python module to achieve our goal. The other models, CatBoost and Multi-Layered Perceptron, can handle multiple outcome predictions by nature and have built-in implementations.
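To illustrate what such a wrapper does under the hood, here is a minimal, dependency-free sketch. The class names `PerTargetRegressor` and `MeanRegressor` are hypothetical, the toy base model only predicts the training mean, and the two models are fitted one after the other rather than in parallel. The text does not name the module used; scikit-learn's `MultiOutputRegressor` is one existing wrapper with this one-model-per-target behaviour.

```python
class MeanRegressor:
    """Toy single-target model standing in for a Random Forest or XGBoost."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class PerTargetRegressor:
    """Fits one independent single-target model per output column."""

    def __init__(self, make_model, n_targets=2):
        # One fresh model per target: here, home goals and away goals.
        self.models = [make_model() for _ in range(n_targets)]

    def fit(self, X, Y):
        for t, model in enumerate(self.models):
            model.fit(X, [row[t] for row in Y])
        return self

    def predict(self, X):
        columns = [model.predict(X) for model in self.models]
        # Re-assemble per-target predictions into one row per match.
        return [list(row) for row in zip(*columns)]


# Illustrative data: one feature per match, targets are (home, away) goals.
X = [[0.1], [0.4], [0.7]]
Y = [[30, 25], [28, 27], [32, 23]]
scores = PerTargetRegressor(MeanRegressor).fit(X, Y).predict([[0.5]])
```

Because the per-target models never see each other's target, this design cannot exploit the correlation between home and away scores, which is the trade-off compared with natively multi-output models.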

We also note that the scope of our regression problem is about predicting the score of both teams. Some work has already been done on predicting the score difference to overcome the problem of non-equidispersion (Karlis et al., 2024). This approach would, in our context, lack information and only show a limited picture to sport experts. Indeed, as end users of our tool, coaches and staff members not only need to know the score difference but also the scale, in order to understand how many goals will be scored and conceded by each team.

Performance metrics

To evaluate the performance of our models, we first define the metrics we use for classification and regression models. In this section, we consider a predictive model $f$ which predicts an outcome $\hat{y}$ based on some input matrix $x \in \mathbb{R}^{n \times p}$ with $p$ predictors and $n$ records (matches). The actual outcome is denoted by $y$. In the case of regression, for predicting the number of goals scored by one team, the outcome is defined on the real line, $y, \hat{y} \in \mathbb{R}$. In the case of classification, $y$ and $\hat{y}$ are defined on $\{0, 1\}^{r}$ with $r = 3$ classes, namely win, draw or loss.

Metrics for classification models

To measure the performance of our classification models, we use three common metrics in the field of sports predictions: accuracy, $F_{1}$-score and Brier score (Brier, 1950).

The accuracy corresponds to the percentage of predicted matches $\hat{y}$ that are correctly classified:

$$\mathrm{acc} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{y_{i} = \hat{y}_{i}\}}.$$

(4)

Out of the $n$ predicted matches, it computes the share of correctly predicted matches.

The $F_{1}$-score is the harmonic mean of the precision (share of correctly predicted won matches among all matches predicted as won) and the recall (share of correctly predicted won matches among all matches actually won). It can also be expressed in terms of the true positives ($\mathrm{TP}$), false positives ($\mathrm{FP}$) and false negatives ($\mathrm{FN}$) from the confusion matrix. It is computed as

$$F_{1} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}.$$

(5)

Another popular metric in sports analytics is the Brier score (Wheatcroft, 2021), which looks at the difference between the predicted probability of a class and its actual realization. For a match $i \in \{1, \dots, n\}$, a model $f$ predicts the probability $\hat{y}_{ij}$ for the class $j \in \{1, \dots, r\}$. The actual outcome of a match $y_{i} \in \{0, 1\}^{r}$ is a vector of $r = 3$ dimensions for the realization of each possible match outcome (win, draw or loss), therefore $y_{ij}$ is binary. The Brier score is defined as

$$\mathrm{BS} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{r} (\hat{y}_{ij} - y_{ij})^{2}.$$

(6)

For each of the $r$ possible classes, it computes the squared difference between the predicted probability $\hat{y}_{ij}$ for match $i$ and possible class $j$ and the actual, binary outcome $y_{ij}$.
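The three classification metrics of Eqs. (4) to (6) can be sketched in plain Python as follows. The function names are illustrative; the support-weighted average in `weighted_f1` is an assumption about how the weighted $F_{1}$-score is aggregated over the three classes, since only the per-class formula is given in the text.

```python
def accuracy(y_true, y_pred):
    """Share of matches whose predicted class equals the actual class (Eq. 4)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 = 2TP / (2TP + FP + FN) for one class treated as 'positive' (Eq. 5)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class frequency (assumed)."""
    n = len(y_true)
    return sum(
        y_true.count(label) / n * f1_for_class(y_true, y_pred, label)
        for label in sorted(set(y_true))
    )


def brier_score(y_true, probabilities, classes):
    """Multi-class Brier score (Eq. 6); one probability row per match."""
    total = 0.0
    for actual, row in zip(y_true, probabilities):
        # y_ij is 1 for the realized class and 0 otherwise.
        total += sum((p - (1.0 if c == actual else 0.0)) ** 2 for c, p in zip(classes, row))
    return total / len(y_true)
```

With this definition the Brier score ranges from 0 (perfect probabilistic predictions) to 2, and a uniform prediction over three classes scores 2/3.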

Metrics for regression models

To assess the quality of our regression models for each of the $k \in \{\text{home}, \text{away}\}$ teams, we use the Root Mean Squared Error (RMSE) and the Mean Absolute Percentage Error (MAPE).

Given the predicted score $\hat{y}_{ik}$ for match $i$ of team $k$ and the actual score $y_{ik} \in \mathbb{R}$, we compute the Root Mean Squared Error for team $k$ as

$$\mathrm{RMSE}_{k} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{ik} - \hat{y}_{ik})^{2}}.$$

(7)

It computes the average deviation of our model’s predictions from the actual scores for each of the home and away teams.

The Mean Absolute Percentage Error for team $k$ is defined as

$$\mathrm{MAPE}_{k} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_{ik} - \hat{y}_{ik}}{y_{ik}} \right|.$$

(8)

It computes the relative difference between the actual and predicted values and uses the absolute value to give the same weight to positive and negative deviations.
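Both regression metrics are straightforward to compute per team. The sketch below follows Eqs. (7) and (8) directly; the function names and the example scores are illustrative, and `mape` assumes that no actual score is zero (which is safe for handball scores in practice).

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error over the matches of one team side (Eq. 7)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error as a fraction, e.g. 0.15 for 15% (Eq. 8)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative home-team scores over two matches (made-up numbers).
home_actual = [20, 25]
home_predicted = [22, 20]
home_rmse = rmse(home_actual, home_predicted)
home_mape = mape(home_actual, home_predicted)
```

The same two calls would be repeated with the away-team scores to obtain the away columns of a results table.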

Results

To evaluate the performance of our distinct approaches, we train the different models on several years of female club matches. Our training set spans from September 2019 until April 2023 (representing 3,260 games) and leaves matches from April to June 2023 (250 games) as the test set.
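A chronological split of this kind can be sketched as below. The exact cutoff day within April 2023 is not stated, so the `cutoff` value used in the example is hypothetical, as are the function name and the `date` key of each match record.

```python
from datetime import date


def chronological_split(matches, cutoff):
    """Matches strictly before the cutoff go to training, the rest to test."""
    train = [m for m in matches if m["date"] < cutoff]
    test = [m for m in matches if m["date"] >= cutoff]
    return train, test


# Illustrative records and a hypothetical cutoff day.
matches = [
    {"date": date(2019, 9, 14), "home_goals": 27, "away_goals": 24},
    {"date": date(2022, 11, 5), "home_goals": 31, "away_goals": 29},
    {"date": date(2023, 5, 17), "home_goals": 30, "away_goals": 26},
]
train, test = chronological_split(matches, cutoff=date(2023, 4, 15))
```

Splitting by date rather than at random keeps every test match later than all training matches, so the evaluation mimics predicting future games.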

In both the classification and regression settings, we train the four different models presented in Section ‘Prediction models’ where we compare the scenarios with and without the strengths features introduced in Section ‘Estimating team strengths’. The results are summarized in Section ‘Model performances’. After that comparison, we further investigate the best performing model with explainability frameworks for global and local explanations (Section ‘Model explainability’).

Model performances

In the first case of match classification, we train our four models and report the classification metrics evaluated on the test set in Table 1. The Random Forest with strengths features performs the best. Furthermore, adding statistical features is beneficial for every model and, without exception, strongly improves all metrics. The improvement is particularly remarkable for the Random Forest and CatBoost models, which show the largest gaps between the two scenarios. Although the accuracy of the Random Forest with classical covariates already reaches 60.11%, the strengths features boost it to 81.32%. As a point of comparison, Groll et al. (2020) report a minimum classification error of 23.35% with an under-dispersed Poisson regression model.

Table 1.

Classification models performance comparison based on accuracy, weighted $F_{1}$-score and Brier score. Each model is considered once only based on classical covariates and once with the additional strengths variables.

Model	Features	Accuracy	$F_{1}$-score	Brier score
Random Forest	Classical	60.11%	57.64%	0.4837
Random Forest	Classical + Strengths	81.32%	79.15%	0.3145
XGBoost	Classical	57.51%	55.87%	0.6189
XGBoost	Classical + Strengths	73.57%	71.06%	0.5784
CatBoost	Classical	58.29%	54.82%	0.5181
CatBoost	Classical + Strengths	79.57%	77.04%	0.3517
Neural Net	Classical	54.18%	52.75%	0.5371
Neural Net	Classical + Strengths	68.08%	66.67%	0.4479

Turning our attention to regression settings, we train multi-target regression models with and without strengths features and report the resulting performance metrics in Table 2. Our metrics of interest here are the Root Mean Squared Error and Mean Absolute Percentage Error for both home and away teams.

Table 2.

Regression models performance comparison based on root mean squared error and mean absolute percentage error. Each model is considered once only based on classical covariates and once with the additional strengths variables. Note that we separate the predictions for home and away teams.

Model	Features	Home RMSE	Home MAPE	Away RMSE	Away MAPE
Random Forest	Classical	5.05	14.91%	4.85	15.34%
Random Forest	Classical + Strengths	3.96	11.50%	3.79	12.28%
XGBoost	Classical	5.87	15.40%	5.16	14.08%
XGBoost	Classical + Strengths	4.24	11.63%	4.09	11.45%
CatBoost	Classical	5.13	14.86%	4.78	15.17%
CatBoost	Classical + Strengths	3.79	10.94%	3.73	12.06%
Neural Net	Classical	5.29	15.76%	5.07	16.14%
Neural Net	Classical + Strengths	5.28	15.31%	4.96	15.30%

We can see from Table 2 that, although the Random Forest with strengths achieves good performance levels, the CatBoost model predicts match scores with the least error. Similar to our classification use case, adding statistical features greatly benefits all trained models, with the exception of the Neural Net for the home team.³ This outcome suggests that our best model can predict the score of a female handball game with an error of about 3.8 goals (i.e. 11% considering 27.9 goals per match on average) for the home and away teams. As a comparison, state-of-the-art predictive models for football can achieve a prediction error of 1.194 goals (Groll et al., 2019). Considering that the average number of goals scored during a game is 1.5 (Zebari et al., 2021), this corresponds to an 80% error. This highlights the reliability of our predictive models strengthened by the supplementary information carried by the statistical covariates.

Model explainability

Many ML models such as Random Forests or Neural Networks are considered black boxes: they are excellent in terms of prediction accuracy, but one cannot understand the factors that lead to a given prediction. Therefore, explainability (model transparency) is an important capability any ML model should have. Its importance will become even stronger with forthcoming regulations on Artificial Intelligence (Hamon et al., 2020; Sovrano et al., 2022). Furthermore, explaining a model’s outcome is crucial to trust its predictions and take actions from generated explanations. In this section, we explore the most important features of the selected CatBoost model from Section ‘Model performances’ and show how the extracted strengths features are used by our model.

We distinguish between global explanations, which focus on analyzing the overall behavior of the model (importance of features), and local explanations, used to explain predictions of specific instances. Global explainability techniques include methods such as the Partial Dependence Plot (PDP; Friedman, 2001) or the feature importance analysis, which aim to “describe the average behavior of a machine learning model” (Molnar, 2020). The literature on local explainability includes a range of model-agnostic solutions, from surrogate models such as LIME (Ribeiro et al., 2016), which locally approximate the black box ML model with an interpretable model and use the latter to understand the generated outcome, to game-theoretic approaches based on Shapley values (Lundberg and Lee, 2017).

For our setting, we use a game-theory based approach with the SHAP framework (Lundberg and Lee, 2017) to generate explanations. In particular, given the structure of our model, we use the TreeSHAP implementation (Lundberg et al., 2020), which uses the tree structure of the CatBoost model to perform more efficient and exact calculations of Shapley values. We can therefore generate global explanations from aggregated SHAP values to obtain feature importance. SHAP values for a single instance will give us a local explanation of a particular observation in our data set.
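To show what Shapley values measure, the sketch below computes them exactly by enumerating feature coalitions for a small model. This is not TreeSHAP: it is a brute-force illustration that is only tractable for a handful of features, and it makes the simplifying assumption that "absent" features are replaced by a single baseline vector. All names (`shapley_values`, `global_importance`, `baseline`) are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of each feature for one prediction (local explanation)."""
    n = len(instance)

    def value(subset):
        # Features in the coalition take the instance value, others the baseline.
        x = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(x)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def global_importance(all_phis):
    """Mean absolute Shapley value per feature over many instances (global explanation)."""
    n_features = len(all_phis[0])
    return [sum(abs(row[i]) for row in all_phis) / len(all_phis) for i in range(n_features)]
```

The per-instance values sum to the difference between the prediction for the instance and the prediction for the baseline, which is what makes them readable as additive contributions to a predicted score.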

The remainder of this section focuses on the explainability of the regression models, aligning with the paper’s main scope. However, the implementations and conclusions are exactly the same for the classification models.

Feature importance from global explanation

Figure 2 highlights the features in our test set which contribute the most to the predicted outcome. We can observe that the strengths features are considered as very important for the model. We notice that the attack strength of the home team is the most important to predict home goals, followed by the defense strength from the away team. This is perfectly logical, and again underlines the impact of our statistical features. To predict the score of the away team, we observe from our experiments that the most important feature is the attack strength of the away team, followed by the defense strength of the home team. This is in line with conclusions from Section ‘Model performances’, namely that, by adding these features to our model, the performance considerably improves.

Figure 2.

Feature importance plot using TreeSHAP for predicting home team’s goals.

Understand match predictions from local explanations

Analyzing predictions by means of local explainability frameworks can help anticipate events during an upcoming match. To that end, we analyze the last home game of the 2022/2023 season for Metz Handball. This game was played on May 17, 2023 against Chambray Touraine Handball as part of the “Ligue Butagaz Energie” (LBE), the French women’s first division championship. We chose this game because it is not contained in the training set: the predictions were made, and the explanations generated, before the game.⁴ Furthermore, the authors attended the game, which helps relate the generated explanations to concrete events of the actual game. The model predicted a final score of 32-24 in favor of Metz Handball, and Metz won the actual game 30-26. We present in Figure 3 the explanations generated from the CatBoost model with strength features for the selected game.

Figure 3.

Force plot of predicted goals (from CatBoost with strengths) for Metz Handball for the game played on May 17, 2023 against Chambray Touraine Handball.

We can read Figure 3 as follows. Features close to the top of the plot contribute positively, increasing the number of goals Metz is predicted to score during the game. Features at the bottom of the plot contribute negatively to the goals scored by Metz; here they mainly reflect the defense of Chambray. In line with our conclusions from the feature importance plot in Figure 2, the attack strength of the home team (Metz) is the main contributor to the predicted score. The defense of Chambray, however, lowers the predicted total; without it, the outcome would be much worse for Chambray. Other features, such as the experience of international or wing players, contributed positively to the victory of Metz. An additional factor is the number of days until the final (end of season). Although the model could not be aware that Metz, playing their last home game of the season, was about to receive the trophy and celebrate the title of champion, the few days left until the end of the season contributed positively to the prediction, which plausibly reflects the motivation of players at this stage of the season.
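
The reading of a force plot relies on the additive structure of SHAP explanations. In notation that is introduced here for illustration and not taken from the paper, for an instance $x$ with $p$ features:

```latex
\hat{y}(x) = \phi_0 + \sum_{j=1}^{p} \phi_j(x),
\qquad
\phi_0 = \mathbb{E}\big[\hat{y}(X)\big]
```

Here $\phi_0$ is the base value (the average model output over the background data) and $\phi_j(x)$ is the SHAP value of feature $j$. Features with $\phi_j(x) > 0$ push the predicted number of goals above the base value, and features with $\phi_j(x) < 0$ pull it below.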

Discussion

We showed that the proposed ML solution achieves a high predictive performance and explanations generated with relevant explainability frameworks allow a translation of our analytical findings into concrete sports events. In this section, we argue that this tool can be used by sports professionals such as team coaches to prepare for upcoming games. We also open the discussion for future work on extending team strengths to player abilities as additional statistical covariates.

An analytical tool for coaches

While state-of-the-art Machine Learning models for sports predictions (e.g. football, basketball) usually plateau around 75% accuracy to predict the outcome of a match (Huang and Chang, 2010; Lampis et al., 2023), our proposed solution for handball matches achieves above 80% accuracy. Coupled with explainability capabilities, our approach can translate statistical predictions into real facts happening during a given match. Although no model can guarantee that the result of a game will be as predicted, the model can identify statistical facts that can explain parts of the outcome. Such patterns can therefore be used by team coaches in view of a competition.

Indeed, knowing the prediction of a game together with the main potential contributors to this outcome can help prepare a game and improve the team’s strategy. As we illustrated in Section ‘Understand match predictions from local explanations’, local explainability can reveal where a team is expected to excel or struggle during the match. We observed (see the features in Appendix Tables A1 to A3) that a team can gain an advantage from the experience of its wing players or goalkeepers and struggle against the defense strength of the opposing team. A coach can therefore use this information to build on the team’s predicted strengths, work on its weak points, and adapt the strategy accordingly.

Table A1.

Complete list of features with data type and description.

Attribute	Data type	Description
game_dow	Integer	Day of the week of the game.
game_hour	Integer	Hour of the start time of the game.
importance	Integer	Importance of the competition from the lowest (friendly games with value 3) to the highest importance (Champions League with value 1).
days_to_final	Integer	Number of days until the competition’s final or end of season.
away_travel_km	Float	Distance in kilometers (as the crow flies) between the away and home teams’ locations.
home_international	Float	Share of international players (for ongoing season) in home team.
away_international	Float	Share of international players (for ongoing season) in away team.
home_locals	Float	Share of home players with the same nationality as the club (to reflect potential language barriers).
away_locals	Float	Share of away players with the same nationality as the club (to reflect potential language barriers).
diff_wing_height_avg	Float	Difference of the average height of wing players between home and away teams.
diff_back_height_avg	Float	Difference of the average height of back (left, center and right) players between home and away teams.
diff_pivot_height_avg	Float	Difference of the average height of line (pivot) players between home and away teams.
diff_gk_height_avg	Float	Difference of the average height of goalkeepers between home and away teams.
diff_wing_height_std	Float	Difference of the standard deviation of heights of wing players between home and away teams.
diff_back_height_std	Float	Difference of the standard deviation of heights of back (left, center and right) players between home and away teams.
diff_pivot_height_std	Float	Difference of the standard deviation of heights of line (pivot) players between home and away teams.
diff_gk_height_std	Float	Difference of the standard deviation of heights of goalkeepers between home and away teams.
diff_wing_weight_avg	Float	Difference of the average weight of wing players between home and away teams.
diff_back_weight_avg	Float	Difference of the average weight of back (left, center and right) players between home and away teams.
diff_pivot_weight_avg	Float	Difference of the average weight of line (aka. pivot) players between home and away teams.
diff_gk_weight_avg	Float	Difference of the average weight of goalkeepers between home and away teams.
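
The `away_travel_km` feature is described as a straight-line ("as the crow flies") distance. One standard way to compute such a distance from coordinates is the haversine formula. The sketch below assumes that approach and a spherical Earth of radius 6371 km; the function name and the approximate city coordinates are illustrative and are not taken from the paper's code.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    # Haversine formula.
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate city coordinates, used for illustration only.
metz = (49.12, 6.18)
tours = (47.39, 0.69)
away_travel_km = great_circle_km(*tours, *metz)
```

A straight-line distance understates actual travel distance by road or rail, but it is cheap to compute and needs no routing data.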

Table A2.

Complete list of features with data type and description (cont’d).

Attribute	Data type	Description
diff_wing_weight_std	Float	Difference of the standard deviation of weights of wing players between home and away teams.
diff_back_weight_std	Float	Difference of the standard deviation of weights of back (left, center and right) players between home and away teams.
diff_pivot_weight_std	Float	Difference of the standard deviation of weights of line (aka. pivot) players between home and away teams.
diff_gk_weight_std	Float	Difference of the standard deviation of weights of goalkeepers between home and away teams.
diff_wing_age_avg	Float	Difference of the average age of wing players between home and away teams.
diff_back_age_avg	Float	Difference of the average age of back (left, center and right) players between home and away teams.
diff_pivot_age_avg	Float	Difference of the average age of line (aka. pivot) players between home and away teams.
diff_gk_age_avg	Float	Difference of the average age of goalkeepers between home and away teams.
diff_wing_age_std	Float	Difference of the standard deviation of ages of wing players between home and away teams.
diff_back_age_std	Float	Difference of the standard deviation of ages of back (left, center and right) players between home and away teams.
diff_pivot_age_std	Float	Difference of the standard deviation of ages of line (aka. pivot) players between home and away teams.
diff_gk_age_std	Float	Difference of the standard deviation of ages of goalkeepers between home and away teams.
attack_strength_home	Float	Attack strength estimated via MLE for home team.
defense_strength_home	Float	Defense strength estimated via MLE for home team.
attack_strength_away	Float	Attack strength estimated via MLE for away team.
defense_strength_away	Float	Defense strength estimated via MLE for away team.
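
The `diff_*` features in Tables A1 and A2 compare per-position statistics between both teams. The following is a minimal sketch of how such features could be derived, assuming the difference is taken as home minus away and that the sample standard deviation is used; neither convention is stated in the tables. The rosters and the helper name are invented for illustration.

```python
from statistics import mean, stdev


def position_diff_features(home, away, attribute):
    """Home-minus-away differences of mean and standard deviation per position."""
    features = {}
    for position in sorted(set(home) & set(away)):
        h = [player[attribute] for player in home[position]]
        a = [player[attribute] for player in away[position]]
        features[f"diff_{position}_{attribute}_avg"] = mean(h) - mean(a)
        # stdev requires at least two players per position.
        features[f"diff_{position}_{attribute}_std"] = stdev(h) - stdev(a)
    return features


# Invented rosters (heights in cm), for illustration only.
home = {
    "wing": [{"height": 170}, {"height": 174}],
    "gk": [{"height": 180}, {"height": 184}],
}
away = {
    "wing": [{"height": 168}, {"height": 172}],
    "gk": [{"height": 178}, {"height": 186}],
}
feats = position_diff_features(home, away, "height")
```

Encoding both teams as a single difference halves the number of columns and makes the sign directly interpretable: under the assumed convention, a positive value means the home team is taller, heavier or older on average at that position.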

Table A3.

Values of different leagues for national teams and clubs.

Value	National teams	Clubs
10	Olympic Games
9	World championships
8	European championships	EHF Champions League
7	African cup / Eurocup
6	Asian cup / Qualifiers	EHF European League
5	Tournaments / Emerging nations
4	International Friendly Games	EHF European Cup
3		Regular championships
2		National cups
1		Friendly games

From team strengths estimation to player abilities

As presented in Section ‘Estimating team strengths’, the structure of handball games suggests the use of the Conway-Maxwell-Poisson distribution from which we can derive a formula to estimate the attack and defense strengths of a team. We showed the importance of these features to the predictive performance of a model, and Felice (2024) illustrated the relevance of such metrics to derive the ranking of clubs. This methodology can be adapted to other settings such as the estimation of individual player abilities. Although the publicly available data does not allow extracting a long history of player statistics over multiple seasons, having access to such data could lead to similar research. Indeed, following the methodology from Felice (2024) of determining the most appropriate distribution to fit relevant available data, one can also derive a formula to estimate the attack and defense ability of each player. The estimated strength can not only help derive another ranking for players but can also give valuable insights on player abilities and lead to novel analytical tools for managers when recruiting new players.
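
For reference, the Conway-Maxwell-Poisson (CMP) distribution with rate $\lambda$ and dispersion $\nu$ has probability mass function $P(X = x) = \lambda^x / \big((x!)^\nu Z(\lambda, \nu)\big)$, which reduces to the Poisson distribution at $\nu = 1$. The sketch below evaluates this mass function with a truncated normalizing constant. How the rate is linked to attack and defense strengths is specified in Section ‘Estimating team strengths’ and is not reproduced here; the parameter values and the truncation bound are illustrative assumptions.

```python
from math import exp, lgamma, log


def cmp_pmf(x, lam, nu, max_terms=500):
    """CMP probability P(X = x), with Z(lam, nu) approximated by a truncated sum."""

    def log_weight(k):
        # log of lam^k / (k!)^nu, computed in log space for numerical stability
        return k * log(lam) - nu * lgamma(k + 1)

    log_terms = [log_weight(k) for k in range(max_terms)]
    peak = max(log_terms)
    # Log-sum-exp of the truncated series gives log Z.
    log_z = peak + log(sum(exp(t - peak) for t in log_terms))
    return exp(log_weight(x) - log_z)


# nu = 1 recovers the Poisson distribution; nu > 1 gives under-dispersion.
poisson_like = cmp_pmf(28, lam=28.0, nu=1.0)
under_dispersed = [cmp_pmf(k, lam=28.0 ** 1.5, nu=1.5) for k in range(120)]
```

The dispersion parameter is what distinguishes this distribution from a plain Poisson model: with $\nu > 1$ the variance falls below the mean, and with $\nu < 1$ it exceeds it.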

Conclusions

In this paper, we showed how to construct a highly accurate predictive model for handball games. While data preparation and feature engineering are often under-explored in the literature (Zheng and Casari, 2018), our results highlight their importance for the model’s performance. This encourages us to focus, in future work, on preparing even more meaningful features to capture more signal and further improve the model’s performance. The models presented in this paper are trained and evaluated on female championships, but this work can easily be extended to male championships as well as international competitions. In view of the upcoming Olympic Games in Paris in 2024, the presented solutions can also help national teams’ coaches prepare for this worldwide event by means of analytical tools powered by accurate Machine Learning models.

Supplemental Material

Supplemental material for Predicting handball matches with machine learning and statistically estimated team strengths by Florian Felice and Christophe Ley in Journal of Sports Analytics:

sj-csv-1-san-10.1177_22150218251313937

sj-csv-2-san-10.1177_22150218251313937

sj-csv-3-san-10.1177_22150218251313937

Footnotes

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and publication of this article.

Code availability

The code and analyses developed for this work have been deposited to the GitHub page at .

Notes

A. Appendix

References

Akyüz, Avşar, Bilge, Deliceoğlu and Korkusuz (2019) Skeletal muscle fatigue does not affect shooting accuracy of handball players. Isokinetics and Exercise Science 27(4): 253–259. DOI: 10.3233/IES-193178.

Alakus, Larocque and Labbe (2023) Covariance regression with random forests. BMC Bioinformatics 24(1): 258. DOI: 10.1186/s12859-023-05377-y. ISSN 1471-2105.

Breiman (2001) Random forests. Machine Learning 45(1): 5–32. DOI: 10.1023/A:1010933404324.

Brier (1950) Verification of forecasts expressed in terms of probability. Monthly Weather Review 78(1): 1–3. DOI: 10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2.

Cai and Zhou (2019) A hybrid ensemble learning framework for basketball outcomes prediction. Physica A: Statistical Mechanics and Its Applications 528: 121461. DOI: 10.1016/j.physa.2019.121461.

Camacho-Cardenosa, González-Custodio, Martínez-Guardado, Timón, Olcina and Brazo-Sayavera (2018) Anthropometric and physical performance of youth handball players: The role of the relative age. Sports 6(2): 47. DOI: 10.3390/sports6020047.

Chen and Guestrin (2016) XGBoost: A scalable tree boosting system. In: 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’16, pp.785–794. New York, NY, USA: Association for Computing Machinery. DOI: 10.1145/2939672.2939785.

Felice (2024) Ranking handball teams from statistical strength estimation. Computational Statistics. ISSN 0943-4062, 1613-9658. https://link.springer.com/10.1007/s00180-024-01522-0 DOI: 10.1007/s00180-024-01522-0.

Fonseca, Figueiredo, Gantois, De Lima-Junior and Fortes (2019) Relative age effect is modulated by playing position but is not related to competitive success in elite under-19 handball athletes. Sports 7(4): 91. DOI: 10.3390/sports7040091.

Friedman (2001) Greedy function approximation: A gradient boosting machine. The Annals of Statistics 29(5): 1189–1232. ISSN 0090-5364. https://www.jstor.org/stable/2699986 Publisher: Institute of Mathematical Statistics.

Grabara (2018) The posture of adolescent male handball players: A two-year study. Journal of Back and Musculoskeletal Rehabilitation 31(1): 183–189. DOI: 10.3233/BMR-170792.

Groll, Heiner, Schauberger and Uhrmeister (2020) Prediction of the 2019 IHF world men’s handball championship – A sparse Gaussian approximation model. Journal of Sports Analytics 6(3): 187–197. DOI: 10.3233/JSA-200384.

Groll, Ley, Schauberger and Van Eetvelde (2019) A hybrid random forest to predict soccer matches in international tournaments. Journal of Quantitative Analysis in Sports 15(4): 271–287. DOI: 10.1515/jqas-2018-0060.

Hahn, Glock and Birkefeld (2013) Fascination for Thousands of Years – Handball History and Stories. International Handball Federation. https://archive.ihf.info/upload/Book/issue0001/offline/download.pdf

Hamon, Junklewitz and Sanchez MJI (2020) Robustness and Explainability of Artificial Intelligence.

Huang K-Y and Chang W-L (2010) A neural network method for prediction of 2006 World Cup Football Game. In: 2010 International joint conference on neural networks (IJCNN), pp.1–8. Barcelona, Spain: IEEE. ISBN 978-1-4244-6916-1. DOI: 10.1109/IJCNN.2010.5596458.

Iosipoi and Vakhrushev (2024) SketchBoost: fast gradient boosted decision tree for multioutput problems. In: Proceedings of the 36th international conference on neural information processing systems, NIPS ’22, pp.25422–25435. Red Hook, NY, USA: Curran Associates Inc. ISBN 978-1-71387-108-8.

Karlis, Michels and Otting (2024) Modelling handball outcomes using univariate and bivariate approaches. http://arxiv.org/abs/2404.04213 arXiv:2404.04213 [stat].

Lampis, Ioannis, Vasilios and Stavrianna (2023) Predictions of European basketball match results with machine learning algorithms. Journal of Sports Analytics 9(2): 1–20. DOI: 10.3233/JSA-220639.

Ley and Dominicy (2023) Statistics Meets Sports: What We Can Learn from Sports Data. Cambridge Scholars Publishing.

Lundberg, Erion, Chen, DeGrave, Prutkin, Nair, Katz, Himmelfarb, Bansal and Lee S-I (2020) From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence 2(1): 56–67. DOI: 10.1038/s42256-019-0138-9.

Lundberg and Lee S-I (2017) A unified approach to interpreting model predictions. In: Proceedings of the 31st international conference on neural information processing systems, NIPS’17, pp.4768–4777. Red Hook, NY, USA: Curran Associates Inc. ISBN 978-1-5108-6096-4.

Madsen, Ermidis, Rago, Surrow, Vigh-Larsen, Randers, Krustrup and Larsen (2019) Activity profile, heart rate, technical involvement, and perceived intensity and fun in U13 male and female team handball players: Effect of game format. Sports 7(4): 90. DOI: 10.3390/sports7040090.

McCabe and Trevathan (2008) Artificial intelligence in sports prediction. In: Fifth international conference on information technology: New generations (ITNG 2008), pp.1194–1197. Las Vegas, NV, USA: IEEE. DOI: 10.1109/ITNG.2008.203.

Miljkovic, Gajic, Kovacevic and Konjovic (2010) The use of data mining for basketball matches outcomes prediction. In: IEEE 8th international symposium on intelligent systems and informatics, pp.309–312. Subotica, Serbia: IEEE. DOI: 10.1109/SISY.2010.5647440.

Molnar (2020) Interpretable Machine Learning. https://christophm.github.io/interpretable-ml-book/decomposition.html.

Olympics (2023) History of handball. https://olympics.com/en/sports/handball/

Pic (2018) Performance and home advantage in handball. Journal of Human Kinetics 63: 61–71. DOI: 10.2478/hukin-2018-0007.

Prokhorenkova, Gusev, Vorobev, Dorogush and Gulin (2018) CatBoost: unbiased boosting with categorical features. In: 32nd International conference on neural information processing systems, NIPS’18, pp.6639–6649. Red Hook, NY, USA: Curran Associates Inc. DOI: 10.48550/arXiv.1706.09516.

Ribeiro, Singh and Guestrin (2016) “Why Should I Trust You?”: Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’16, pp.1135–1144. New York, NY, USA: Association for Computing Machinery. DOI: 10.1145/2939672.2939778. https://dl.acm.org/doi/10.1145/2939672.2939778

Rodriguez-Ruiz, Quiroga, Miralles, Sarmiento, De Saá and García-Manso (2011) Study of the technical and tactical variables determining set win or loss in top-level European men’s volleyball. Journal of Quantitative Analysis in Sports 7(1). DOI: 10.2202/1559-0410.1281.

Rosenblatt (1958) The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65(6): 386–408. DOI: 10.1037/h0042519.

Saavedra (2018) Handball research: State of the art. Journal of Human Kinetics 63(1): 5–8. DOI: 10.2478/hukin-2018-0001.

Seil, Rupp, Tempelhof and Kohn (1998) Sports injuries in team handball. The American Journal of Sports Medicine 26(5): 681–687. DOI: 10.1177/03635465980260051401.

Sellers (2022) The Conway-Maxwell-Poisson Distribution. Institute of Mathematical Statistics monographs. Cambridge, United Kingdom; New York, NY, USA: Cambridge University Press. ISBN 978-1-108-64643-7.

Sovrano, Sapienza, Palmirani and Vitali (2022) Metrics, explainability and the European AI act proposal. J - Multidisciplinary Scientific Journal 5(1): 126–138. DOI: 10.3390/j5010010.

Wagner, Finkenzeller, Würth and von Duvillard (2014) Individual and team performance in team-handball: A review. Journal of Sports Science & Medicine 13(4): 808–816. ISSN 1303-2968.

Wheatcroft (2021) Evaluating probabilistic forecasts of football matches: The case against the ranked probability score. Journal of Quantitative Analysis in Sports 17(4): 273–287. DOI: 10.1515/jqas-2019-0089. ISSN 1559-0410. Publisher: De Gruyter.

Zebari, Zeebaree, M. Sadeeq and Zebari (2021) Predicting football outcomes by using Poisson model: Applied to Spanish Primera Division. Journal of Applied Science and Technology Trends 2(04): 105–112. DOI: 10.38094/jastt204112. ISSN 2708-0757.

Zheng and Casari (2018) Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. O’Reilly Media Inc.
