
Abstract

Previous literature leaves the impression that betting market inefficiencies are widespread. However, most studies rely upon limited data and ignore biases’ persistence. Our simulation-based analyses show (1) the impact of low sample sizes on the chance to detect markets that only appear to be efficient and (2) the frequency of observing inefficient periods within fully efficient markets. Afterwards, we (3) empirically analyze real-world football betting markets for 14 consecutive seasons. While inefficiencies occur in singular seasons, they are not persistent or systematic across leagues. Moreover, our simulation-based analyses suggest that statistically significant effects in single seasons are likely to be observed even under full market efficiency.

Keywords

betting markets, biases, market efficiency, Monte Carlo simulation

Introduction

The introduction of online betting enabled bettors to place their money with bookmakers outside their local market, as they can easily compare odds online at low search costs. In 2021, the European Gaming and Betting Association reported more than 40 billion euros in turnover from legal sports betting in Europe alone, underlining the economic relevance of sports betting markets. Because of increased competition in online markets, bookmakers had to improve their forecast precision (Anguita et al., 2017; Che et al., 2017; Forrest et al., 2005; Gomez-Gonzalez & del Corral, 2018; Štrumbelj & Šikonja, 2010), while bookmakers’ margins decreased.

Bookmakers can secure profitability by balancing their books. Such balancing requires stakes on both sides to be leveled in a way that the bookmakers’ profit is independent of the actual game outcome, which again requires very precise odds setting. For the market to be efficient, bookmakers have to be excellent predictors of game outcomes and incorporate all available information into the odds (Fama, 1970). Accordingly, no systematic strategies should allow bettors to generate profits (Thaler & Ziemba, 1988). Sauer (1998) analyzes the efficiency of sports betting markets and their relationship to other financial markets, coming to a mixed conclusion. In addition, the growth of betting markets during recent years raises the question of whether today’s markets are as efficient as expected by theory.

Research on betting market (in)efficiency typically tests whether relatively simple strategies yield profits, such as systematically betting on (for example) home teams, underdogs, or teams with a higher sentiment. Here, the favorite-longshot bias (FLB) reflects the tendency of bettors to overvalue underdogs and undervalue favorites, potentially due to risk preference (Snowberg & Wolfers, 2010). Bookmakers could take advantage and shift the actual betting odds away from the fair odds by offering lower returns on underdogs and higher returns on favorites. If such a deviation is large enough, bettors could generate positive returns on investment (ROI) by simply betting on favorites. Several studies provided evidence of the FLB in European football (see e.g., Angelini & De Angelis, 2019; Direr, 2011; Rossi, 2011; Vlastakis et al., 2009). The reverse FLB inversely suggests undervalued underdogs and positive returns when betting on them. Such a reverse FLB was found by, for example, Deschamps & Gergaud (2007). Others do not find evidence for this bias, for example, Elaad et al. (2020).

While the game’s location can be advantageous to the team playing at home, it can also decide which team is declared the favorite. The home bias refers to increased (lowered) payouts for the home (away) team compared to the fair odds. If the bias is large enough, a profitable strategy would be to bet on the home team systematically. Evidence on biased betting odds towards home teams has been provided by, for example, Angelini & De Angelis (2017), Vlastakis et al. (2009), and Forrest & Simmons (2008). The study by Meier et al. (2021) finds no such effect. When biased odds result from bettors’ sentiment, this is referred to as sentiment bias in the literature. Here, betting odds are biased towards the more popular teams, resulting in positive returns when betting on them. Studies that provide evidence for the sentiment bias include Forrest & Simmons (2008) and Braun & Kvasnicka (2013). On the contrary, Deutscher et al. (2018) and Flepp et al. (2016) do not find support for a sentiment bias. More recent evidence on the subject is missing and is provided in the following analysis.

Table 1 provides an overview of studies in the field, with a focus on the pre-COVID-19 period. It identifies the researched biases and individual results. Since we argue that the length of the observation period is crucial in determining biases, it is documented for each study. Although some studies cover extended periods, it is essential to note that many select sub-samples of games (e.g., only heavy favorites) and report biases for such restricted data sets. Recent literature captures ghost games during COVID-19 and the impact of empty stadiums on performance by players and referees. With the initially vanishing home advantage during COVID-19, the research covers betting markets’ (non)response to such changes. Papers by Fischer & Haucap (2022), Meier et al. (2021), and Winkelmann et al. (2021) record temporary inefficiencies in betting markets as bookmakers did not adjust the winning chances of home teams accordingly.

Table 1.

Overview of Studies covering Pregame Betting Market Inefficiencies.

Authors and Year	Seasons	ENG	FRA	ITA	GER	ESP	Home Bias	FLB	Sentiment bias	Profitable strategies
Pope & Peel (1989)	1981-1982	✓	✗	✗	✗	✗	✓	✓	✗	–
Cain et al. (2000)	1991-1992	✓	✗	✗	✗	✗	✗	✓	✗	Betting on heavy favorites
Kuypers (2000)	1993-1995	✓	✗	✗	✗	✗	✗	✗	✓	Betting as suggested by model
Cain et al. (2003)	1992-1993	✓	✗	✗	✗	✗	✗	✓	✗	Betting on heavy favorites
Dixon & Pope (2004)	1993-1996	✓	✗	✗	✗	✗	✗	✓	✗	–
Goddard & Asimakopoulos (2004)	1990-2000	✓	✗	✗	✗	✗	✗	✓	✗	Betting early and late season
Deschamps & Gergaud (2007)	2002-2006	✓	✗	✗	✗	✗	✓	✓	✗	Betting on heavy underdogs
Forrest & Simmons (2008)	2001-2005	✗	✗	✗	✗	✓	✓	✗	✓	Betting on popular teams
Graham & Stott (2008)	2001-2006	✓	✗	✗	✗	✗	✓	✓	✗	–
Vlastakis et al. (2009)	2002-2004	?	?	?	?	?	✓	✓	✗	Betting on heavy favorites (especially in away games)
Direr (2011)	2000-2011	✓	✓	✓	✓	✓	✗	✓	✗	Betting on heavy favorites
Franck et al. (2011)	2001-2008	✓	✗	✗	✗	✗	✗	✓	✓	Betting on popular teams
Rossi (2011)	2007-2008	✗	✗	✓	✗	✗	✗	✓	✗	Betting on heavy favorites
Constantinou & Fenton (2013)	2005-2012	✓	✗	✓	✓	✓	✓	✓	✗	Betting on home games when home team is underdog
Flepp et al. (2016)	2011	✓	?	?	?	?	✗	✗	✓	–
Feddersen et al. (2017)	2011-2013	✓	✓	✓	✓	✓	✗	✗	✓	–
Elaad et al. (2020)	2010-2018	✓	✗	✗	✗	✗	✓	✓	✗	–
Angelini & De Angelis (2019)	2006-2017	✓	✓	✓	✓	✓	✗	✓	✗	–
Franke (2020)	2006-2014	✓	✓	✓	✓	✓	✓	✓	✓	Betting on heavy favorites (but only on betting exchanges)
This paper	2005-2019	✓	✓	✓	✓	✓	✓	✓	✓

Note: FLB = favorite-longshot bias; ENG = England; FRA = France; ITA = Italy; GER = Germany; ESP = Spain.

Previous studies on betting market inefficiencies in pre-game betting relied on one or multiple season(s) of data, and many uncovered inefficient odds for different leagues and periods. Since most studies cover only snapshots of relatively short periods, thus relying on a fairly small number of observations, the power of hypothesis tests is potentially limited. Therefore, findings uncovering short periods of inefficiencies may simply be driven by statistical noise. Hence, it remains to be investigated whether market inefficiencies are systematic and persistent over longer time periods or whether their appearance is of a temporary and random nature. One could further make a case that inefficiencies get reported and published more often than analyses that find markets to be efficient (as expected by theory). Such a mechanism, that is, a higher barrier to publication for studies that produce null results, is observed in different fields and labeled as publication bias (Franco et al., 2014). In our case, this would translate into the less interesting case of efficient betting markets being under-reported by the literature.

This article addresses whether inefficient periods in betting markets, as uncovered by previous literature, occur persistently over time and systematically across leagues. Therefore, we first introduce our data, covering 14 seasons from 2005-2006 to 2018-2019 for the five major European football leagues, namely the English Premier League, the French Ligue 1, the German Bundesliga, the Italian Serie A, and the Spanish La Liga, as well as our methodological approach for the analysis of betting market inefficiencies in the “Method and Data” section. The “Simulation-Based Analysis of Betting Markets” section covers a simulation-based analysis of betting markets, deriving the probability of detecting periods of inefficiencies in several theoretical settings. Afterwards, in the “Real-World Betting Markets” section, we empirically analyze the data set detailed in the “Method and Data” section towards the various biases introduced before to indicate if biases persist long-term or appear and vanish randomly. This allows us to contribute to the literature by explicitly discussing the occurrence of betting market inefficiencies in the light of simulation-based analyses, that is, we investigate whether the number of inefficient periods uncovered by the literature exceeds what would be expected by chance only.

Method and Data

To provide a comprehensive long-term analysis, we draw on data from www.football-data.co.uk, which cover all matches of the men’s first football divisions in England (Premier League), France (Ligue 1), Germany (Bundesliga), Italy (Serie A), and Spain (La Liga) from season 2005-2006 to 2018-2019, totaling 25,564 matches. Although information for more recent seasons is available, studies have shown that betting markets had to deal with the impact of COVID-19 on, for example, home advantage (Reade et al., 2022) from the season 2019-2020 onward.

Our data detail the actual result and the pregame betting odds for all potential outcomes (home win, draw, and away win) of each match. As betting odds from various bookmakers are reported in our data, we consider the average betting odds across all bookmakers that provide information. Such (average) betting odds are calculated using, on average, 42 individual bookmaker odds; this value varies only slightly across leagues and seasons. The pairwise correlation in our sample (across all leagues) between betting odds offered by different bookmakers on the same game outcome is fairly high, with at least 0.96 for home wins and 0.95 for away wins.

Descriptive Statistics

For each match, we restrict our analysis to bets on the home and the away team, as odds for draws do not vary much in football (Pope & Peel, 1989). As we analyze matches from the perspective of both teams, each match generates two rows in our data. This accumulates to 51,128 observations across all leagues and seasons considered. Based on bookmakers’ odds, Implied Probabilities $π_{i, j}$ give the probability of outcome $j$ in match $i$ and are calculated as

π_{i, j} = \frac{1 / O_{i, j}}{1 / O_{i, h} + 1 / O_{i, d} + 1 / O_{i, a}}, j = h, d, a

(1)

with odds $O_{i, j}$, where $j = h$ denotes a home win, $j = a$ an away win, and $j = d$ a draw, which is in line with literature on betting markets (see e.g., Deutscher et al., 2018; Feddersen et al., 2017; Forrest & Simmons, 2008). In our analysis of real-world betting markets, we follow the approach mentioned here, implying that implied probabilities are corrected by dividing through the same factor (1+margin) for all outcomes. However, previous literature suggests that margins are higher if the winning probability is low (see e.g., Hegarty & Whelan, 2023; Lindstrøm, 2023). While, in general, there is no knowledge about the exact distribution of the bookmakers’ margin, we run robustness checks in our analyses to account for this issue.
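As a minimal sketch of Equation 1, the following Python function normalizes the inverse odds of a single match so that the three implied probabilities sum to 1. The function name and the example odds are illustrative assumptions and are not taken from the data set.

```python
def implied_probabilities(odds_home, odds_draw, odds_away):
    """Implied probabilities (home, draw, away) from decimal odds, as in Equation 1.

    Every inverse odd is divided by the same factor (1 + margin),
    i.e., by the sum of the three inverse odds.
    """
    inverse = [1.0 / odds_home, 1.0 / odds_draw, 1.0 / odds_away]
    booksum = sum(inverse)  # equals 1 + margin
    return tuple(x / booksum for x in inverse)


# Hypothetical odds for one match (not from the paper's data).
pi_h, pi_d, pi_a = implied_probabilities(1.8, 3.6, 4.5)
```

Because all three outcomes are scaled by the same factor, this sketch does not capture margins that vary with the winning probability, which is the concern addressed by the robustness checks mentioned above.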

Figure 1 (left panel) shows the boxplots of the Implied Probabilities for home and away wins, which indicate higher implied probabilities for home than for away teams. This is in line with the home field advantage, as we find that home teams win nearly half of the matches (46.2%), whereas away teams win only slightly more than one in every four matches (28.0%, see Table 2). The distribution of implied probabilities fluctuates only slightly across leagues (see Table 3). As bookmakers have to adequately incorporate the home advantage into their odds, misjudgments could open the opportunity for bettors to generate positive returns when betting on (or against) home teams. To account for a possible home bias by bookmakers, we introduce the covariate Home, taking value one for bets on the home team.

Figure 1.

The boxplots on the implied probabilities (left panel) and boxplots on the differences in the attendance across leagues (right panel).

Table 2.

Summary Statistics on Home and Away Wins (2005-2006–2018-2019) for All Leagues.

	Premier League	Ligue 1	Bundesliga	Serie A	La Liga	Total
Observations	10,640	10,640	8,568	10,640	10,640	51,128
Home wins (%)	4,962 (46.6)	4,800 (45.1)	3,884 (45.3)	4,906 (46.1)	5,058 (47.5)	23,610 (46.2)
Away wins (%)	3,054 (28.7)	2,820 (26.5)	2,524 (29.5)	2,912 (27.4)	3,024 (28.4)	14,334 (28.0)

Table 3.

Summary Statistics on the Minimum, Maximum, Median, and Mean Value as well as Standard Deviation for the Implied Probability and DiffAttend for Each League.

Variable	League	Minimum	Maximum	Median	Mean	Standard deviation
Implied Probability	Premier League	0.024	0.909	0.354	0.373	0.191
	Ligue 1	0.026	0.909	0.346	0.360	0.156
	Bundesliga	0.021	0.924	0.358	0.372	0.172
	Serie A	0.025	0.909	0.347	0.365	0.181
	La Liga	0.022	0.924	0.354	0.373	0.190
DiffAttend in tsd	Premier League	0.003	64.336	13.541	16.551	13.220
	Ligue 1	0.004	45.806	9.388	12.414	10.335
	Bundesliga	0.013	66.390	17.742	19.390	13.984
	Serie A	0.002	51.917	10.902	14.441	11.646
	La Liga	0.010	73.354	14.063	19.494	17.696

As some studies mentioned above provide evidence for a sentiment bias, we consider the difference in mean attendance between the two opponents in the corresponding season as a proxy for the sentiment. While previous literature often refers to the attendance within the previous seasons, we consider data of the current season. Even if this value becomes known only at the end of the season, we assume that the difference in the mean attendance of the corresponding season is a reasonable proxy. Using previous season data would pose the question of how to deal with promoted teams. Promoted teams actually enjoy a strong increase in attendance, while attendance across years in the same division is quite stable. We find a correlation in the mean attendance of nonpromoted teams between two consecutive seasons of 0.979 over all leagues and seasons. While nowadays more adequate proxies of a team’s sentiment exist, such as the number of followers on Twitter, we consider the difference in mean attendance as earlier data on Twitter followers is unavailable. In fact, in our first year of observation in 2005, most social networks did not even exist.¹

Given two observations per match (one for the home and one for the away team), the distribution of DiffAttend is symmetric around zero. Figure 1 (right panel) shows the boxplots for the sentiment covariate from the perspective of the team with higher attendance for all leagues. The leagues considered can be broadly categorized into two groups. Whereas for La Liga, the Premier League, and the Bundesliga, the median absolute difference in attendance is around 15,000 and the maximum difference is above 60,000, for Ligue 1 and Serie A the median absolute difference is around 10,000 and the maximum is around 50,000. In addition, the distribution of the absolute value of DiffAttend is right skewed for all leagues (see Table 3).

There is a strong correlation between the Implied Probability and the Home variable of 0.452 as well as between the Implied Probability and DiffAttend of 0.639. Such correlations indicate that home teams and teams with large fan bases are often declared to be the favorite by the bookmaker. Hence, there potentially is an overlap between the home bias, the sentiment bias and the FLB.

Market Development During the Observation Period

As argued in the “Introduction” section (and as shown by Anguita et al., 2017; Che et al., 2017; Gomez-Gonzalez & del Corral, 2018), bookmakers’ margins decreased over time. Figure 2 (left panel) shows the average margins calculated as $\frac{1}{M} \sum_{m = 1}^{M} \left(\sum_{i \in \{h, d, a\}} O_{m, i}^{-1} - 1\right)$ for matches $m = 1, \dots, M$ from seasons 2005-2006 to 2018-2019. In all leagues covered, average margins halved from more than 10% at the start of our observation period to about 5% in recent years, while systematic differences between leagues can be observed.
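The average margin formula above can be sketched in a few lines of Python. The function name and the input layout (one tuple of home, draw, and away decimal odds per match) are assumptions made for illustration.

```python
def average_margin(odds):
    """Mean bookmaker margin over a list of matches.

    odds: list of (home, draw, away) decimal odds, one tuple per match.
    The margin of a match is the sum of its inverse odds minus 1.
    """
    margins = [sum(1.0 / o for o in match) - 1.0 for match in odds]
    return sum(margins) / len(margins)


# Two hypothetical matches: one with fair odds, one with a positive margin.
example = average_margin([(2.0, 4.0, 4.0), (1.8, 3.6, 4.5)])
```

In this toy example, the first match has a margin of zero and the second a margin of 1/18, so the average is 1/36 (about 2.8%).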

Increased competition between bookmakers should have improved their predicting performance over time. We investigate this assumption by considering the Brier score (Brier, 1950), which is given as

\frac{1}{n} \sum_{i = 1}^{n} (π_{i} - y_{i})^{2},

(2)

where $π_{i}$ denotes the implied probability of bet $i$ according to the bookmakers’ odds and $y_{i}$ indicates whether the bet was won ($y_{i} = 1$) or lost ($y_{i} = 0$). Perfect predictions would lead to a Brier score of 0, while the Brier score increases in the inaccuracy of predicted game outcomes. To evaluate the predictive power over time, Figure 2 (right panel) displays the Brier scores for all leagues contained in our data.
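For illustration, Equation 2 translates directly into code. The sketch below assumes two equally long lists, one with implied probabilities and one with binary bet outcomes; the function name is hypothetical.

```python
def brier_score(implied_probs, outcomes):
    """Brier score as in Equation 2: mean squared difference between
    implied probabilities and binary outcomes (1 = bet won, 0 = bet lost)."""
    if len(implied_probs) != len(outcomes):
        raise ValueError("inputs must have the same length")
    n = len(outcomes)
    return sum((p - y) ** 2 for p, y in zip(implied_probs, outcomes)) / n


perfect = brier_score([1.0, 0.0], [1, 0])      # perfect predictions
uninformed = brier_score([0.5, 0.5], [1, 0])   # coin-flip predictions
```

Perfect predictions yield a score of 0, while uninformed predictions of 0.5 for every bet yield 0.25.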

Figure 2.

Bookmakers’ margins and Brier scores during the seasons observed (2005/2006-2018/2019). Symbols indicate different leagues, and the gray dashed lines show the average over all leagues.

Indicated by the gray dashed line, Brier scores across all leagues improved only slightly over time. Comparing both panels in Figure 2, relatively high (low) Brier scores co-occur with high (low) margins, for example, for Ligue 1 in 2010-2011. Jumps in the Brier score are observable in all leagues considered, indicating that the predictive power of bookmakers’ odds varies considerably between seasons. This, in turn, opens opportunities for profitable strategies at times when the predictive power of betting odds is rather low. It becomes even more relevant for recent seasons, as the margins decrease faster over time than the Brier scores (see both panels of Figure 2).²

Modeling Betting Market Inefficiencies

To detect betting market inefficiencies, we use a logit regression model where the response variable $\mathit{Won}_{i} \in \{0, 1\}$ indicates whether bet $i$ was won. This enables the analysis of the explanatory power of covariates on the winning probability of a bet beyond the bookmakers’ odds, thus investigating the efficient market hypothesis. Our analysis follows the typical approach of many previous studies on betting market inefficiencies (see e.g., Forrest & Simmons, 2008; Franck et al., 2013).

The Implied Probability allows deriving insights into the FLB. Specifically, it enables a comparison between the implied probability given by the bookmaker and the expected winning probability under our fitted model to reveal a potential FLB. To distinguish between potential biases described in the “Introduction” section, we further include a dummy variable indicating bets on home teams (Home) to account for a potential home bias. Bettors’ sentiment is proxied by the covariate DiffAttend.

The linear predictor including the covariates introduced above is thus given by

ν_{i} = β_{0} + β_{1} \cdot π_{i} + β_{home} \cdot \mathit{Home}_{i} + β_{sent} \cdot \mathit{DiffAttend}_{i} .

(3)

The logit function links the binary response variable $\mathit{Won}_{i}$ to the linear predictor $ν_{i}$, that is, $\operatorname{logit}(\Pr(\mathit{Won}_{i} = 1)) = ν_{i}$. The models are fitted by maximum likelihood using the function glm() in R (R Core Team, 2019). As every match appears in our data twice, we cluster standard errors at match level to ensure correct p-values (Zeileis, 2004; Robitzsch & Grund, 2022).
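To make the estimation step concrete, the following is a minimal pure-Python sketch of maximum likelihood estimation for a logit model via Newton-Raphson. It is not the glm() routine used in the analysis, and it deliberately omits the clustered standard errors; the function names and the toy data are assumptions for illustration only.

```python
import math


def inv_logit(v):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-v))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_logit(rows, won, iterations=25):
    """Estimate logit coefficients (intercept first) by Newton-Raphson.

    rows: list of covariate lists per bet (without intercept).
    won:  list of 0/1 indicators of whether each bet was won.
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y in zip(X, won):
            p = inv_logit(sum(b * v for b, v in zip(beta, x)))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += x[a] * (y - p)          # score contribution
                for c in range(k):
                    hess[a][c] += w * x[a] * x[c]  # observed information
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Toy data: a single Home dummy; 30% of away bets and 60% of home bets are won.
rows = [[0]] * 100 + [[1]] * 100
won = [1] * 30 + [0] * 70 + [1] * 60 + [0] * 40
beta_0_hat, beta_home_hat = fit_logit(rows, won)
```

With a single dummy covariate, the fitted probabilities reproduce the observed win rates of both groups, which gives a simple check of the estimator. In an applied setting, the covariates would be the Implied Probability, Home, and DiffAttend, and inference would additionally require cluster-robust standard errors at match level.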

Simulation-Based Analysis of Betting Markets

This section deals with analyses on detecting betting market inefficiencies using Monte Carlo simulation. It investigates how well the logit regression model specified above can detect betting market inefficiencies as we generate artificial data which exhibit biases and fit our model to such simulated data sets. We aim to investigate Type I and Type II errors when analyzing betting market inefficiencies. In particular, we analyze the frequency of mistakenly detecting biases in fully efficient markets and the power of tests as betting market analyses in the past often neglected problems of small samples or reporting of significant biases for rather efficient markets.

First, we consider a data-generating process with a home bias, that is, we modify the underlying probabilities for each outcome (home win, draw, and away win). We randomly generate match outcomes according to these underlying true probabilities for the simulation runs. Next, we fit logit regression models (as introduced above) to each of the artificial samples, which allows us to estimate the probability of reporting a statistically significant home bias depending on the magnitude of the underlying induced bias.

We consider different magnitudes of that bias while also varying the number of seasons (observations). This provides insights into how the sample size affects our results. We additionally discuss the risk of mistakenly reporting inefficiencies due to statistical noise by analyzing the case of fully efficient betting markets, that is, situations where the true probabilities are equal to those implied by the bookmakers. Furthermore, we run a season-by-season analysis of fully efficient markets to estimate the probability of detecting biases due to multiple testing.

Finally, we extend our analysis to the other two most prominent biases investigated in the literature and introduced above, namely the sentiment bias (“Sentiment Bias” section) and the FLB (“Favorite-Longshot Bias” section). Our results from the simulation-based analyses form the basis for discussing the findings on real-world betting markets in the following section in the perspective of what would be expected by chance only.

Simulation Set Up

To ensure a reasonable distribution of winning probabilities in the betting market, our analysis is based on the actual Premier League betting data of all 5,320 matches between seasons 2005-2006 and 2018-2019. We calculate underlying true probabilities $p_{i, j}$ for each outcome $j = h, d, a$ and match $i$ from the corresponding implied probability $π_{i, j}$ and the respective bias considered. A positive (negative) bias corresponds to increased (reduced) underlying true winning probabilities compared to those implied by the bookmakers’ odds, equivalent to chances to gain (lose) money when betting on the outcome. We choose the number of most recent seasons $s$ (counting back from 2018-2019) from the set $s \in \{1, 2, 5, 10, 14\}$, with each season covering 380 games. This allows us to explicitly study how the sample size and type of bias affect the chances of both Type-1 and Type-2 testing errors. For each type and magnitude of bias, we use Monte Carlo simulation and repeat the simulation experiment $n =$ 10,000 times, thus ensuring reliable results. In each simulation run, we randomly draw the outcome of each match (home win, draw, or away win) according to the underlying true probabilities; the simulated outcome may therefore deviate from the actual game outcome. Again, we include each match twice in the data (with identical outcomes) as we bet on both home and away wins. We then fit a logit regression model with clustered standard errors to the simulated data and test for the occurrence of a bias, that is, $β_{j} = 0$ versus $β_{j} \neq 0$. For numerical reasons and to simplify a comparison between the magnitude of effects, we standardize the variables Home and DiffAttend in our logit regression model by subtracting the mean and dividing by the standard deviation in the corresponding data set.

Home Bias

For the first set of experiments, we consider the home bias. As it is assumed that a positive home bias increases the probabilities for home wins and decreases the probabilities for away wins, we set adjusted probabilities ${\tilde{p}}_{i, j}$ for home wins $h$ , away wins $a$ , and draws $d$ for match $i$ as follows:

\begin{matrix} {\tilde{p}}_{i, h} & = {logit}^{- 1} (β_{0} + β_{1} \cdot π_{i, h} + β_{h o m e}), \\ {\tilde{p}}_{i, a} & = {logit}^{- 1} (β_{0} + β_{1} \cdot π_{i, a} - β_{h o m e}), \\ {\tilde{p}}_{i, d} & = {logit}^{- 1} (β_{0} + β_{1} \cdot π_{i, d}) . \end{matrix}

(4)

According to previous literature (Forrest & Simmons, 2008; Winkelmann et al., 2021; and as also verified later in the “Real-World Betting Markets” section), we select $β_{0} = -2.5$ and $β_{1} = 5$. Due to the high competition between bookmakers, we assume potential biases to have a limited magnitude. Thus, we choose the parameter defining the magnitude of the home bias according to the set $β_{home} \in \{-0.2, -0.1, -0.05, 0, 0.05, 0.1, 0.2\}$. This implies a maximum absolute difference between implied probabilities and adjusted probabilities of 5 percentage points. To ensure underlying true probabilities to take values that sum up to 1 across all outcomes, we then calculate those probabilities as follows:

p_{i, j} = \frac{{\tilde{p}}_{i, j}}{\sum_{k = h, a, d} {\tilde{p}}_{i, k}} .

(5)
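The data-generating step in Equations 4 and 5 can be sketched as follows. The function names, the example implied probabilities, and the use of Python's random module are assumptions for illustration; the default parameter values $β_{0} = -2.5$ and $β_{1} = 5$ are those stated in the text.

```python
import math
import random


def inv_logit(v):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-v))


def true_probabilities(pi_home, pi_draw, pi_away, beta_home,
                       beta_0=-2.5, beta_1=5.0):
    """Underlying true probabilities (home, draw, away) under a home bias.

    Equation 4 shifts the linear predictor by +beta_home for home wins and
    -beta_home for away wins; Equation 5 rescales the result to sum to 1.
    """
    adj_home = inv_logit(beta_0 + beta_1 * pi_home + beta_home)
    adj_away = inv_logit(beta_0 + beta_1 * pi_away - beta_home)
    adj_draw = inv_logit(beta_0 + beta_1 * pi_draw)
    total = adj_home + adj_draw + adj_away
    return adj_home / total, adj_draw / total, adj_away / total


def draw_outcome(probs, rng):
    """Randomly draw one match outcome ('h', 'd', or 'a')."""
    return rng.choices(["h", "d", "a"], weights=probs, k=1)[0]


# Hypothetical match with implied probabilities 0.50 / 0.27 / 0.23.
unbiased = true_probabilities(0.50, 0.27, 0.23, beta_home=0.0)
biased = true_probabilities(0.50, 0.27, 0.23, beta_home=0.2)
outcome = draw_outcome(biased, random.Random(1))
```

Repeating draw_outcome over all matches of the chosen seasons yields one simulated data set; the simulation described in the text repeats this 10,000 times per setting.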

We finally fit the following logit regression model to the simulated data:

ν_{i} = β_{0} + β_{1} \cdot π_{i} + β_{h o m e} \cdot {H o m e}_{i} .

Repeating this simulation 10,000 times, Table 4 displays the results on the estimation of the parameter $β_{home}$ depending on the magnitude of the underlying bias and the number of seasons $s$. While the estimated values are close to the induced bias used in the data-generating process, the intervals between the 2.5% and 97.5% quantiles of the estimates obtained become narrower with an increasing number of seasons included in the simulated data set. As indicated by the leftmost column, even with a (true) bias of $| β_{home} | = 0.2$, the value ${\hat{β}}_{home} = 0$ is included in this interval when considering one season only. Therefore, a researcher has a surprisingly high chance of committing a Type-2 error (false negative) for such betting market data. In contrast, for larger data sets of at least 10 seasons, this holds only for $| β_{home} | \leq 0.05$. These findings are confirmed by the histograms in Figure 3, showing the estimated values ${\hat{β}}_{home}$ depending on the magnitude of the induced home bias $β_{home}$ for $s = 14$ seasons and $n =$ 10,000 simulation runs, corresponding to the rightmost column in Table 4.

Figure 3.

Histogram of the estimates for $β_{h o m e}$ , that is, the induced home bias, depending on the underlying indicated value of $β_{h o m e}$ for $s = 14$ seasons and $n =$ 10,000 simulation runs. The situation of no bias (i.e., $β_{h o m e} = 0$ ) is highlighted by the vertical line.

Table 4.

Average Estimation Values for the Parameter $β_{h o m e}$ in the Model with Home Bias Depending on the True Value of $β_{h o m e}$ and the Number of Seasons for $n =$ 10,000 Simulation Runs.

$β_{h o m e}$	One season	Two seasons	Five seasons	10 seasons	14 seasons
−0.20	−0.186 [−0.400; 0.026]	−0.183 [−0.336; −0.036]	−0.186 [−0.282; −0.091]	−0.187 [−0.255; −0.122]	−0.186 [−0.245; −0.129]
−0.10	−0.091 [−0.306; 0.120]	−0.089 [−0.241; 0.061]	−0.090 [−0.185; 0.004]	−0.090 [−0.159; −0.023]	−0.090 [−0.146; −0.033]
−0.05	−0.043 [−0.257; 0.169]	−0.041 [−0.189; 0.107]	−0.043 [−0.136; 0.050]	−0.042 [−0.109; 0.026]	−0.041 [−0.099; 0.016]
0.00	0.005 [−0.205; 0.216]	0.008 [−0.141; 0.155]	0.005 [−0.088; 0.099]	0.006 [−0.063; 0.073]	0.007 [−0.050; 0.062]
0.05	0.052 [−0.163; 0.261]	0.053 [−0.097; 0.201]	0.054 [−0.038; 0.147]	0.053 [−0.015; 0.121]	0.055 [−0.000; 0.113]
0.10	0.100 [−0.113; 0.311]	0.101 [−0.045; 0.248]	0.101 [0.005; 0.194]	0.102 [0.035; 0.170]	0.104 [0.047; 0.160]
0.20	0.195 [−0.018; 0.404]	0.197 [0.046; 0.346]	0.198 [0.104; 0.290]	0.198 [0.132; 0.266]	0.200 [0.142; 0.256]

Note: The interval given in brackets indicates the 2.5% and 97.5% quantiles of the 10,000 estimates obtained.

Table 5 displays the proportions out of $n =$ 10,000 simulation runs where we find a significant (positive or negative) home effect depending on the number of seasons considered and the magnitude of the underlying home effect $β_{home}$ in the data-generating process for the 95% confidence level.³ First, our results show that in the case of no bias, that is, $β_{home} = 0$, the proportion of significant home effects is close to the significance level of 1%, 5%, and 10%, respectively. While this value remains below 0.5 for the 95% confidence level and $| β_{home} | = 0.05$ even for 14 seasons, for $| β_{home} | = 0.20$ and at least five seasons considered, it is quite likely to detect such an effect (probability of about 90% even for the 90% confidence level). Additionally, comparing the results for the same absolute level of the underlying induced bias, it should be noted that it is more likely to detect a positive home bias than a negative one. This is caused by the curvature of the logit link function, which is convex in the interval $[0, 0.5]$ covering most of the implied probabilities. Figure 4 displays the percentage of simulation runs with a significant home effect for different values of $β_{home} \geq 0$ and the number of seasons at the 95% confidence level. This allows for a generalization to an even larger number of seasons. We find that the percentage of significant effects converges to 1 if we increase the number of seasons or the magnitude of the induced bias. Thus, it is more likely to detect even a small bias if the data set covers more observations. As a robustness check, we additionally run a linear regression model instead of a logit model. We find that our results are approximately the same as those obtained under the logit model in Table 4, while the percentages of simulation runs with significant biases are similar for the same absolute value of positive and negative bias. However, as resulting probabilities need to be truncated at 0 and 1, respectively, we focus on the analysis under the logit model as suggested by previous literature.

Figure 4.

Percentage of simulation runs with significant home effects depending on the number of seasons and the intended underlying bias $β_{h o m e}$ .

Table 5.

Percentage of Simulation Runs with Significant Home Effect on the 95% Confidence Level Depending on the True Value of $β_{h o m e}$ and the Number of Seasons for $n =$ 10,000 Simulation Runs.

$β_{h o m e}$	One season	Two seasons	Five seasons	10 seasons	14 seasons
$- 0.20$	$0.400$	$0.666$	$0.971$	$1.000$	$1.000$
$- 0.10$	$0.127$	$0.214$	$0.464$	$0.748$	$0.874$
$- 0.05$	$0.067$	$0.081$	$0.142$	$0.231$	$0.291$
$0.00$	$0.047$	$0.049$	$0.051$	$0.056$	$0.055$
$0.05$	$0.079$	$0.106$	$0.202$	$0.342$	$0.484$
$0.10$	$0.157$	$0.264$	$0.563$	$0.843$	$0.951$
$0.20$	$0.449$	$0.745$	$0.985$	$1.000$	$1.000$

We now turn our focus to the case that bookmakers perfectly predict the probabilities of match outcomes (i.e., the true probabilities are entirely determined by the bookmakers’ odds), leading to a fully efficient market. This allows us to analyze how often inefficient periods occur by chance, potentially driven by statistical noise, for single seasons as well as for the whole observation period. Table 6 displays the results of a corresponding season-by-season analysis. Specifically, the $t$-th column shows the proportion of simulation runs where we find at least $t$ seasons with a statistically significant effect, depending on the significance level $α$. When assuming fully efficient markets, we determine a probability of 11.43% of finding a significant home bias at the 10% significance level over the full observation period, while this probability rises to 77.63% when considering the chance of finding at least one single season with a significant home bias within the full observation period.⁴ The analysis for $α = 0.05$ corresponds to the situation of $β_{home} = 0$ and 14 seasons in Table 5. For an extended observation period, it could be expected that the percentages given here are even larger. Our results clearly indicate that significant effects do not necessarily result from biases but can also be observed under full market efficiency due to statistical noise, especially when performing season-by-season analyses. This highlights the importance of drawing on a large enough sample size to identify biases in betting markets. In particular, previous literature had a relatively large probability of finding at least a small number of significant effects when performing season-by-season analyses over a long time horizon, while large data sets make it possible to uncover even biases of a small magnitude.

Table 6.

Percentage of Simulation Runs With Significant Home Effects for Various Numbers of Seasons and Significance Levels, Given $β_{home} = 0$ and $n =$ 10,000 Simulation Runs.

Seasons	1	2	3	4	5	6	7	Full period
$α = 0.10$	77.63%	42.28%	16.06%	4.46%	0.93%	0.24%	0.05%	11.43%
$α = 0.05$	52.18%	15.87%	2.93%	0.40%	0.01%	0	0	5.90%
$α = 0.01$	13.94%	0.94%	0.01%	0	0	0	0	1.18%
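As a rough plausibility check on Table 6, one can treat the 14 season-level tests as independent, each rejecting with probability $α$ under full efficiency. This is a simplifying assumption and not the simulation design described above; the function name `prob_at_least` is purely illustrative.

```python
from math import comb


def prob_at_least(t: int, alpha: float, seasons: int = 14) -> float:
    """Probability that at least t of `seasons` independent tests reject,
    when each test rejects with probability alpha (binomial tail)."""
    return sum(
        comb(seasons, k) * alpha**k * (1 - alpha) ** (seasons - k)
        for k in range(t, seasons + 1)
    )


if __name__ == "__main__":
    # One row per significance level, columns t = 1, ..., 7 as in Table 6.
    for alpha in (0.10, 0.05, 0.01):
        row = [round(100 * prob_at_least(t, alpha), 2) for t in range(1, 8)]
        print(f"alpha={alpha:.2f}", row)
```

Under this approximation, the chance of at least one significant season is $1 - (1 - α)^{14}$, that is, roughly 77%, 51%, and 13% for $α = 0.10$, $0.05$, and $0.01$, which is close to the simulated values of 77.63%, 52.18%, and 13.94%.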

We further consider the ROIs obtained in our simulation. While we occasionally observe fairly high returns for single seasons when consistently betting on the home team, Figure 5 shows the boxplots of the average returns when always betting on the home team in all 14 seasons. The average return is close to the negative of the average margin in the Premier League, which is about 6.2% over all 14 seasons analyzed. For single seasons, the variation is much larger, with ROIs for individual simulation runs ranging between $-0.3$ and 0.3. However, average ROIs are again close to the negative average margin in the betting market.

Figure 5.

Boxplot of the average returns on investment (ROIs) over all matches when consistently betting on the home team in the simulation runs under full market efficiency.
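The closeness of the simulated average return to the negative margin can be checked with a short derivation. It is a sketch under two assumptions: the implied probability $π_{j}$ is obtained by dividing the inverse odds $1 / O_{j}$ by the factor $(1 + a)$, where $a$ denotes the margin, and true and implied probabilities coincide under full efficiency. For a 1-euro bet on outcome $j$, the expected return is then

E[\mathit{ROI}] = π_{j} \cdot O_{j} - 1 = \frac{1}{O_{j} (1 + a)} \cdot O_{j} - 1 = \frac{1}{1 + a} - 1 = - \frac{a}{1 + a} .

With $a = 0.062$, this gives about $-0.058$, which is slightly smaller in magnitude than the margin itself.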

Sentiment Bias

For the sentiment bias, we again consider both positive and negative biases, as well as a fully efficient market with unbiased betting odds. As introduced in the “Method and Data” section, the continuous-valued difference in average attendance between the two teams serves as the proxy for the difference in sentiment between the opponents. As in the previous section, we vary the number of seasons as well as the magnitude of the induced bias in the set $β_{sent} \in \{-0.2, -0.1, -0.05, 0, 0.05, 0.1, 0.2\}$, and we multiply this parameter by the standardized value of $\mathit{DiffAttend}_{i}$. Analogous to the home bias, we use Equation 6 to calculate the adjusted probabilities.⁵ We then again refer to Equation 5 to derive the underlying true probabilities $p_{i, j}$ when randomly drawing the outcome of a match.

\begin{aligned}
{\tilde{p}}_{i, h} &= \operatorname{logit}^{-1} (β_{0} + β_{1} \cdot π_{i, h} + β_{sent} \cdot \mathit{DiffAttend}_{i}), \\
{\tilde{p}}_{i, a} &= \operatorname{logit}^{-1} (β_{0} + β_{1} \cdot π_{i, a} - β_{sent} \cdot \mathit{DiffAttend}_{i}), \\
{\tilde{p}}_{i, d} &= \operatorname{logit}^{-1} (β_{0} + β_{1} \cdot π_{i, d}) .
\end{aligned}

(6)
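The data-generating step in Equation 6 can be sketched as follows. Several parts are assumptions made for illustration only: the default values of `beta0` and `beta1` are placeholders borrowed from the all-seasons column of Table 9 (the values used in the simulation are not restated in this section), the normalization to a sum of one stands in for Equation 5, and all function names are hypothetical.

```python
import math
import random


def inv_logit(x: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def adjusted_probs(pi_h, pi_d, pi_a, diff_attend, beta_sent,
                   beta0=-2.529, beta1=5.004):
    """Adjusted outcome probabilities following Equation 6.

    beta0 and beta1 are placeholder values (assumption). The final
    normalization is an assumed stand-in for Equation 5.
    """
    raw = {
        "h": inv_logit(beta0 + beta1 * pi_h + beta_sent * diff_attend),
        "a": inv_logit(beta0 + beta1 * pi_a - beta_sent * diff_attend),
        "d": inv_logit(beta0 + beta1 * pi_d),
    }
    total = sum(raw.values())
    return {outcome: p / total for outcome, p in raw.items()}


def draw_outcome(probs, rng):
    """Randomly draw one match outcome from the given probabilities."""
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(1)
    # Standardized DiffAttend of 1.0: the home team has the larger following.
    probs = adjusted_probs(0.50, 0.27, 0.23, diff_attend=1.0, beta_sent=0.2)
    print(probs, draw_outcome(probs, rng))
```

A positive `beta_sent` combined with a positive attendance difference shifts probability mass from the away win to the home win, while the draw term is left without a sentiment component, as in Equation 6.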

Here, we fit a logit regression model according to Equation 7 and again consider $n =$ 10,000 estimations for $β_{s e n t}$ to analyze the effect of different magnitudes of an underlying sentiment bias on the chance to reveal such a bias from the data set.

ν_{i} = β_{0} + β_{1} \cdot π_{i} + β_{sent} \cdot \mathit{DiffAttend}_{i} .

(7)

Table 7 gives the proportion of simulation runs in which we detect a statistically significant (positive or negative) sentiment bias for $α = 0.05$, depending on the induced underlying bias $β_{sent}$ in the data-generating process. Our results suggest a lower probability of revealing a sentiment bias with this continuous-valued variable than of revealing the home effect for the same absolute magnitude of the underlying bias (see Table 5). However, it is again more likely to detect a bias when considering a larger data set, that is, more seasons. While the probability of detecting a bias is again close to 5% for efficient markets with $β_{sent} = 0$, we find a significant sentiment bias in more than 50% of the $n =$ 10,000 simulation runs when using $|β_{sent}| = 0.2$ in the data-generating process, even when including only two seasons. Figure 6 confirms these results by showing histograms of the estimates ${\hat{β}}_{sent}$ depending on the magnitude of the sentiment bias when analyzing 14 seasons (corresponding to the rightmost column in Table 7), while the vertical lines give the situation of $β_{sent} = 0$, that is, the absence of a sentiment bias.

Figure 6.

Histograms of the estimates ${\hat{β}}_{sent}$ of the induced sentiment bias, depending on the true value of $β_{sent}$, for $s = 14$ seasons and $n =$ 10,000 simulation runs. The magnitude of the underlying sentiment bias is highlighted by the vertical line.

Table 7.

Percentage of Simulation Runs With Significant Sentiment Bias at the 95% Confidence Level for Various Values of $β_{sent}$ and Numbers of Seasons for $n =$ 10,000 Simulation Runs.

$β_{s e n t}$	One season	Two seasons	Five seasons	10 seasons	14 seasons
$- 0.20$	$0.315$	$0.502$	$0.915$	$0.996$	$0.999$
$- 0.10$	$0.105$	$0.144$	$0.359$	$0.578$	$0.735$
$- 0.05$	$0.060$	$0.069$	$0.115$	$0.163$	$0.235$
$0.00$	$0.052$	$0.052$	$0.048$	$0.055$	$0.055$
$0.05$	$0.073$	$0.096$	$0.160$	$0.248$	$0.314$
$0.10$	$0.136$	$0.204$	$0.432$	$0.661$	$0.797$
$0.20$	$0.343$	$0.546$	$0.917$	$0.996$	$1.000$

Favorite-Longshot Bias

For analyzing the FLB, we follow the approach by Feddersen (2017). Accordingly, we use the following predictor to be applied to the logit link function:

ν_{i} = - ((1 + β_{FLB}) \cdot \log (1 / π_{i} - 1)) .

(8)

Again, we vary the induced bias in the set $β_{FLB} \in \{-0.2, -0.1, -0.05, 0, 0.05, 0.1, 0.2\}$, allowing us to consider a potential FLB and a reverse FLB in the same model. We calculate adjusted probabilities for home wins, draws, and away wins according to Equation 9, while Equation 5 gives the underlying true probabilities $p_{i, j}$ used to simulate match outcomes in $n =$ 10,000 runs. Figure 10 in the Appendix illustrates the impact of $|β_{FLB}| = 0.2$ on the implied winning probability. While negative values of $β_{FLB}$ imply a reverse FLB, positive values correspond to higher true winning probabilities for favorites than implied by the bookmaker.

{\tilde{p}}_{i, j} = \operatorname{logit}^{-1} (- (β_{0} + (1 + β_{FLB}) \cdot \log (1 / π_{i, j} - 1))), \quad j = h, d, a .

(9)
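The effect of $β_{FLB}$ in Equation 9 can be illustrated with a few lines of code. This is a minimal sketch: the function name is hypothetical, and `beta0` defaults to 0, which is an assumption because its value is not stated in this section. Note that $- \log (1 / π - 1)$ equals the logit of $π$, so with $β_{0} = 0$ and $β_{FLB} = 0$ the adjusted probability equals the implied probability.

```python
import math


def flb_adjusted(pi: float, beta_flb: float, beta0: float = 0.0) -> float:
    """Adjusted probability following Equation 9.

    beta0 = 0 is an assumption made for this illustration.
    """
    eta = -(beta0 + (1.0 + beta_flb) * math.log(1.0 / pi - 1.0))
    return 1.0 / (1.0 + math.exp(-eta))


if __name__ == "__main__":
    for beta in (-0.2, 0.0, 0.2):
        # A longshot (0.2), an even bet (0.5), and a favorite (0.7).
        print(beta, [round(flb_adjusted(p, beta), 3) for p in (0.2, 0.5, 0.7)])
```

For positive $β_{FLB}$, favorites receive a higher and longshots a lower probability than implied by the odds; negative values reverse this pattern. The adjusted values for the three outcomes of a match do not sum to one in general, which is presumably why Equation 5 is applied afterwards.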

To determine whether we obtain a (significant) FLB, we fit a logit regression model based on the predictor given in Equation 8. This allows us to test whether the resulting estimate for $β_{FLB}$ is statistically significantly different from 0.

Table 8 gives the proportions out of $n =$ 10,000 simulation runs in which the estimated ${\hat{β}}_{FLB}$ is statistically significantly different from 0, depending on the induced values for $β_{FLB}$ in the data-generating process. In addition, Figure 7 shows the estimated values when considering 14 seasons, corresponding to the rightmost column in Table 8. The vertical lines indicate the absence of an FLB. While for $β_{FLB} = 0$ the percentage of significant effects fluctuates around the corresponding significance level, for $|β_{FLB}| = 0.2$ we find a significant (reverse) FLB in about half of the simulation runs even when considering one season only. As for the home and sentiment bias, the percentage of simulation runs with a significant effect increases with the number of seasons considered as well as with the magnitude of the induced bias. For $|β_{FLB}| = 0.2$, we find a significant effect in nearly every run even when limiting the simulation to only five seasons, whereas for $|β_{FLB}| = 0.05$ the probability of uncovering a significant bias remains below 50% even for 14 seasons.

Figure 7.

Histograms of the estimates ${\hat{β}}_{FLB}$ depending on the true value of $β_{FLB}$ in $n =$ 10,000 simulation runs.

Table 8.

Percentage of Simulation Runs Where ${\hat{β}}_{FLB}$ Is Significantly Different From 0 (Corresponding to a Slope of 1), Indicating a Favorite-Longshot Bias (FLB) at the 95% Confidence Level, Depending on the True Value of $β_{FLB}$ and the Number of Seasons for $n =$ 10,000 Simulation Runs.

$β_{F L B}$	One season	Two seasons	Five seasons	10 seasons	14 seasons
$- 0.20$	$0.593$	$0.865$	$0.994$	$1.000$	$1.000$
$- 0.10$	$0.190$	$0.337$	$0.611$	$0.874$	$0.949$
$- 0.05$	$0.089$	$0.120$	$0.204$	$0.331$	$0.438$
$0.00$	$0.053$	$0.051$	$0.050$	$0.052$	$0.047$
$0.05$	$0.074$	$0.105$	$0.190$	$0.318$	$0.421$
$0.10$	$0.163$	$0.285$	$0.557$	$0.851$	$0.940$
$0.20$	$0.477$	$0.786$	$0.987$	$1.000$	$1.000$

Real-World Betting Markets

After discussing the (in)efficiency of betting markets from a theoretical perspective using simulated data, we now consider the data set introduced in the “Method and Data” section to analyze whether biased odds persist over time and whether their occurrence is as regular as the literature suggests. We first investigate the different biases discussed above for the Premier League for the full sample from season 2005-2006 until 2018-2019. We then fit our model to season-by-season data to investigate whether biases are of a temporary nature only. After discussing results for the Premier League in detail, we provide a brief summary of analogous results obtained for the other four European top leagues. We then analyze whether the identified biases led to profitable betting strategies.

Biases in the English Premier League

The leftmost column of Table 9 displays the results of the regression model fitted to all seasons of the Premier League. Our results suggest that the game outcome is strongly predicted by the implied probability calculated from betting odds. An increase of 1 percentage point in the Implied Probability, with all other covariates held constant, increases the odds of winning a bet by a factor of $\exp (5.004 / 100) = 1.051$. Perhaps somewhat surprisingly, we detect a Home bias: betting on home teams increases the chances of winning a bet when controlling for the Implied Probability and DiffAttend.

Figure 8 displays the relationship between the probability implied by the bookmaker on the x-axis and the expected winning probability given by our model on the y-axis for home (left panel) and away games (right panel), including corresponding confidence intervals. The dashed line corresponds to full market efficiency, that is, the implied probability equals the probability under the model since further effects beyond the home effect do not have any explanatory power. These results suggest that bookmakers undervalue favorites with implied probabilities between 0.5 and 0.8 in home games, which is equivalent to overvaluing underdogs with implied probabilities between 0.2 and 0.4 in away games. While this provides evidence for an FLB in the data and is in line with the findings on the Premier League by Direr (2011) and Franke (2020), our findings from the simulation-based analysis in the “Favorite-Longshot Bias” section indicate that there is a high chance of uncovering even a small FLB in large data sets (see Table 8).

Figure 8.

Probabilities for winning a bet under the model for the full observation period in the Premier League for home matches (left panel) and away matches (right panel). The covariate DiffAttend is set to its mean, that is, zero, in this figure.
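To make the interpretation of the coefficients concrete, the following sketch evaluates the fitted model with the all-seasons estimates reported in Table 9. It assumes that Home is coded as 1 for the home team and 0 otherwise and, as in Figure 8, sets DiffAttend to zero by default; the function name is illustrative.

```python
import math

# All-seasons estimates for the Premier League as reported in Table 9.
CONSTANT, B_IMPLIED, B_HOME, B_DIFF = -2.529, 5.004, 0.136, 0.002


def win_probability(implied: float, home: bool, diff_attend: float = 0.0) -> float:
    """Model probability of winning a bet, given the implied probability.

    Home is assumed to be a 0/1 dummy (assumption for this sketch).
    """
    eta = (CONSTANT + B_IMPLIED * implied
           + B_HOME * (1.0 if home else 0.0) + B_DIFF * diff_attend)
    return 1.0 / (1.0 + math.exp(-eta))


# Multiplicative change in the odds of winning per percentage point
# of implied probability.
odds_ratio_per_point = math.exp(B_IMPLIED / 100)

if __name__ == "__main__":
    print(round(odds_ratio_per_point, 3))
    for implied in (0.5, 0.6, 0.7, 0.8):
        print(implied,
              round(win_probability(implied, home=True), 3),
              round(win_probability(implied, home=False), 3))
```

For a home team with an implied probability of 0.6, for instance, the model probability is about 0.65, which illustrates the undervaluation of home favorites described above.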

Table 9.

Estimation Results for the Regression Model Fitted to Data From the English Premier League.

	Dependent variable:
	Won
	All seasons	2005-2006	2006-2007	2007-2008	2008-2009	2009-2010	2010-2011	2011-2012	2012-2013	2013-2014	2014-2015	2015-2016	2016-2017	2017-2018	2018-2019
Implied probability	5.004 $^{* * *}$	6.008 $^{* * *}$	4.847 $^{* * *}$	7.012 $^{* * *}$	5.177 $^{* * *}$	4.029 $^{* * *}$	2.726 $^{* * *}$	2.784 $^{* * *}$	5.159 $^{* * *}$	5.548 $^{* * *}$	5.217 $^{* * *}$	4.201 $^{* * *}$	6.028 $^{* * *}$	4.569 $^{* * *}$	5.155 $^{* * *}$
	(0.222)	(0.887)	(0.917)	(0.889)	(0.887)	(1.021)	(0.931)	(0.900)	(0.926)	(0.842)	(0.818)	(0.789)	(0.823)	(0.735)	(0.680)
Home	0.136 $^{* *}$	0.122	0.305	$-$ 0.098	0.020	0.792 $^{* * *}$	0.681 $^{* * *}$	0.243	$-$ 0.020	$-$ 0.089	$-$ 0.041	$-$ 0.101	0.254	0.250	0.024
	(0.063)	(0.256)	(0.245)	(0.245)	(0.246)	(0.258)	(0.252)	(0.249)	(0.257)	(0.241)	(0.235)	(0.221)	(0.229)	(0.227)	(0.233)
DiffAttend	0.002	$-$ 0.001	0.003	$-$ 0.002	0.006	0.013	0.011	0.018 $^{* *}$	0.008	$-$ 0.001	$-$ 0.002	$-$ 0.001	$-$ 0.004	0.003	0.005
	(0.002)	(0.008)	(0.007)	(0.007)	(0.007)	(0.010)	(0.007)	(0.008)	(0.008)	(0.007)	(0.006)	(0.006)	(0.006)	(0.006)	(0.006)
Constant	$-$ 2.529 $^{* * *}$	$-$ 2.758 $^{* * *}$	$-$ 2.557 $^{* * *}$	$-$ 3.226 $^{* * *}$	$-$ 2.540 $^{* * *}$	$-$ 2.560 $^{* * *}$	$-$ 2.044 $^{* * *}$	$-$ 1.743 $^{* * *}$	$-$ 2.621 $^{* * *}$	$-$ 2.527 $^{* * *}$	$-$ 2.497 $^{* * *}$	$-$ 2.135 $^{* * *}$	$-$ 2.957 $^{* * *}$	$-$ 2.493 $^{* * *}$	$-$ 2.430 $^{* * *}$
	(0.075)	(0.275)	(0.297)	(0.298)	(0.283)	(0.334)	(0.301)	(0.288)	(0.303)	(0.285)	(0.280)	(0.274)	(0.298)	(0.264)	(0.244)
Observations	10,640	760	760	760	760	760	760	760	760	760	760	760	760	760	760

Note: Values in parentheses are robust standard errors.

$^{*}$ p < .1; $^{* *}$ p < .05; $^{* * *}$ p < .01.

Table 9 also displays the results for the Premier League when fitting our model to individual seasons. Each season contains 760 observations (380 matches per season $\cdot$ 2 rows for each match). While our results confirm the large effect of Implied Probabilities, other covariates have significant effects in some seasons but are insignificant in others. In the light of our findings from the simulation-based analysis, there is a probability of about 16% of observing a significant home effect at the 5% level in at least two out of 14 seasons under full market efficiency (see Table 6). Thus, significant effects in single seasons should not be overinterpreted. On the other hand, the significant home bias in these two seasons contributes to the effect being significant over the full observation period.

As mentioned in the “Method and Data” section, our analysis relies on the calculation of implied probabilities by dividing by the factor (1+margin) for each outcome. Thus, we run robustness checks to examine whether our results remain the same when relaxing this assumption. Previous literature suggests various options for differentially weighted margins, for example, (1) margins weighted proportionally to the odds, (2) using the odds ratio, and (3) using a logarithmic function. In this article, we consider approach (1), while the other two could also be applied. Margins $a_{j}$ for home wins, away wins, and draws $j = h, a, d$ can be calculated as $a_{j} = (a \cdot O_{j} / 3)$ with $a$ corresponding to the total margin for the match and $O_{j}$ the odds for outcome $j$. Table 12 in the Appendix details the results for the Premier League considering single seasons as well as the whole observation period. While estimation results differ slightly from those obtained under the basic model in Table 9, we find the same significant effects for the same seasons. The same result holds when we consider the explanatory variable $\frac{1}{O_{j}}$ instead of the implied probability.
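The two margin treatments can be sketched in code. The total margin is taken here as the sum of inverse odds minus one, which is the usual definition of the overround but an assumption as far as this section is concerned; the function names and example odds are hypothetical. How the outcome-specific margins $a_{j}$ enter the implied probabilities is not spelled out in this section, so the sketch stops at computing them.

```python
def total_margin(odds):
    """Total margin of a match: sum of inverse odds minus one (assumption)."""
    return sum(1.0 / o for o in odds.values()) - 1.0


def implied_basic(odds):
    """Basic approach: divide each inverse odds by the factor (1 + margin)."""
    a = total_margin(odds)
    return {j: (1.0 / o) / (1.0 + a) for j, o in odds.items()}


def outcome_margins(odds):
    """Approach (1): a_j = a * O_j / 3, margins proportional to the odds."""
    a = total_margin(odds)
    return {j: a * o / 3.0 for j, o in odds.items()}


if __name__ == "__main__":
    odds = {"h": 2.0, "d": 3.4, "a": 3.8}  # hypothetical decimal odds
    print(total_margin(odds))
    print(implied_basic(odds))
    print(outcome_margins(odds))
```

By construction, the basic implied probabilities sum to one, whereas approach (1) assigns a larger share of the margin to outcomes with longer odds.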

Further Leagues

The results of the biases analyzed for further European top leagues can be obtained from the Appendix and are only briefly mentioned here. Except for the Bundesliga, the models fitted to data of all seasons indicate a significant FLB for all leagues (see Figures 11–14 in the Appendix). These results extend the findings of Forrest & Simmons (2008), who provide evidence for the existence of the FLB in La Liga, to further leagues. As also revealed by Forrest & Simmons (2008), our results suggest a sentiment bias in the Bundesliga, Serie A, and La Liga (see Tables 14–16 in the Appendix). In addition to the Premier League, we also obtain a home bias in La Liga for the full observation period (see Table 16 in the Appendix). Analyzing single seasons, we find that the effects revealed over the full sample are mostly driven by a small number of seasons and are short-lived. Again, findings for single seasons should be interpreted carefully in the light of our simulation-based analysis.

Returns

The estimated coefficients for the home effect, the sentiment bias, and the implied probability covering a potential FLB indicate that, at least for a few seasons, the chances of winning a bet are increased when following these strategies. We thus investigate potential profits resulting from these biases in our data set. To become profitable, strategies must exploit biased odds that more than offset the bookmakers’ margin. The ROIs considered here are calculated as

\mathit{ROI} = \frac{\mathit{Payout} - \mathit{Wager}}{\mathit{Wager}},

(10)

where the payout of a bet is given by the product of the wager and the odds if the bet was won; if the bet was lost, the payout is simply 0. Throughout this contribution, we consider consistent wagers of 1 euro per bet. Table 10 presents the ROIs for all leagues and seasons, and the last column refers to the ROIs over the entire period. For DiffAttend, bets are placed on teams where the variable DiffAttend exceeds the 95% quantile of the corresponding league and season. Teams with winning probabilities above 50% are denoted as favorites, while we also consider heavy favorites with an implied probability larger than 0.7 (corresponding to the 95% quantile of implied probabilities).⁶
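Equation 10 with consistent wagers can be sketched as follows. The function name and the example bets are hypothetical; decimal odds are assumed, so that a winning bet pays out the wager times the odds.

```python
def roi(bets, wager: float = 1.0) -> float:
    """Return on investment for a list of (odds, won) pairs.

    Each bet uses the same wager; a lost bet has a payout of 0.
    """
    bets = list(bets)
    if not bets:
        raise ValueError("at least one bet is required")
    payout = sum(wager * odds for odds, won in bets if won)
    total_wager = wager * len(bets)
    return (payout - total_wager) / total_wager


if __name__ == "__main__":
    # Three hypothetical 1-euro bets: two won, one lost.
    print(roi([(1.8, True), (2.1, True), (3.5, False)]))
```

Because all wagers are equal, the ROI does not depend on the size of the wager; it equals the average profit per euro staked.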

Positive returns are generated in seven out of 14 seasons when strictly betting on home teams in the Premier League. At the same time, the results of our regression model in Table 9 indicate significantly increased probabilities of winning a bet on home teams for two seasons only (2009-2010 and 2010-2011). As the standard error of the parameter $β_{home}$ in our regression model is fairly large when considering single seasons only, in some cases the magnitude of the estimated home bias is not large enough to be statistically significant, while the positive effect still results in a positive ROI. However, over the full observation period, we do not find any leagues with positive returns when consistently betting on the home team (see the rightmost column of Table 10). While we find a significant effect of the covariate Home in the regression models for the Premier League and La Liga (see Table 9 and Table 16 in the Appendix), the corresponding return over all seasons is negative. Moreover, significant effects in the regression model do not guarantee positive returns, as the magnitude of some effects may not be sufficient to counterbalance the bookmakers’ vig.

Table 10.

Returns on Presented Strategies for All Leagues and Seasons.

League	Bet	2005-2006	2006-2007	2007-2008	2008-2009	2009-2010	2010-2011	2011-2012	2012-2013	2013-2014	2014-2015	2015-2016	2016-2017	2017-2018	2018-2019	All
Premier League	Home	0.036	0.008	−0.094	−0.062	0.084	0.002	−0.043	−0.124	−0.007	−0.035	−0.104	0.054	0.010	0.031	−0.017
Ligue 1	Home	−0.119	0.004	−0.087	−0.111	−0.021	−0.139	−0.016	−0.062	−0.114	0.016	−0.110	0.049	−0.062	−0.081	−0.061
Bundesliga	Home	−0.204	−0.091	−0.042	0	−0.166	0.015	−0.042	−0.151	−0.006	0.043	−0.044	0.100	−0.018	−0.048	−0.047
Serie A	Home	−0.106	−0.122	−0.055	0.016	−0.010	−0.020	−0.047	−0.059	−0.043	−0.137	−0.043	−0.004	−0.176	−0.082	−0.063
La Liga	Home	−0.161	−0.067	−0.017	−0.034	0.001	0.013	0.014	0.023	0.025	−0.119	−0.003	−0.038	−0.027	−0.053	−0.032
All	Home	−0.107	−0.052	−0.060	−0.040	−0.017	−0.028	−0.026	−0.071	−0.031	−0.050	−0.061	0.029	−0.056	−0.047	−0.044
Premier League	DiffAttend	0.012	0.117	−0.011	0.249	0.012	−0.161	0.080	0.143	0.017	−0.11	0.072	−0.024	−0.098	0.129	0.031
Ligue 1	DiffAttend	−0.172	−0.038	−0.168	0.127	−0.066	−0.093	−0.180	−0.087	0.003	−0.055	−0.152	0.018	−0.166	0.159	−0.062
Bundesliga	DiffAttend	−0.105	−0.056	−0.038	0.091	0.083	0.217	0.055	0.201	−0.154	−0.163	0.107	−0.135	−0.145	0.069	0.002
Serie A	DiffAttend	0.070	−0.126	−0.149	−0.022	0.173	0.046	0.218	0.183	−0.084	−0.051	0.012	0.208	−0.005	−0.043	0.031
La Liga	DiffAttend	0.003	−0.076	−0.074	−0.067	0.158	−0.111	−0.054	0.024	−0.056	0.046	−0.012	−0.090	−0.03	−0.078	−0.030
All	DiffAttend	−0.035	−0.035	−0.090	0.075	0.072	−0.031	0.023	0.088	−0.051	−0.062	0.001	0.001	−0.086	0.046	−0.006
Premier League	Favorites	0.069	−0.038	0.082	0.086	−0.020	−0.157	−0.047	−0.014	0.048	0.024	−0.060	0.076	−0.031	0.075	0.003
Ligue 1	Favorites	−0.043	−0.107	−0.169	−0.086	−0.050	−0.152	−0.040	−0.022	0.002	0.048	−0.120	0.034	−0.036	−0.099	−0.056
Bundesliga	Favorites	−0.052	−0.143	−0.035	−0.056	−0.171	−0.152	−0.045	−0.026	0.012	−0.141	−0.005	−0.061	−0.014	−0.012	−0.063
Serie A	Favorites	0.018	0.012	−0.044	0.010	0.035	−0.046	−0.036	0.071	−0.004	−0.123	0.015	0.043	0.066	0.013	0.005
La Liga	Favorites	−0.136	−0.117	−0.072	0.076	0.050	0.057	−0.045	−0.043	−0.067	0.008	−0.043	0.047	−0.030	−0.136	−0.029
All	Favorites	−0.022	−0.073	−0.034	−0.007	−0.026	−0.083	−0.043	−0.008	−0.002	−0.032	−0.040	0.035	−0.006	−0.025	−0.025
Premier League	Heavy favorites	0.024	−0.049	−0.017	−0.035	0.003	−0.099	−0.080	0.144	0.005	0.008	−0.023	−0.050	−0.055	0.040	−0.015
Ligue 1	Heavy favorites	−0.060	−0.390	−0.203	0.108	0.038	−0.192	−0.003	−0.121	0.034	−0.011	0.057	0.052	−0.021	−0.001	−0.003
Bundesliga	Heavy favorites	0.008	−0.284	−0.116	−0.109	−0.022	0.174	0.003	−0.048	0.047	−0.052	0.027	−0.106	0.061	−0.089	−0.029
Serie A	Heavy favorites	0.170	0.076	−0.090	−0.136	−0.095	−0.091	−0.091	0.007	0.040	−0.110	0.032	0.066	−0.008	−0.008	−0.002
La Liga	Heavy favorites	−0.041	−0.069	0.226	−0.019	0.061	−0.042	−0.087	0.081	−0.032	0.060	−0.038	−0.013	−0.047	−0.021	0.005
All	Heavy favorites	0.078	−0.053	−0.011	−0.056	0.008	−0.053	−0.069	0.048	0.009	−0.003	0.028	−0.008	−0.025	−0.006	−0.007

Note: Numbers indicate the returns on investment for a given strategy, season, and league. Positive returns to the bettors are indicated in bold.

For teams with higher average attendance, we find positive returns in at least half of the seasons for the Premier League, the Bundesliga, and Serie A, leading to positive returns over the entire period of 14 seasons. For a few seasons, the returns are fairly large (above 20% in the Premier League 2008-2009, the Bundesliga in 2010-2011 and 2012-2013, as well as Serie A 2011-2012 and 2016-2017). Total returns over all seasons in these three leagues amount to up to about 3% (see Table 10). These results are in line with previous findings on a positive sentiment bias in the Premier League (see Franck et al., 2011) and in La Liga (see Forrest & Simmons, 2008). Comparing the returns from Table 10 to our regression results, we find a positive and significant effect of DiffAttend for Serie A in 2006-2007, while the corresponding returns are negative for these two seasons; this again illustrates that bettors also have to beat the bookmakers’ margin to obtain positive returns.

Betting on favorites, that is, teams with a winning probability above 50%, leads to very small but positive returns of 0.5% in Serie A and 0.3% in the Premier League over the full observation period. However, there are vast differences in results across leagues. While betting on favorites in the Bundesliga leads to a positive return in only one season, nine such seasons can be observed in Serie A. For betting on heavy favorites with implied probabilities exceeding 0.7, results are quite volatile, especially for early seasons. Over the full observation period, Table 10 shows positive returns of 0.5% for La Liga only. However, returns exceed the negative average margin for all leagues. Therefore, betting on (heavy) favorites is generally a more promising strategy.

Discussion

This article analyses betting market inefficiencies in the long-term perspective. Such a perspective can reveal the frequency and persistence of biased odds and inefficiencies over time and compare such information to simulated, unbiased markets. This is particularly interesting in light of intensifying competition between bookmakers, leading to decreasing margins over time. While bookmakers had to improve their forecast precision, it could still be expected that periods of inefficiencies occur more often during recent years.

For the long-term analysis of the five major European football leagues in the “Real-World Betting Markets” section, we find that most inefficiencies leading to profitable opportunities for bettors are short-lived and do not occur persistently over time or systematically across leagues. Relatedly, significant effects in the regression model do not necessarily lead to positive returns, as the magnitude of some effects is insufficient to offset the bookmakers’ margins. On the other hand, effects can remain insignificant due to the fairly high standard errors of parameters for single seasons while still leading to positive returns when simply applying the corresponding betting strategy for the same period. However, we do not find periods of inefficiencies more often during the most recent years, implying that bookmakers successfully improved the predictive power of their betting odds. For further analyses regarding the returns of the different betting strategies, one might consider confidence intervals, as the returns reported throughout this article are only point estimates. In particular, one could consider resampling methods, such as bootstrapping, to obtain confidence intervals for the returns.
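A percentile bootstrap for the return of a strategy could look as follows. This is only a sketch of one possible resampling scheme and was not applied in this article: it assumes 1-euro wagers, so that the ROI equals the mean profit per bet, and it resamples bets independently, which ignores any dependence between bets. The function name and the example data are hypothetical.

```python
import random


def bootstrap_roi_ci(profits, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean profit per 1-euro bet.

    profits: per-bet profits (odds - 1 if the bet was won, -1 otherwise).
    Bets are resampled independently with replacement (assumption).
    """
    rng = random.Random(seed)
    n = len(profits)
    means = sorted(sum(rng.choices(profits, k=n)) / n for _ in range(n_boot))
    tail = (1.0 - level) / 2.0
    lower = means[int(tail * (n_boot - 1))]
    upper = means[int((1.0 - tail) * (n_boot - 1))]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical season: 380 bets at odds 2.0, of which 180 were won.
    profits = [1.0] * 180 + [-1.0] * 200
    print(sum(profits) / len(profits), bootstrap_roi_ci(profits))
```

An interval that covers zero would indicate that a positive point estimate of the ROI is compatible with a strategy that does not beat the bookmaker.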

At the same time, our simulation-based analyses in the “Simulation-Based Analysis of Betting Markets” section show that markets temporarily record inefficiencies even when full market efficiency is assumed. For example, considering a potential home bias, our analyses disclose that there is a high probability of more than 75% of reporting a significant effect in at least one season in a season-by-season analysis covering 14 seasons in total, by chance only, due to the Type I error of hypothesis testing. Therefore, it is not surprising that we find at least a small number of significant effects for different leagues and betting strategies in our analyses of the five major European football leagues. In addition, the simulation-based analyses indicate that it is more likely to reveal a positive than a negative effect of the same magnitude due to the structure of betting odds. This is particularly interesting in the light of a potential publication bias (Franco et al., 2014). Furthermore, the probability of disclosing significant effects is higher for the binary variable Home than for the continuous-valued variable DiffAttend. While our results also show that the chance of detecting significant effects increases with the sample size (equivalent to a reduced risk of Type II error), it is less likely to find longer periods of inefficiencies with the chance to generate positive returns for bettors. Given that the probability of detecting a bias depends on the (true) underlying magnitude of the effect, it is vital for future research to explicitly incorporate the findings of prior studies. For instance, if previous research indicates the presence of a sentiment bias characterized by a relatively small effect size, one should anticipate the necessity of a larger data set to detect such a bias compared to a scenario where the bias exhibits a more substantial effect size.

In conclusion, our results suggest that the occurrence of significant positive effects disclosed by a (logit) regression model applied to betting market data is not necessarily driven by systematic and persistent biases, as partially implied by previous literature, but may be driven by statistical noise and chance. This is underlined by our simulation-based analysis of a fully efficient market regarding the home bias, where a considerable number of simulation runs lead to positive returns for single seasons and a few even over the full observation period. To address the trade-off regarding the length of the observation period, future research could use a rolling window approach by including multiple seasons in a single data set and applying regression models to those windows. In the light of discussions on the consequences of multiple testing (see e.g., Head et al., 2015), this would reduce the risk of misleading results due to chance only, while also making it possible to disclose inefficient periods spanning several consecutive seasons with the chance to generate positive returns, as the power of tests is increased. Nevertheless, our analyses suggest that, in the long run, there are (if any) only a few profitable betting strategies, driven mostly by the sentiment bias or by betting on (heavy) favorites, even in more recent seasons with lower bookmakers’ margins. In addition, returns are highly volatile and differ between seasons, suggesting that potential inefficiencies are short-lived and occur unsystematically.
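The rolling window approach suggested above can be sketched as follows. The window width of three seasons and the function name are illustrative choices and are not taken from this article; fitting the regression model to each window is left out.

```python
def rolling_windows(seasons, width):
    """Return all windows of `width` consecutive seasons, shifted by one season."""
    if width < 1 or width > len(seasons):
        raise ValueError("width must be between 1 and the number of seasons")
    return [seasons[i:i + width] for i in range(len(seasons) - width + 1)]


# The 14 seasons of the observation period, 2005-2006 to 2018-2019.
seasons = [f"{year}-{year + 1}" for year in range(2005, 2019)]
windows = rolling_windows(seasons, width=3)

if __name__ == "__main__":
    for window in windows:
        # In an application, the matches of these seasons would be pooled
        # and the logit regression fitted to the pooled data.
        print(window)
```

With 14 seasons and a width of three, this yields 12 overlapping windows. Because consecutive windows share seasons, the resulting tests are not independent, which should be kept in mind when interpreting them.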

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Deutsche Forschungsgemeinschaft (Grant 431536450), which is gratefully acknowledged.

ORCID iDs

David Winkelmann

Marius Ötting

Christian Deutscher

Notes

Author Biographies

David Winkelmann is a PhD student in the department of business administration and economics at Bielefeld University. His research focuses on the application of methods from operations research and statistics in the field of logistics operations and sports science.

Marius Ötting is a postdoctoral researcher at Bielefeld University and works on topics related to betting markets, such as modelling the behaviour of bettors and fraud detection. In recent years, he has also investigated the existence of the “hot hand” across various sports.

Christian Deutscher, a professor of sports economics at Bielefeld University in Germany, specializes in a wide range of research areas within the field. His work covers subjects on sports integrity, including the examination of issues like match-fixing and discrimination. Additionally, he conducts research on performance evaluation and the demand.

Tomasz Makarewicz is a junior professor at Bielefeld University. He finished his PhD at the University of Amsterdam and later worked at the University of Bamberg. His research focuses on computational and behavioral methods in macroeconomics and financial markets.

Appendix

References

Angelini

De Angelis

(2017). PARX model for football match predictions. Journal of Forecasting, 36(7), 795–807.

Angelini

De Angelis

(2019). Efficiency of online football betting markets. International Journal of Forecasting, 35(2), 712–721.

Anguita

T. H.

del Corral Cuervo

González

C. G.

(2017). Variabilidad en el mercado de apuestas deportivas. In El uso de datos en la economía del deporte: mirando hacia el futuro, (pp. 22–25). Ediciones de la Universidad de Castilla-La Mancha.

Braun

Kvasnicka

(2013). National sentiment and economic behavior: Evidence from online betting on European football. Journal of Sports Economics, 14(1), 45–64.

Brier

G. W.

(1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78(1), 1–3.

Cain

Law

Peel

(2000). The favourite-longshot bias and market efficiency in UK football betting. Scottish Journal of Political Economy, 47(1), 25–36.

Cain

Law

Peel

(2003). The favourite-longshot bias, bookmaker margins and insider trading in a variety of betting markets. Bulletin of Economic Research, 55(3), 263–273.

Che

Feddersen

Humphreys

B. R.

(2017). Price setting and competition in fixed odds betting markets. In The economics of sports betting (pp. 38–51). Edward Elgar Publishing.

Constantinou

Fenton

(2013). Profiting from arbitrage and odds biases of the European football gambling market. Journal of Gambling Business and Economics, 7(2), 41–70.

10.

Deschamps

Gergaud

(2007). Efficiency in betting markets: Evidence from English football. The Journal of Prediction Markets, 1(1), 61–73.

11.

Deutscher

Frick

Ötting

(2018). Betting market inefficiencies are short-lived in German professional football. Applied Economics, 50(30), 3240–3246.

12.

Direr

(2011). Are betting markets efficient? Evidence from European football championships. Applied Economics, 45(3), 343–356.

13.

Dixon

M. J.

Pope

P. F.

(2004). The value of statistical forecasts in the UK association football betting market. International Journal of Forecasting, 20(4), 697–711.

14.

Elaad

Reade

J. J.

Singleton

(2020). Information, prices and efficiency in an online betting market. Finance Research Letters, 35, 101291.

Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2), 383–417.

Feddersen (2017). Market efficiency and the favorite-longshot bias: Evidence from handball betting markets. In The economics of sports betting (pp. 105–117). Edward Elgar Publishing.

Feddersen, Humphreys, B. R., & Soebbing, B. P. (2017). Sentiment bias and asset prices: Evidence from sports betting markets and social media. Economic Inquiry, 55(2), 1119–1129.

Fischer & Haucap (2022). Home advantage in professional soccer and betting market efficiency: The role of spectator crowds. Kyklos, 75(2), 294–316.

Flepp, Nüesch, & Franck (2016). Does bettor sentiment affect bookmaker pricing? Journal of Sports Economics, 17(1), 3–11.

Forrest, Goddard, & Simmons (2005). Odds-setters as forecasters: The case of English football. International Journal of Forecasting, 21(3), 551–564.

Forrest & Simmons (2008). Sentiment in the betting market on Spanish football. Applied Economics, 40(1), 119–126.

Franck, Verbeek, & Nüesch (2011). Sentimental preferences and the organizational regime of betting markets. Southern Economic Journal, 78(2), 502–518.

Franck, Verbeek, & Nüesch (2013). Inter-market arbitrage in betting. Economica, 80(318), 300–325.

Franco, Malhotra, & Simonovits (2014). Publication bias in the social sciences: Unlocking the file drawer. Science (New York, N.Y.), 345(6203), 1502–1505.

Franke (2020). Do market participants misprice lottery-type assets? Evidence from the European soccer betting market. The Quarterly Review of Economics and Finance, 75(1), 1–18.

Goddard & Asimakopoulos (2004). Forecasting football results and the efficiency of fixed-odds betting. Journal of Forecasting, 23(1), 51–66.

Gomez-Gonzalez & del Corral (2018). The betting market over time: Overround and surebets in European football. Economics and Business Letters, 7(4), 129–136.

Graham & Stott (2008). Predicting bookmaker odds and efficiency for UK football. Applied Economics, 40(1), 99–109.

Head, M. L., Holman, Lanfear, Kahn, A. T., & Jennions, M. D. (2015). The extent and consequences of p-hacking in science. PLoS Biol, 13(3), e1002106.

Hegarty & Whelan (2023). Calculating the bookmaker’s margin: Why bets lose more on average than you are warned. University College Dublin Working Paper.

Kuypers (2000). Information and efficiency: An empirical study of a fixed odds betting market. Applied Economics, 32(11), 1353–1363.

Lindstrøm, J. C. (2023). implied: Convert Between Bookmaker Odds and Probabilities. R package version 0.5.

Meier, P. F., Flepp, & Franck, E. P. (2021). Are sports betting markets semistrong efficient? Evidence from the COVID-19 pandemic. International Journal of Sport Finance, 16(3), 111–126.

Pope, P. F., & Peel, D. A. (1989). Information, prices and efficiency in a fixed-odds betting market. Economica, 56(223), 323–341.

R Core Team. (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Reade, J. J., Schreyer, & Singleton (2022). Eliminating supportive crowds reduces referee bias. Economic Inquiry, 60(3), 1416–1436.

Robitzsch & Grund (2022). miceadds: Some additional multiple imputation functions, especially for 'mice'. R package version 3.13-12.

Rossi (2011). Match rigging and the favorite long-shot bias in the Italian football betting market. International Journal of Sport Finance, 6(4), 317–334.

Sauer, R. D. (1998). The economics of wagering markets. Journal of Economic Literature, 36(4), 2021–2064.

Snowberg & Wolfers (2010). Explaining the favorite-long shot bias: Is it risk-love or misperceptions? Journal of Political Economy, 118(4), 723–746.

Štrumbelj, & Šikonja, M. R. (2010). Online bookmakers’ odds as forecasts: The case of European soccer leagues. International Journal of Forecasting, 26(3), 482–488.

Thaler, R. H., & Ziemba, W. T. (1988). Anomalies: Parimutuel betting markets: Racetracks and lotteries. Journal of Economic Perspectives, 2(2), 161–174.

Vlastakis, Dotsis, & Markellos, R. N. (2009). How efficient is the European football betting market? Evidence from arbitrage and trading strategies. Journal of Forecasting, 28(5), 426–444.

Winkelmann, Deutscher, & Ötting (2021). Bookmakers’ mispricing of the disappeared home advantage in the German Bundesliga after the COVID-19 break. Applied Economics, 53(26), 3054–3064.

Zeileis (2004). Econometric computing with HC and HAC covariance matrix estimators. Journal of Statistical Software, 11(10), 1–17.
