Abstract
The hot-hand theory posits that an athlete who has performed well in the recent past performs better in the present. We use multilevel logistic regression to test this theory for National Hockey League playoff goaltenders, controlling for a variety of shot-related and game-related characteristics. Our data consists of
Introduction
The hot-hand phenomenon generally refers to an athlete who has performed well in the recent past performing better in the present. Having a “hot goalie” is seen as crucial to success in the National Hockey League (NHL) playoffs. A goaltender who keeps all pucks out of the net for 16 games (4 series of 4 wins) will win his team the Stanley Cup—obviously. In this paper, we use data from the NHL playoffs to investigate whether goaltenders get hot, in the sense that if a goaltender has had a high recent save probability, then that goaltender will have a high save probability for the next shot that he faces.
NHL fans, coaches, and players appear to believe that goaltenders can get hot and that a hot goaltender can be the key to the success of an NHL team. A famous example of a hot goaltender is the legendary playoff journey of Jonathan Quick of the Los Angeles Kings in 2012. With the fifth-best save percentage (
NHL goaltenders let in roughly one in ten shots. More precisely, during the 2018–19 regular season, 93 goaltenders playing for 31 teams faced a total of 79,540 shots, which resulted in 7,169 goals, for an average save percentage of 90.99%. Among goaltenders who played at least 20 games, the season-long save percentage varied from a high of 93.42% (Ben Bishop, Dallas Stars) to a low of 88.58% (Aaron Dell, San Jose Sharks), a range from 2.41 percentage points (pps) below to 2.43 pps above the overall average save percentage. In the playoffs of the same year, the overall average save percentage was 91.64%. Among goaltenders who played in two or more playoff games, the save percentage varied from 93.56% (Robin Lehner, New York Islanders) to 85.58% (Andrei Vasilevskiy, Tampa Bay Lightning), a range from 6.06 pps below to 1.92 pps above the overall average save percentage.
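The regular-season figures quoted above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function name `save_percentage` is ours, and the inputs are the totals and extremes stated in the text.

```python
def save_percentage(shots: int, goals: int) -> float:
    """Percentage of shots on goal that did not result in a goal."""
    return 100.0 * (shots - goals) / shots


# 2018-19 regular-season totals quoted in the text.
league_avg = save_percentage(shots=79_540, goals=7_169)

# Season-long extremes quoted in the text (goaltenders with at least 20 games).
best, worst = 93.42, 88.58

# Deviations from the league average, in percentage points (pps),
# measured against the average rounded to two decimals as in the text.
above_avg = best - round(league_avg, 2)
below_avg = worst - round(league_avg, 2)

print(f"league average save percentage: {league_avg:.2f}%")
print(f"best goaltender:  {above_avg:+.2f} pps")
print(f"worst goaltender: {below_avg:+.2f} pps")
```

The spread of roughly 2.4 pps on either side of the average gives a sense of scale for the effect sizes discussed later.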
Determining whether the hot-hand phenomenon is real for NHL goaltenders is crucial for understanding whether coaches are justified in choosing which of a team’s two goaltenders should start a particular game based on perceptions or estimates of whether a goaltender is hot. If the hot hand is real, then appropriate statistical models could potentially be used to predict the likely performance of a team’s two goaltenders in an upcoming game, or even in the remainder of a game in progress in which the goaltender currently on the ice has performed poorly.
Our main finding is that there is no evidence of a hot-hand phenomenon, that is, no positive influence of recent save performance on the save probability for the next shot. We also find a statistically significant negative slope coefficient for the influence of recent save performance on the save probability for one window size; in other words, contrary to the hot-hand theory, better past performance may be followed by worse future performance. For other window sizes, we find negative but statistically insignificant slope coefficients. A permutation test indicates that our main finding is not simply the result of regression-to-the-mean bias.
The remainder of the paper is organized as follows: in Section “Literature review”, we review related literature; in Section “Data and variables”, we describe our data set; in Section “Models”, we specify our regression models; in Sections “Results” and “Permutation test for estimation bias”, we present our results; and in Section “Discussion and conclusion”, we discuss our findings.
Literature review
We summarize five streams of related work addressing the following: (1) whether the hot hand is a real phenomenon or a fallacy, (2) whether statistical methods have sufficient power to detect a hot hand, (3) whether offensive and defensive adjustments reduce the impact of a hot hand, (4) estimation of a hot-hand effect for different positions in a variety of sports, and (5) specification of statistical models to estimate the hot hand.
Is the hot hand a real phenomenon or a fallacy? The hot hand was originally studied in the 1980s in the context of basketball shooting percentages (Gilovich et al., 1985; Tversky and Gilovich, 1989a, 1989b). These studies conclude that even though players, coaches, and fans all believe strongly in a hot-hand phenomenon, no convincing statistical evidence supports its existence. Instead, Gilovich et al. (1985) attribute beliefs in a hot hand to a psychological tendency to see patterns in random data, an explanation that has also been proposed for behavioural anomalies in various non-sports contexts, such as financial markets and gambling (Miller and Sanjurjo, 2018). Contrary to these findings, recent papers by Miller and Sanjurjo (2018, 2019) demonstrate that the statistical methods used in the original studies are biased, and when the data from those studies is re-analyzed after correcting for the bias, strong evidence for a hot hand emerges.
Do statistical methods have sufficient power to detect a hot hand? Gilovich et al. (1985) analyze players individually. This approach may lack sufficient statistical power to detect a hot hand, even if it exists (Wardrop, 1995, 1999). Multivariate approaches that pool data for multiple players have been proposed to increase power (Arkes, 2010). We follow this approach, by pooling playoff data for multiple NHL goaltenders over multiple years.
Do offensive and defensive adjustments reduce the impact of a hot hand? A hot hand, even if it is real, may not result in measurable improvement in performance if the hot player adapts by making riskier moves or if the opposing team transfers resources to focus on the player. For example, a hot basketball player may attempt riskier shots and the opposing team may double team such a player. The extent to which such adjustments can be made varies by sport, by position, and by the game situation: there is little opportunity for such adjustments for basketball free throws (Gilovich et al., 1985) and there is less opportunity to transfer defensive resources towards a single player in baseball than in basketball (Green and Zwiebel, 2018) because the fielding team faces batters sequentially. The defensive team can transfer resources only across time, by letting better pitchers face better batters, rather than across space, since there is only one batter in the batter’s box at a time. The opportunity for transferring resources is even less for an NHL team facing a hot goalie than for a baseball team facing a hot batter, because the NHL team typically faces the same goalie for the entire game, which minimizes the opportunity to transfer resources across time. Transferring resources across space is only possible through strategies like “crowding the net”. Overall, the opportunities to transfer resources away from other tasks and towards scoring are more limited than in the case of basketball and even baseball. NHL goaltenders thus provide an ideal setting in which to measure whether the hot-hand phenomenon occurs.
Estimation of a hot-hand effect for different positions in a variety of sports. In addition to basketball shooters, the list of sports positions for which researchers have investigated hot-hand effects includes baseball batters and pitchers (Green and Zwiebel, 2018), soccer penalty shooters (Ötting and Andreas, 2022), soccer in-game scoring (Parsons and Rohde, 2015), dart players (Ötting et al., 2020), and golfers (Livingston, 2012). In ice hockey, a momentum effect has been investigated at the team level (Kniffin and Mihalek, 2014). Hot-hand effects have been investigated for ice hockey shooters (Vesper, 2015) but not for goaltenders, except in the study by Morrison and Schmittlein (1998). The latter study focuses on the duration of NHL playoff series, notes a higher-than-expected number of short series, and proposes a goaltender hot-hand effect as a possible explanation. It does not analyze shot-level data for goaltenders, as we do.
Specification of statistical models to estimate the hot hand. Hot-hand researchers have used two main approaches in specifying their statistical models: (1) analyze success rates, conditional on outcomes of previous attempts (Albright, 1993; Green and Zwiebel, 2018) or (2) incorporate a latent variable or “state” that quantifies “hotness” (Green and Zwiebel, 2018; Ötting and Andreas, 2022). We follow the former approach. With that approach, past performance is typically summarized over a “window” defined in terms of a fixed number of past attempts, the “window size.” It is not clear how to choose the window size. We vary the window size over a range that covers the window sizes used in past work. We also perform the analysis using time-based windows, an approach that complicates data preparation and has not been used by other investigators. Results for the two window types are consistent (Appendix “Time-based windows”).
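To make the shot-based window concrete, the following minimal sketch computes, for each shot, the save proportion over a fixed number of preceding shots. The function name `recent_save_performance` and the choice to leave early shots undefined are assumptions made for illustration; they are not taken from our data-preparation code, which treats the start of the playoffs differently.

```python
from typing import List, Optional


def recent_save_performance(outcomes: List[int], window: int) -> List[Optional[float]]:
    """Save proportion over the `window` shots preceding each shot.

    outcomes[i] is 1 for a save and 0 for a goal. Shots with fewer than
    `window` predecessors get None because their window is not yet full.
    """
    result: List[Optional[float]] = []
    running = 0  # saves among the (up to) `window` most recent earlier shots
    for i, outcome in enumerate(outcomes):
        if i >= window:
            result.append(running / window)
        else:
            result.append(None)
        # Slide the window forward: add the current shot, drop the oldest.
        running += outcome
        if i >= window:
            running -= outcomes[i - window]
    return result


example = recent_save_performance([1, 1, 1, 0, 1, 1], window=3)
print(example)
```

Only shots that occur strictly before the current shot enter its window, so the current outcome never contributes to its own predictor.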
We contribute to the hot-hand literature by investigating NHL goaltenders. This position has not been studied previously and provides a setting in which there are limited opportunities for either team to adapt their strategies in reaction to a perception that a goaltender is hot. In terms of methodology, we use multilevel logistic regression, which allows us to pool data across goaltender-seasons to increase statistical power, and we use a wide range of shot-based windows to quantify a goaltender’s recent save performance.
Data and variables
Our data set consists of information about all shots on goal in the NHL playoffs from 2008 to 2016. The season-level data is from www.hockey-reference.com (Hockey Reference, 2017) and the shot-level data is from corsica.hockey (Perry, 2017). We have data for 48,431 shots, faced by 93 goaltenders, over 795 games and 9 playoff seasons, with an average of 30.46 shots on goal per team per game and 91.64% of the shots resulting in a save. We divide the data into 224 groups, containing from 2 to 849 shot observations, based on combinations of goaltender and playoff season. The data set includes 1,662 shot observations for which one or more variables have missing values. Removing those observations changes the average save proportion from 91.64% to 91.61% and the number of groups from 224 to 223. We exclude observations with missing values from our regression analysis but we include these observations when computing the variable of interest (recent save performance), as discussed in Section “Variable of interest: Recent save performance”.
Dependent variable: Shot outcome
The dependent variable,
Variable of interest: Recent save performance
The primary independent variable of interest,
For each group j, we set the shot index i equal to 0 for the last shot in the regular season and 1 for the first shot in the playoffs. Thus,
As stated previously, we included shots with missing values for the control variables in the computation of
Control variables: Other influential factors
We include a vector,

Angle = α and distance = d for shot origin.
Models
We use multilevel logistic regression with partial pooling, also referred to as mixed effects modelling. We rescale the variable of interest
We exclude all shots from the regression model for which
Our variable of interest becomes:
We allow the intercept and the slope coefficient of the variable of interest to vary by group, but the control variable slope coefficients are the same for all groups, as shown in the following partial pooling specification:
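A generic varying-intercept, varying-slope logistic specification of this kind can be sketched as follows. The notation is assumed for illustration and may differ from the symbols used elsewhere in this paper: $y_{ij}$ is the outcome of shot $i$ in group $j$ (1 for a save), $x_{ij}$ is the rescaled recent save performance, $\mathbf{z}_{ij}$ is the vector of control variables, $\alpha_j$ and $\beta_j$ are the group-level intercept and slope, and $\boldsymbol{\gamma}$ is the vector of control variable coefficients shared by all groups.

```latex
\begin{align*}
\Pr(y_{ij} = 1)
  &= \operatorname{logit}^{-1}\!\left(\alpha_j + \beta_j\, x_{ij}
     + \boldsymbol{\gamma}^{\top} \mathbf{z}_{ij}\right), \\
\begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix}
  &\sim \mathcal{N}\!\left(
     \begin{pmatrix} \mu_{\alpha} \\ \mu_{\beta} \end{pmatrix},
     \boldsymbol{\Sigma}\right).
\end{align*}
```

In this form, $\mu_{\alpha}$ and $\mu_{\beta}$ play the role of the fixed effects, and the group-level deviations from them are shrunk toward zero to a degree governed by $\boldsymbol{\Sigma}$, which is what partial pooling means.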
All estimates that we report in Section “Results” are obtained using Markov chain Monte Carlo (MCMC), using the rstan and rstanarm R packages. We use the default prior distributions from the rstanarm package. The default distributions are weakly informative—Normal distributions with mean 0 and scale
Results
First, we provide detailed results for the
Baseline results
Table 1 provides means and 90% credible intervals (Bayesian confidence intervals) for the intercept and slope fixed effects (
Results for the
90% credible interval excludes zero.
Our main finding from the baseline model is that a goaltender’s recent save performance does not have a positive impact on the save probability, which provides no support for a hot-hand effect; moreover, the estimated fixed effect (posterior mean) is negative and statistically significant, contrary to the hot-hand theory.
Most of the control variables have a significant impact on the save probability. The posterior mean values for the significant control variables are in the direction we expect, except for Period 3 and
Consistency of the main finding
Our main finding, that recent save performance has a negative estimate of the fixed effect value, holds for all window sizes (Figure 2). However, the estimate of the fixed effect is statistically significant at the

Recent save performance fixed effect coefficients (
The fact that the slope fixed effects for the recent save performance,

Distributions for recent save performance mixed effects (
The slope coefficients for all control variables are consistent in sign and magnitude and are within the 120-shot model

Control variable slope coefficients (
Effect size of independent variables on save probability
We investigate the impact of recent save performance and the control variables on the estimated save probability for the next shot, using the baseline model. To calculate the impact of varying a particular variable on the save probability, we set all other independent variables to their average values (sample proportions for categorical variables). We consider the average value for
The impact of increasing the recent save performance from
Given that we define
Figure 5 shows the impact of the control variables.

Control variables versus estimated save probability.
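The effect-size calculation described in this section, in which one variable is varied while all others are held at their averages, can be sketched as follows. Every number in the sketch (the intercept, the coefficients, and the covariate averages) is a hypothetical placeholder chosen for illustration, not an estimate from our models, and the variable names are likewise assumed.

```python
import math
from typing import Dict


def inv_logit(eta: float) -> float:
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def save_probability(intercept: float, coefs: Dict[str, float],
                     values: Dict[str, float]) -> float:
    """Save probability implied by a logistic model at the given covariate values."""
    eta = intercept + sum(coefs[name] * values[name] for name in coefs)
    return inv_logit(eta)


# Hypothetical placeholders, NOT estimates from the paper.
intercept = 2.4
coefs = {"recent_perf": -0.5, "distance": 0.03, "rebound": -0.6}
averages = {"recent_perf": 0.0, "distance": 35.0, "rebound": 0.05}


def effect_of(variable: str, low: float, high: float) -> float:
    """Change in save probability when `variable` moves from low to high,
    with every other variable held at its average."""
    p_low = save_probability(intercept, coefs, {**averages, variable: low})
    p_high = save_probability(intercept, coefs, {**averages, variable: high})
    return p_high - p_low


change = effect_of("recent_perf", low=-0.05, high=0.05)
print(f"change in save probability: {100 * change:+.3f} pps")
```

Because the inverse logit is nonlinear, the same coefficient implies different probability changes at different baseline values, which is why the other variables are fixed at their averages before the comparison is made.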
Permutation test for estimation bias
In our rescaling process, we subtract a long-term average over a window that only includes shots that occur before shot i. This approach, however, may introduce a regression-to-the-mean effect that causes negative fixed effect estimates: if a goaltender’s performance, after controlling for observable factors, is completely random, then we would expect a period of above-average or below-average recent save performance to be followed by a period of save performance that is closer to the average. To determine whether the negative
For this analysis, we use the 120-shot baseline model and perform 100 permutations. In each permutation, we randomly permute the
Figure 6 shows the results. The mean and median

Results of permutation test.
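The logic of a permutation test of this kind can be illustrated with a simplified sketch. Several simplifications are assumed here and should not be read as our actual procedure: the data is a single simulated sequence from a goaltender with a constant save probability, the order of the outcomes within that sequence is what gets permuted, the long-term average is taken over all earlier shots, and an ordinary least-squares slope stands in for the multilevel logistic regression coefficient. The sequence length, window size, and function names are arbitrary illustrative choices.

```python
import random
from statistics import mean
from typing import List, Tuple


def centered_recent_performance(y: List[int], window: int) -> List[Tuple[float, int]]:
    """Pairs (x_i, y_i), where x_i is the save rate over the previous `window`
    shots minus the save rate over all shots before shot i."""
    pairs = []
    for i in range(window, len(y)):
        recent = sum(y[i - window:i]) / window
        long_run = sum(y[:i]) / i
        pairs.append((recent - long_run, y[i]))
    return pairs


def ols_slope(pairs: List[Tuple[float, int]]) -> float:
    """Least-squares slope of outcome on centered recent performance."""
    mx = mean(x for x, _ in pairs)
    my = mean(v for _, v in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (v - my) for x, v in pairs)
    return sxy / sxx if sxx > 0 else 0.0


def permutation_slopes(y: List[int], window: int, n_perm: int,
                       rng: random.Random) -> List[float]:
    """Slopes re-estimated after shuffling the order of the outcomes."""
    slopes = []
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        slopes.append(ols_slope(centered_recent_performance(shuffled, window)))
    return slopes


rng = random.Random(0)
# Simulated outcomes with a constant 91% save probability (no hot hand).
y = [1 if rng.random() < 0.91 else 0 for _ in range(600)]
observed = ols_slope(centered_recent_performance(y, window=30))
null_slopes = permutation_slopes(y, window=30, n_perm=100, rng=rng)
# Share of permuted slopes at or below the observed slope (one-sided).
p_lower = sum(s <= observed for s in null_slopes) / len(null_slopes)
print(f"observed {observed:.3f}, permuted mean {mean(null_slopes):.3f}, p = {p_lower:.2f}")
```

Shuffling destroys any dependence between consecutive shots while preserving the overall save rate, so the permuted slopes describe what the estimator produces when no hot-hand or reverse effect exists. An observed slope well below that distribution would point to something beyond estimation bias.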
Discussion and conclusion
We employ multilevel logistic regression to investigate whether the performance of NHL goaltenders during the playoffs is consistent with a hot-hand effect, using data from the 2008–2016 NHL playoffs. We measure past performance using shot-based windows. Our window sizes span a wide range: from, roughly, 1 game to 5 games. We allow the intercept and the slope for recent save performance to vary across goaltender-season combinations.
We find no evidence of hot-hand effects and, for one window size, a significant negative impact of recent save performance on the next-shot save probability. The estimates are negative for all window sizes and goaltender-season groups. Our findings are contrary to the findings for baseball in Green and Zwiebel (2018), who use a window size similar to our longest window and hypothesize that hot-hand effects should generally be observed for skilled activity.
Referring back to Section “Permutation test for estimation bias”, if the effect we find were entirely due to regression to the mean, then we would expect the average amount by which performance moves toward the average to decrease as we increase the window size used to measure recent save performance. The fact that we observe an increase in magnitude followed by a decrease (see Figure 2), together with our permutation test results, argues against our finding being driven by regression to the mean.
A motivation effect provides one possible explanation for our findings. That is, if a goaltender’s recent save performance has been below his average for the current playoffs, then his motivation increases, resulting in increased effort and focus, causing the next-shot save probability to be higher. Conversely, if the recent save performance has been above average, then the goaltender’s motivation, effort, and focus decrease, leading to a lower next-shot save probability. Bélanger et al. (2013) find support for the first of these effects (greater performance after failure) for “obsessively passionate individuals” but do not find support for the second effect (worse performance after success) for such individuals. The study finds support for neither effect for “harmoniously passionate individuals.” These findings are consistent with Hall-of-Fame goaltender Ken Dryden’s (2019) sentiment that “if a shot beats you, make sure you stop the next one, even if it is harder to stop than the one before.” The psychological mechanisms underlying our findings warrant further study.
Although the estimated recent save performance coefficients are consistently negative, their magnitude varies. In particular, the magnitude increases sharply with the window size up to
A literal interpretation of our negative finding suggests a strategy that prefers a goalie with below-average recent save probability because his save probability for the next few shots is expected to improve. One should be cautious about using such a strategy, keeping in mind Goodhart’s Law, that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes” (Chrystal, 2003). For one thing, such a strategy would provide an incentive to underperform in non-essential games to earn more playing time in big games.
One limitation of our study is that, in defining windows, we ignore the time that passes between games. Past research, such as Green and Zwiebel (2018), shares this limitation. This limitation can be particularly serious for backup goaltenders, for whom the interval between two successive appearances may be several days long.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
