Abstract
We use play-by-play data from the National Football League to examine coaching decisions on fourth down and how sensitive they are to information on situational success and the competitive environment. Prior fourth down successes and failures within a game influence coaches in a way consistent with both a belief in in-game momentum and the notion that recent information is more salient to coaches when they make decisions. Coaches are more sensitive to fourth down failures than to successes, and our findings suggest this sensitivity to prior failures leads to suboptimal fourth down decisions later in the game. This finding is generally driven by the behavior of coaches with a background in coaching offense, suggesting the availability heuristic is particularly potent for managers who are more involved in and, perhaps, more accountable for the details of fourth down plays. We suspect these patterns are prevalent in a wide range of managerial contexts.
Introduction
Effective managers consider relevant information for decision-making. This involves adjusting their prior beliefs when new information becomes available and making decisions with those updated beliefs in mind. Managers often have more information available than could reasonably be processed in a timely manner, and high-stakes decisions may need to be made quickly or reflexively. These are also situations that are known to produce errors in judgment. It is thus important to understand which factors successful managers consider when under pressure and faced with time constraints. Of particular interest is whether managers are more sensitive to new positive or negative information and whether that sensitivity depends on their individual experiences as a manager.
In order to understand which types of information impact decision-making, we analyze the decision-making of National Football League (NFL) head coaches, in particular the decision of whether to attempt a fourth down conversion. Unlike other managerial settings where specific objectives, the information available to the manager, and even the decisions themselves are often unobservable, the decisions of NFL coaches and the context in which they occur are observable. While coaches certainly plan for these situations ahead of time, ultimately the decisions are made in an environment that seems to be primed for errors in judgment. These are risky decisions—the outcome of the decision can have a substantial effect on the likelihood of winning a given game—that must be made quickly and frequently follow a negative outcome on the prior play. These features make the decisions likely candidates for hot cognitive errors.
We focus on two types of information: within-game information on prior fourth down successes and failures and information gathered over the season about overall success (yards per play) and situational success (success rate in short-yardage situations) for both the coach's own team and the opposing team. We find that coaches are sensitive to prior successful fourth down attempts and prior failed attempts within the game when making decisions to go for it on fourth down. This pattern is consistent with the overweighting of recent events implied by the availability heuristic (as described by Tversky & Kahneman, 1973) and with a belief in in-game momentum. Coaches have a stronger response to prior negative outcomes than to prior positive outcomes. The response to failures appears to be an overreaction to a negative outcome: coaches are significantly less likely to attempt another conversion after a failure, but conditional on attempting a fourth down conversion, prior failures are positively and significantly related to successful conversions. We also explore how the response to in-game failures and successes interacts with various coaching characteristics, finding variation in the effects based on whether the coach's background involves coaching offense or defense. Namely, coaches with an offensive background significantly overreact to prior failed offensive conversions, while the effect is not significant for other coaches. Such a bias toward overreacting to information with which a decision maker is most familiar would likely have implications in other managerial settings as well.
The rest of the article is organized as follows. The “Background and Literature” section provides a theoretical background for the analysis and a review of the literature, the “Data” section describes the data, the “Empirical Specification and Results” section presents the empirical specifications and results, the “Discussion” section provides a discussion of the results, and the “Conclusion” section concludes the article.
Background and Literature
This article uses data from the NFL to examine how decision-makers incorporate new information into their decision-making process in high-pressure situations. The decision analyzed involves high risks and rewards: what to do on a fourth down play. In NFL football, the team has four plays (referred to as “downs”) to advance 10 yards down the field. If they gain those 10 yards, the team earns a new set of four downs. If they do not, the other team gets the ball. On fourth down a team can try to gain the remaining yards to keep possession, punt the ball away to the opponent (which puts the opposing team in a worse field position), or attempt a field goal.
There are a number of reasons to focus on this particular type of decision. Coaches face these decisions frequently throughout the season, making it possible to examine patterns in decision-making with some degree of precision. The short-term objective of the decision-maker in this situation is relatively clear—a coach is strongly incentivized to maximize the team's chances of winning the game—and the choices and results are publicly observable and have a clear impact on the team's chances of victory. Information on the observed performance of both the offense and defense is available to the coaches prior to making these decisions, and there is variation in this information over the course of a season and within the game. We have clear expectations about how coaches will rationally respond to changes in these factors (a coach should be more likely to attempt a fourth down conversion when their offense is shown to be stronger or the opposing team's defense is shown to be weaker). Furthermore, detailed information about coaching history, coaching background, and coaching success is publicly available for each coach. This allows us to investigate whether these factors relate to how new information is being processed by coaches.
Romer (2006) demonstrates that coaches make suboptimal fourth down decisions if the goal is win-maximization. The analysis was based mainly upon location on the field, distance to go for a first down, and the potential changes in score as a result of the play. Of course, the decision also reflects other influences related to preferences, such as risk aversion. Goff and Locke (2019) expand upon Romer (2006) by including forecast errors and risk premia to account for these possible influences and find evidence that risk aversion can explain much of what appears to be suboptimal fourth down decision-making.
There are reasons to expect that coaches may not efficiently incorporate new relevant information into their decision-making. It is well known that people generally do not strictly follow Bayesian statistics to update beliefs. Decision-makers often place more weight on the new information and neglect the underlying base rates. Of particular relevance is the availability heuristic proposed by Tversky and Kahneman (1973), where the ease with which information relevant to a decision can be accessed affects the decision. Relevant information related to more recent events has been shown to be overweighted by decision-makers in settings such as financial markets (DeBondt & Thaler, 1985; Lee et al., 2008) and sports betting markets (Durand et al., 2021). The cab problem by Kahneman et al. (1982) is another example of this, and there are numerous examples by Kahneman (2011). Kahneman and Tversky's (1979) prospect theory posits that losses are perceived as more painful than gains in many contexts. As a result of this asymmetry in perceived utility, negative outcomes tend to be more salient and tend to elicit larger reactions than positive outcomes.
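To make base-rate neglect concrete, the cab problem can be worked through with Bayes' rule. The figures below are the ones commonly quoted for this problem (85% of cabs are Green, 15% are Blue, and a witness identifies colors correctly 80% of the time). They are not given in this article and are used here only as an illustration.

```latex
\begin{align*}
P(\text{Blue} \mid \text{says Blue})
  &= \frac{P(\text{says Blue} \mid \text{Blue})\,P(\text{Blue})}
          {P(\text{says Blue} \mid \text{Blue})\,P(\text{Blue})
           + P(\text{says Blue} \mid \text{Green})\,P(\text{Green})} \\
  &= \frac{0.80 \times 0.15}{0.80 \times 0.15 + 0.20 \times 0.85}
   = \frac{0.12}{0.29} \approx 0.41
\end{align*}
```

An answer close to 0.80, the witness's reliability, places too much weight on the new information and too little on the 15% base rate.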
Behavioral influences can be compounded by the probabilistic nature of the outcomes associated with decisions in this context. Making a good (bad) decision ex ante does not always produce a good (bad) outcome ex post. This randomness also has the potential to introduce biases (see, e.g., Tversky & Kahneman, 1974). These include the gambler's fallacy, whereby decision makers mistakenly believe they are “due” for a good outcome, and the “hot hand” fallacy, where a series of good (bad) outcomes is mistakenly assumed to be more related to fundamental averages when in fact it is still subject to laws of chance (see Ayton & Fischer, 2004, for a discussion).
In the context of fourth downs, this pattern of overweighting recent successes or failures would manifest itself as coaches responding to within-game momentum. Lehman and Hahn (2013) discuss the impact of “momentum” both within and across periods in the context of NFL fourth down decisions. They define within-game momentum in terms of whether the team scored or conceded the previous points in the game, and they define across-game momentum by the string of wins or losses entering the game. Lehman and Hahn (2013) find evidence that coaches give more weight to recent experiences than to longer trends. However, studies such as Fry and Shukairy (2012) find little evidence that this type of in-game momentum actually affects the outcome of a play.
There are other features of this particular environment that could lead to behavioral biases in decision-making. While coaches have data on their team's offensive performance that is publicly available, they might form opinions about their own team through scouting, practice, and the day-to-day operations of the team. These additional sources of information may make the coaches less willing to update their beliefs based on their team's realized performance. Kahneman and Lovallo (1993) suggest that in organizations many errors arise from forecasts that originate from within the group of “insiders” and suggest that taking an “outside” view of the situation can reduce this bias.
It is also true that in the NFL, teams must run a play in fewer than 40 seconds from the end of the previous play, and failure to do so results in a penalty. While coaches certainly plan for these situations ahead of time, the time constraints could lead a coach to apply mental shortcuts that may be biased in some way. It has been well documented that people often make different decisions when they must react quickly and reflexively than they would if they were able to slowly process all of the information in a rational way. Loewenstein (1996) suggests that consideration of visceral factors can reconcile seemingly inconsistent behavior in many contexts. Furthermore, in many jobs being able to recognize and adapt to visceral factors is an important pathway toward being successful. Metcalfe and Jacobs (1998) discuss research on the impact of stress as it influences hot and cool memory systems. Often stress leads people to rely on hot memory systems, which produce reflexive decisions that follow gut instincts.
Data
This analysis uses play-level data from the 2008 through 2018 NFL seasons from sports-reference.com. It includes information on the game situation (down, distance to go for a first down, field position, quarter of the game, and time remaining in quarter), the nature of the play (whether the team chose a running play, passing play, punt, or field goal), and the result of the play. Since a key component of this analysis involves whether certain observable coach characteristics affect that coach's responsiveness to game events, we also include data on each coach's background. Specifically, we account for a coach's years of head coaching experience, the coach's cumulative winning percentage in prior seasons, whether a coach has ever won a Super Bowl, and whether the coach's background is on offense. This information also comes from sports-reference.com.
From these data, we construct in-game count variables for success and failure on fourth down, which are two of our key explanatory variables. We use the betting line on the game as one measure of relative team quality and also construct quality measures for the offense and defense of the two teams. We use two measures of offensive quality and two measures of defensive quality that could factor into a coach's decision-making. The strength of the offensive team is represented by its average yards per play in the prior games of the season and its short-yardage success rate (the percentage of the time in the team's prior games that season that it has converted third or fourth down offensive plays with less than 3 yards to go for a first down). The strength of the defensive team is represented by analogous measures: the average yards per play the defense has allowed in prior games and the short-yardage success rate allowed to opposing teams in prior games.
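The in-game count variables can be built with a single pass over the plays of each game. The following is a minimal sketch under stated assumptions: the `Play` record and its field names are hypothetical and do not reflect the actual layout of the sports-reference.com data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Play:
    game_id: str     # hypothetical identifier for the game
    offense: str     # team in possession
    down: int
    attempted: bool  # True if the offense ran a play rather than kicking
    converted: bool  # True if the attempt gained the first down


def add_prior_fourth_down_counts(plays):
    """Return (play, prior_successes, prior_failures) for each fourth down.

    `plays` must be in chronological order within each game. Counts cover
    only earlier fourth down attempts by the same offense in the same game.
    """
    successes = defaultdict(int)
    failures = defaultdict(int)
    rows = []
    for play in plays:
        if play.down != 4:
            continue
        key = (play.game_id, play.offense)
        # Record the counts as they stood before this play is resolved.
        rows.append((play, successes[key], failures[key]))
        if play.attempted:
            if play.converted:
                successes[key] += 1
            else:
                failures[key] += 1
    return rows
```

Recording the counts before updating them keeps the current play's own outcome out of its explanatory variables.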
The focus of this analysis is the subset of fourth down plays where a coach would conceivably consider running an offensive play. We base our primary definition of this on Romer (2006), which identifies downs and distances where the optimal decision is to run an offensive play on fourth down. Romer's results indicate that teams are quite conservative on fourth down, even in circumstances where his paper finds it optimal to run an offensive play. That same pattern persists through the seasons considered here. Figure 1 shows that in the primary subset of plays used in the analysis, all of which are situations where Romer's results indicate a fourth down attempt would be optimal, teams run a play between 10% and 25% of the time. These relatively low percentages suggest that this is a sufficiently wide set of fourth down decisions to begin the analysis. Furthermore, using this set of fourth downs eases the interpretation of the results. Certainly, there are situations in which running a fourth down play will not improve the team's chances of winning, but for the plays considered here, we have a reasonable basis to assume that it does.

Figure 1. Fourth down attempt percentages by season.
Beginning with the set of fourth down plays where Romer's analysis finds it optimal to run a play, we form our primary subset by eliminating plays where the game situation could strongly dictate the decision, which we define as situations where the win probability of either team is 10% or lower or the play is the final play of either half. These restrictions leave us with a subset of 10,685 fourth down observations. In specifications where we use proxies for the qualities of the offensive and defensive units of teams based on prior performance within a given season, we necessarily eliminate observations from the first games of the season.
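The sample restrictions amount to a simple filter on each fourth down. The sketch below is illustrative only: the argument names are hypothetical, and it assumes that Romer's (2006) recommendation and the win probability have already been computed for each play.

```python
def in_primary_subset(romer_go, win_prob_offense, is_final_play_of_half):
    """Sketch of the sample restrictions described above.

    romer_go: True when Romer's (2006) results favor running a play at this
        down, distance, and field position (assumed to be precomputed).
    win_prob_offense: the offense's win probability before the play, in [0, 1].
    is_final_play_of_half: True for the last play of either half.
    """
    if not romer_go:
        return False
    # Either team at 10% or lower means the offense is at <= 0.10 or >= 0.90.
    lopsided = win_prob_offense <= 0.10 or win_prob_offense >= 0.90
    return not lopsided and not is_final_play_of_half
```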
Table 1 describes the primary subset of fourth down plays. Within this set of plays, teams attempt a fourth down conversion 17% of the time, converting on 66% of those attempts. Prior fourth down successes and prior fourth down failures have means of 0.12 and 0.07, respectively. The table also shows that, as one might expect, the measures of offensive and defensive quality are roughly symmetrical, with teams gaining 5.4 yards per play and allowing short-yardage conversions on roughly 62% of short-yardage plays. Due in part to the fact that the conversion rate variables are constructed from a much smaller set of plays than the yards per play variables (they can only be constructed from short-yardage third and fourth down plays as opposed to all offensive plays), these variables have higher coefficients of variation.
Table 1. Summary Statistics on Primary Subset of Fourth Down Plays.
Source: sports-reference.com. Statistics are from the primary subset of fourth down plays in the data, which includes only down and distance combinations where Romer’s (2006) results suggest attempting a play and excludes the last play of the first half, the last play of the game, and situations where one team's win probability is below 10%.
Table 2 highlights the characteristics of the head coaches in the sample. The statistics in this table are given at the team-season level. On average, coaches have 5.36 years of NFL head coaching experience. Coaches with an offensive background account for 49% of the team-seasons, while coaches with a defensive background account for 51%. Coaches with a cumulative winning percentage above 50% account for 75% of the team-seasons, and 18% of team-seasons involve a coach with at least one prior Super Bowl victory.
Table 2. National Football League (NFL) Head Coach Characteristics.
Source: sports-reference.com.
One key relationship in the analysis is between fourth down activity and the team's prior success or failure on fourth down within a given game. Table 3 gives a descriptive summary of some of these relationships. It shows that teams that have not attempted a fourth down conversion are less likely to attempt one, a finding consistent with some teams being generally more conservative on fourth down while others display a more aggressive baseline approach to these situations. However, there are clear differences in the likelihood of a fourth down attempt based on the success of previous attempts. In particular, teams that have had a failed attempt earlier in the game go for a conversion on 18.9% of the relevant fourth downs, while teams that have successfully converted a fourth down earlier in the game attempt a conversion on 23.0% of such plays. The same pattern holds for relevant fourth downs in short-yardage situations.
Table 3. Fourth Down Attempt Percentages in Various In-Game Scenarios.
Source: sports-reference.com. Statistics are from the primary subset of fourth down plays in the data, which includes only down and distance combinations where Romer’s (2006) results suggest attempting a play and excludes the last play of the first half, the last play of the game, and situations where one team's win probability is below 10%.
Table 4 shows how fourth down attempts vary by the characteristics of the head coach. While the group of all coaches attempts a play on 16.9% of the fourth downs considered in the analysis, the largest deviation from this average seems related to past success: coaches who have won a Super Bowl attempt a play 21% of the time. Over the full subset of fourth down plays analyzed, more experienced coaches, coaches with winning records, and coaches with an offensive background attempt fourth downs at a higher rate than the full set of coaches. While the explanatory mechanism is unclear, this positive correlation between success and fourth down attempts lines up with the results of Romer (2006), which suggest a positive relationship between fourth down attempts within this set of plays and team success. It is also consistent with Owens and Roach (2018), who find a similar pattern among college coaches: those with greater success attempt fourth down plays more frequently.
Table 4. Fourth Down Attempt Percentages by Head Coach Characteristics.
Source: sports-reference.com. Statistics are from the primary subset of fourth down plays in the data, which includes only down and distance combinations where Romer’s (2006) results suggest attempting a play and excludes the last play of the first half, the last play of the game, and situations where one team's win probability is below 10%.
Empirical Specification and Results
Effect of Competitive Environment on Fourth Down Attempts
To measure more precisely a coach's sensitivity to the results of prior fourth down attempts in a given game and to the relative quality of the offense and defense, we estimate how these factors affect the likelihood of a fourth down attempt, conditional on other relevant competitive factors. Since the outcome variable is binary, we use a probit model with robust standard errors to account for potential heteroskedasticity. The baseline specification is below:
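A probit specification of the kind described can be written as follows. The notation is an assumption made for illustration and may differ from the authors' own equation: $i$ indexes the offensive team, $g$ the game, and $t$ the fourth down play; $\mathbf{Q}_{ig}$ collects the team quality measures and coaching experience, and $\mathbf{X}_{igt}$ collects the game-situation controls.

```latex
\begin{equation*}
\Pr(\text{Attempt}_{igt} = 1)
  = \Phi\!\left(\beta_0
      + \beta_1\,\text{PriorSuccesses}_{igt}
      + \beta_2\,\text{PriorFailures}_{igt}
      + \mathbf{Q}_{ig}'\boldsymbol{\gamma}
      + \mathbf{X}_{igt}'\boldsymbol{\delta}\right)
\end{equation*}
```

Here $\Phi$ is the standard normal cumulative distribution function. A belief in in-game momentum would correspond to $\beta_1 > 0$ and $\beta_2 < 0$.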
Potential concerns about endogeneity would center on whether unobservable factors would influence the decision to attempt a fourth down play. This would include weather issues or any fractional yardage (for instance, a play could be coded as fourth down and 2 to go, but based on the spot of the ball, it is actually 2.4 yards to go for a first down). These factors are essentially random and unlikely to be correlated with the relative strength of the offensive team or the outcomes of fourth down plays from earlier in a given game, and so we proceed with the analysis assuming that this type of endogeneity is not affecting the estimates.
The results from this model of fourth down attempts are presented in Table 5. Column 1 includes only the within-game variables and coaching experience, column 2 also includes quality measures for the units that would be on the field for the conversion, and column 3 also includes quality measures for units that would not be on the field but might factor into the decision. The results indicate that, in all three specifications, coaches are sensitive to within-game fourth down successes and failures and to the defense's yards per play allowed, in a manner consistent with expectations.
Table 5. Marginal Effects From Probit Regressions of Fourth Down Attempts and Successes on Prior Fourth Down Attempt Information and Other Competitive Conditions.
Robust standard errors are in parentheses.
***p < .01, **p < .05, *p < .1.
In the main specification in column 1, the marginal effect of fourth down successes is 0.0148, implying that each successful attempt within a game is associated with a 1.48 percentage point increase in the likelihood of a fourth down attempt. The marginal effect of fourth down failures is −0.0218, implying that each failed attempt within a game is associated with a 2.18 percentage point decrease in the likelihood of a fourth down attempt. An increase of one yard per play allowed by the defense is associated with a 0.86 percentage point increase in the likelihood of attempting a fourth down play. The marginal effects of prior successes and failures within the game for the probits in columns 2 and 3, which include more team quality measures, are largely consistent. In these specifications the offensive team's own defensive quality appears to have a small but significant impact. Namely, as a team's own defensive conversion rate allowed increases, the team is more likely to attempt a fourth down conversion.
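Because a probit index is not itself in probability units, estimates like these are reported as marginal effects. The sketch below shows one common way to compute an average marginal effect, using the derivative of the normal CDF. The coefficient values and observations are hypothetical and are not taken from Table 5, and the derivative is only an approximation for count regressors such as prior failures.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def probit_probability(x, beta):
    """P(y = 1 | x) = Phi(x'beta) for one observation."""
    index = sum(xi * bi for xi, bi in zip(x, beta))
    return _STD_NORMAL.cdf(index)


def average_marginal_effect(rows, beta, k):
    """Average over observations of d Phi(x'beta) / d x_k = phi(x'beta) * beta_k."""
    total = 0.0
    for x in rows:
        index = sum(xi * bi for xi, bi in zip(x, beta))
        total += _STD_NORMAL.pdf(index) * beta[k]
    return total / len(rows)


# Hypothetical example: columns are [constant, prior successes, prior failures].
example_rows = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
example_beta = [-1.0, 0.06, -0.09]  # made-up probit coefficients
ame_failures = average_marginal_effect(example_rows, example_beta, k=2)
```

Read this way, a marginal effect of −0.0218 means that one additional prior failure lowers the predicted probability of an attempt by about 2.18 percentage points, averaged over the sample.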
Effect of Competitive Environment on Fourth Down Success
Next, we test whether past successes and failures within the game and the measures of offensive and defensive quality affect the rate at which teams successfully convert fourth down plays. Conditional on the game situation and yards to go, a higher success rate on fourth down plays when these factors are favorable would imply that coaches benefit from accounting for them in their decision-making. The probit specification we use to estimate the benefit of sensitivity to various factors is:
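A success equation of the kind described can be sketched in the same way. As before, the notation is assumed for illustration rather than taken from the authors' equation: $i$ indexes the offensive team, $g$ the game, and $t$ the play, with $\mathbf{Q}_{ig}$ the quality measures and $\mathbf{X}_{igt}$ the game-situation controls. The model is estimated only on plays where a conversion was attempted.

```latex
\begin{equation*}
\Pr(\text{Success}_{igt} = 1 \mid \text{Attempt}_{igt} = 1)
  = \Phi\!\left(\alpha_0
      + \alpha_1\,\text{PriorSuccesses}_{igt}
      + \alpha_2\,\text{PriorFailures}_{igt}
      + \mathbf{Q}_{ig}'\boldsymbol{\theta}
      + \mathbf{X}_{igt}'\boldsymbol{\lambda}\right)
\end{equation*}
```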
Given that this sample is limited to the plays where coaches select into attempting a conversion, the interpretations of these coefficients are complicated by potential endogeneity. The selection decision of whether to attempt a play could plausibly be related to unobservable factors that would bias the estimated coefficients downward. In particular, teams with a relatively strong offense (or teams facing a relatively weak defense) could be more likely to choose to attempt a conversion when unobservable factors would negatively affect the likelihood of success. One such unobservable factor relates to fractional yardage, as described in Lopez (2020). If, for instance, a play is coded as fourth and 2 yards to go, but the ball is really 2.4 yards from a first down, a relatively strong offensive team may choose to attempt a play, while a relatively weaker team might not. Positive estimated coefficients in spite of this endogeneity would suggest benefits to being sensitive to these factors (while the interpretation of a negative coefficient is more complicated because of the direction of the bias). The coefficients on the performance measures can then be thought of as the “excess returns” associated with being sensitive to that factor. A positive and significant coefficient, for instance, would suggest sensitivity to that factor would increase a team's likelihood of success in a given game situation. The results for successful conversions are presented in columns 4, 5, and 6 of Table 5.
The results for fourth down success show positive and significant coefficients on prior in-game fourth down failures in all three specifications. This may indicate two things. First, as shown in the first three columns of Table 5, coaches are less likely to attempt a fourth down conversion after failing earlier in the game, conditional on observable factors (such as field position and yards required for a successful conversion). That a prior fourth down failure reduces the likelihood of an attempt suggests teams who have failed are self-selecting into a set of plays more likely to yield success. If teams are basing these selection decisions on observable factors, they would conceivably be more likely to attempt a fourth down conversion when the unobservable factors (e.g., fractional yardage) are more conducive to success (and less likely to do so when the unobservable factors are negatively correlated with success). This effect could help explain this pattern. Secondly, it is possible there exists an asymmetry in the usefulness of information gained from prior successes and failures, respectively, as it relates to a strategic advantage on future attempts. The patterns relating to fourth down success are consistent with the idea that failing on a fourth down attempt confers more useful information for future attempts than succeeding does.
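The self-selection mechanism can be illustrated with a small simulation. This is a sketch under made-up assumptions, not an estimate from the data: the unobserved extra distance `u` is uniform on [0, 1) yards, the success probability `0.75 - 0.30 * u` is invented for illustration, and a coach is assumed to attempt a conversion only when `u` falls below a threshold that tightens after a failure.

```python
import random


def simulate(threshold, n=100_000, seed=0):
    """Simulate attempts when a coach goes for it only if the unobserved
    extra distance u is below `threshold`.

    Returns (attempt_rate, success_rate_given_attempt). The success
    probability 0.75 - 0.30 * u is a made-up illustration.
    """
    rng = random.Random(seed)
    attempts = successes = 0
    for _ in range(n):
        u = rng.random()  # unobserved fractional yardage
        if u < threshold:
            attempts += 1
            if rng.random() < 0.75 - 0.30 * u:
                successes += 1
    return attempts / n, successes / attempts


baseline = simulate(threshold=0.8)       # no prior failure
after_failure = simulate(threshold=0.4)  # more cautious after a failure
```

Under these assumptions the more cautious coach attempts fewer conversions but succeeds on a larger share of them, which is the pattern a positive coefficient on prior failures in the success equation would reflect.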
Nevertheless, these results suggest that it would benefit the team if coaches were less sensitive to prior fourth down failures. To the extent that it reflects a non-spurious finding, it would suggest a degree of inefficiency in the aggregate decision-making of NFL head coaches. This could reflect a behavioral bias in gut reactions consistent with a belief in a hot hand effect, an overestimation of negative momentum, or some other type of hot cognitive error. We further explore the counterintuitive pattern whereby coaches are more likely to succeed after failing a prior fourth down attempt by examining whether particular characteristics of coaches are related to this tendency.
Interactions of Coach Characteristics and Competitive Environment
In order to test whether there is heterogeneity in the effects that past successes and failures have on fourth down decisions, we run regression models that add interaction terms related to head coach characteristics to the model given in Equation 1. We test five characteristics separately: whether a coach is relatively experienced (defined as having at least four seasons of prior NFL head coaching experience), whether the coach's cumulative record involves more wins than losses, whether the coach has ever won a Super Bowl, whether the coach has a background coaching on offense, and a coach's cumulative winning percentage. We interact the in-game successes and failures variables with an indicator for the presence of each characteristic and an indicator for the absence of each characteristic and report the results in Table 6. While four out of these five characteristics do not significantly affect a coach's sensitivity to prior fourth down successes or failures, this relationship is significantly more pronounced for coaches with an offensive background. In fact, the estimates in column 4 indicate that the significance of this relationship is confined to coaches with an offensive background. The availability heuristic provides a potential explanation for this: since coaches with offensive backgrounds are more likely to be involved in the actual fourth down play calling that occurs during the game, prior successes and failures might be more salient to them later in the game, and thus, those prior successes or failures might play a larger role in the decision-making process than they would for coaches who are more removed from the details of the fourth down decision-making process.
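For a binary characteristic, the interacted specification can be sketched as below. The notation is assumed for illustration: $D_c$ equals 1 when coach $c$ has the characteristic (for example, an offensive background) and 0 otherwise, and the remaining terms are those of the baseline attempt model.

```latex
\begin{equation*}
\begin{aligned}
\Pr(\text{Attempt}_{igt} = 1) = \Phi\big(\beta_0
  &+ \beta_1^{D}\,\text{PriorSuccesses}_{igt}\,D_c
   + \beta_1^{N}\,\text{PriorSuccesses}_{igt}\,(1 - D_c) \\
  &+ \beta_2^{D}\,\text{PriorFailures}_{igt}\,D_c
   + \beta_2^{N}\,\text{PriorFailures}_{igt}\,(1 - D_c)
   + \mathbf{Q}_{ig}'\boldsymbol{\gamma}
   + \mathbf{X}_{igt}'\boldsymbol{\delta}\big)
\end{aligned}
\end{equation*}
```

Interacting with both the presence and absence indicators yields a separate estimate for each group, so that, for instance, $\beta_2^{D}$ and $\beta_2^{N}$ can be compared directly.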
Table 6. Marginal Effects From Probit Regressions of Fourth Down Attempts on Prior Fourth Down Attempt Information and Other Competitive Conditions, With Interactions Related to Coach Characteristics.
Robust standard errors are in parentheses.
*** p < .01, ** p < .05, * p < .1.
Discussion
We find that coaches react to prior successes and failures on fourth down conversions within the game when deciding to attempt a conversion. The data indicate that coaches behave as if past outcomes are predictive of future outcomes. Coaches who successfully converted a fourth down earlier in the game are significantly more likely to go for it again, whereas coaches who failed to convert earlier in the game are significantly less likely to try again. This suggests that coaches are sensitive to in-game momentum at the time of their decisions. The magnitude of the change in attempt rates is larger for past failures than for past successes (2.18 versus 1.48 percentage points in the main specification). This is consistent with the availability heuristic and suggests that decision-makers may be particularly inclined to access recent negative information. Furthermore, the fact that measured defensive quality is generally more predictive of the decision than measured offensive quality suggests heterogeneity in how variation in an external factor (in this case, the opposing defense) is incorporated into decisions relative to internal factors.
In spite of the decision-making pattern that suggests momentum is a key element of success, the likelihood of a successful fourth down conversion is not entirely consistent with momentum as coaches perceive it. Conditional on attempting another fourth down, the success rate is unrelated to past successes within the game. Perhaps more surprisingly, success is positively and significantly related to prior in-game failures. It seems that coaches are overreacting to negative momentum and not attempting conversions as often as they would to maximize the chance of winning the game.
To the extent that this inefficiency exists, it is being driven by the subset of coaches who have a background as an offensive coach. In our data offensive coaches are more responsive to prior in-game fourth down outcomes than defensive coaches. We speculate that familiarity might lead these coaches to “trust their gut instincts” more than coaches who are possibly more detached from the within-game momentum. Coaches could make better decisions by putting less weight on past failures when they make decisions.
Conclusion
We have investigated how managers process new information and how they respond to it by focusing on NFL coaches and their decisions about fourth down. We focus on this group of managers because the data are readily available and because many features of the decision-making environment are known to bias decision-makers or otherwise cause them to make errors. Coaches spend a great deal of time planning for what they will do in different game situations. These plans are no doubt constructed in a cold, calculating cognitive state. However, this type of careful deliberation can be undermined by the time constraints, in-game pressure, and gut feelings the coaches may have based on what has transpired earlier in the game.
Using play-by-play data from the 2008 through 2018 NFL seasons, we find that coaches are sensitive to prior fourth down successes and failures within the game and to the overall quality of the opposing defense when making fourth down decisions. Conditional on attempting a conversion, fourth down success is positively related to prior in-game failures, indicating that coaches are overreacting to negative momentum and not attempting conversions as often as they should following unsuccessful attempts. The relationships between attempts and prior successes and failures are primarily driven by coaches with an offensive background. Perhaps these coaches are more sensitive to negative information because they are more familiar with these decisions and they “trust their instincts” about them. We believe these patterns extend beyond football coaches to others in management roles. Managers should be aware that following their instincts may cause them to overweight negative information that is relevant for their decision but not as informative as they believe it to be.
Acknowledgments
The authors thank the Black School of Business and Penn State Behrend for research funding.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Penn State Behrend Black School of Business.
