Abstract
Estimating the causal impact of sport or physical activity on health and well-being is an issue of great relevance in the sport and health literature, and the increasing availability of individual-level data has encouraged this interest. However, this analysis requires dealing with two types of simultaneity problems: (1) between exercise and response variables; and (2) across the different response variables. This note discusses how the previous literature has dealt with these two questions, paying particular attention to the seemingly aseptic econometric models proposed by some recent empirical papers. Regardless of the approach, identification necessarily requires untestable hypotheses. We provide some recommendations based on analyzing the robustness of the estimation results to changes in the adopted identification assumptions.
Keywords
Introduction
The purpose of this note is to discuss the use of systems of simultaneous equations in the empirical literature to estimate the impact of health behavior variables, such as sport and/or physical activity, on health and well-being. 1 Recent years have seen the release of surveys that record these variables together with other individual socio-economic characteristics. Although this information has generated a burgeoning research literature aiming to estimate the main determinants of health and well-being, 2 an important concern in this type of analysis is that the different variables in the model are observed simultaneously. This issue is especially problematic because many of the databases are cross-sectional, which makes identifying causal effects very difficult: it is impossible to tell whether lifestyle variables affect health outcomes or the other way around.
Simultaneity is a long-standing problem in many areas of economics; in fact, it was studied even before the Cowles Commission was founded in 1932 to develop econometrics (see the canonical example of the identification of demand and supply in Wright, 1928). In this context, it is generally accepted that identification can only be achieved by means of untestable assumptions. Examples include instrumental variables selected under the exclusion restriction, and matching or propensity score methods under the strong ignorability assumption; see Wooldridge (2003), Imbens and Wooldridge (2009) and references therein. Even in the absence of specific instrumental variables, identification can still be achieved by other means, such as assuming a direction of causality among the endogenous variables (the recursiveness assumption) or restricting the long-run effect of some specific shocks to be negligible (long-run restrictions), among others; see Christiano et al. (1999) and Blanchard and Quah (1989), respectively.
In the recent health and sport literature, the use of simultaneous equation models to deal with simultaneity in cross-sectional databases has become fashionable. These models typically do not use instruments to identify the direction of causality between simultaneous variables; instead, the direction is imposed by the recursiveness assumption. We argue that this is a sensible approach only when there are solid theoretical arguments to justify this restriction, which is not the case if there is double causality between health behavior and health outcomes. We also discuss the case where simultaneity affects more than one response variable. In this case, contrary to the claims of previous papers in the literature, estimating a reduced-form specification is generally not a valid way to deal with the simultaneity between response variables, regardless of whether a seemingly unrelated regression (SUR henceforth) strategy is used to account for the potential correlation between the errors of the different equations.
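To illustrate why the recursiveness assumption matters, the following minimal Python sketch simulates a hypothetical two-equation system in which health may also feed back into exercise. The function names, the parameters `beta` (effect of exercise on health) and `delta` (reverse effect), and the independent standard-normal shocks are all assumptions made for illustration only. When `delta = 0` the system is recursive and a regression of health on exercise recovers `beta`; when `delta` is non-zero, the same regression is biased.

```python
import random


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(beta, delta, n=50_000, seed=0):
    """Draw (exercise, health) from a hypothetical, illustrative system:
        exercise = delta * health + u
        health   = beta * exercise + v
    with independent standard-normal shocks u and v.
    delta = 0 is the recursive case (exercise -> health only)."""
    rng = random.Random(seed)
    exercise, health = [], []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        # Solve the two equations jointly (requires beta * delta != 1).
        e = (u + delta * v) / (1 - beta * delta)
        exercise.append(e)
        health.append(beta * e + v)
    return exercise, health


b_recursive = ols_slope(*simulate(beta=0.5, delta=0.0))  # close to beta = 0.5
b_feedback = ols_slope(*simulate(beta=0.5, delta=0.5))   # close to 0.8, not 0.5
print(f"recursive: {b_recursive:.3f}  two-way causality: {b_feedback:.3f}")
```

With unit-variance shocks the OLS slope converges to beta + delta(1 - beta*delta)/(1 + delta^2), which equals 0.8 for these illustrative values. Imposing recursiveness when causality actually runs both ways therefore overstates the effect of exercise by 60% in this example.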
This brief note does not attempt to discuss the econometric properties of the estimators. Our main purpose is to discuss the theoretical implications of using simultaneous equation models to deal with endogeneity, an approach adopted by an important strand of the sport economics literature.
General Discussion
A common interest in the empirical literature discussed in this note is estimating the causal impact of health behavior, typically sport or physical activity, on a set of response variables, which can include health or well-being measures. This requires dealing with two identification problems: (1) simultaneity between exercise and response variables; and (2) simultaneity across the different response variables. The identification problem can be defined in the same way for both cases, as follows.
Let us define three simultaneous scalar variables
A linear structural system that describes this relationship can be defined as:
where
In this framework, identification of parameters in equations (1) and (2) is possible if variables
Impact of Physical Activity on a Single Response Variable
If the estimation problem concerns the impact that physical exercise exerts on a single response variable, which could be denoted by
Impact of Physical Activity on Multiple Response Variables
The simultaneous observation of different response variables creates an additional endogeneity problem because, in most cases, economic theory does not establish a direction of causality between them. In this case, we denote physical exercise by
As in the previous section, the exclusion restriction is a well-known identification condition of the system above. It establishes that identification of our parameters of interest,
The aforementioned simultaneity problem cannot be solved by jointly estimating the reduced-form version of equations (1) and (2). This strategy was used by Rasciute and Downward (2010), and in references therein, to estimate the impact of physical exercise on health and happiness. They claimed to solve the simultaneity problem between health and well-being by estimating the following reduced-form version of the system described by equations (1) and (2), when variables
However, this estimation does not solve the simultaneity problem between
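The point can be made explicit with a stylized version of the system, written in hypothetical notation that need not match equations (1) and (2): let h denote health, w well-being and x exercise (treated here as exogenous), with structural shocks u_1 and u_2. Solving the two structural equations for h and w gives the reduced form:

```latex
\begin{aligned}
h &= \alpha_1 x + \gamma_1 w + u_1, \\
w &= \alpha_2 x + \gamma_2 h + u_2, \qquad \gamma_1 \gamma_2 \neq 1, \\[4pt]
h &= \underbrace{\frac{\alpha_1 + \gamma_1 \alpha_2}{1 - \gamma_1 \gamma_2}}_{\pi_1} x
   + \underbrace{\frac{u_1 + \gamma_1 u_2}{1 - \gamma_1 \gamma_2}}_{\varepsilon_1}, \\
w &= \underbrace{\frac{\alpha_2 + \gamma_2 \alpha_1}{1 - \gamma_1 \gamma_2}}_{\pi_2} x
   + \underbrace{\frac{u_2 + \gamma_2 u_1}{1 - \gamma_1 \gamma_2}}_{\varepsilon_2}.
\end{aligned}
```

A reduced-form regression of health and well-being on exercise only estimates \pi_1 and \pi_2, each of which mixes a direct effect (\alpha_1 or \alpha_2) with an indirect effect through the other outcome. Two reduced-form coefficients cannot pin down the four structural parameters \alpha_1, \alpha_2, \gamma_1, \gamma_2 without further restrictions, such as an exclusion restriction or a recursiveness assumption (\gamma_1 = 0 or \gamma_2 = 0). SUR exploits the correlation between \varepsilon_1 and \varepsilon_2, which generally arises even when u_1 and u_2 are uncorrelated, but this only affects efficiency; when both equations contain the same regressors, SUR coincides with equation-by-equation OLS.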
Literature Discussion and Recommendations
Table 1 shows an overview of the strategies adopted in different papers to identify the causal impact of physical activity on health and well-being. This literature uses two approaches to achieve identification: the recursive system of equations discussed in the previous section for a single response variable, and the exclusion restriction. The former, adopted in Humphreys et al. (2014), does not require instrumental variables; instead, it relies on the hypothesis that exercise causes health but not the other way around. The exclusion restriction is the most common strategy; it rests on the untestable hypothesis that there is an instrumental variable that affects the response variable (health and/or happiness) only indirectly, through its effect on the decision to practice exercise. The most common instruments are sport access indicators, such as the presence of sport facilities nearby. This strategy was adopted, for example, in Forrest and McHale (2011), Ruseski et al. (2014), Wicker and Frick (2015, 2016), Downward and Dawson (2016), Brechot et al. (2017), and dos Santos et al. (2019). Other papers use more original instrumental variables: Forrest and McHale (2011) consider parental encouragement to play sport during childhood, Sarma et al. (2015) and Downward and Dawson (2016) use temperature and month of the year, respectively, Wicker and Frick (2015, 2016) use club membership, and Ruseski et al. (2014) use beliefs about the importance of sport participation.
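A minimal sketch of the exclusion-restriction approach, under assumed parameter values: a hypothetical binary instrument `z` (standing in for access to nearby sport facilities) shifts exercise but enters the health equation only through exercise, while health also feeds back into exercise. With one instrument and one endogenous regressor, the IV (2SLS) estimator reduces to the ratio cov(z, health)/cov(z, exercise). All names and values below are illustrative assumptions.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def simulate_iv(beta=0.5, delta=0.5, pi=1.0, n=50_000, seed=1):
    """Hypothetical system with an excluded instrument z:
        exercise = pi * z + delta * health + u
        health   = beta * exercise + v
    z is a 0/1 'sport facility nearby' dummy, independent of u and v,
    and absent from the health equation (exclusion restriction)."""
    rng = random.Random(seed)
    z, exercise, health = [], [], []
    for _ in range(n):
        zi = 1.0 if rng.random() < 0.5 else 0.0
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        # Solve the two equations jointly for exercise.
        e = (pi * zi + u + delta * v) / (1 - beta * delta)
        z.append(zi)
        exercise.append(e)
        health.append(beta * e + v)
    return z, exercise, health


z, exercise, health = simulate_iv()
b_ols = cov(exercise, health) / cov(exercise, exercise)  # biased by feedback
b_iv = cov(z, health) / cov(z, exercise)                 # just-identified IV
print(f"OLS: {b_ols:.3f}  IV: {b_iv:.3f}  (true beta = 0.5)")
```

Under these illustrative values OLS converges to 0.75 rather than the true 0.5, while the IV ratio recovers 0.5. This works only because the simulation imposes the exclusion restriction by construction; with real data that restriction cannot be tested.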
Table 1. Research Papers Studying the Causal Impact of Physical Exercise on Health and/or Well-being.
The different approaches and instruments used in these papers are not aseptic, as these methodologies rest on untestable assumptions. A sensible strategy is therefore to test the robustness of the results to different instrumental variables and assumptions. For example, Humphreys et al. (2014) use both a recursive system of equations and the exclusion restriction to estimate the impact of physical exercise on health outcomes, and find similar results. Others base their identification strategy on the exclusion restriction but use more than one instrumental variable; see Forrest and McHale (2011), Ruseski et al. (2014), Pawlowski et al. (2011), Wicker and Frick (2015, 2016) and Downward and Dawson (2016).
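The robustness check suggested above can be sketched by computing a separate just-identified IV estimate for each available instrument. In this hypothetical simulation, `z1` satisfies the exclusion restriction, while a second instrument `z2` is assumed to also affect health directly through the illustrative parameter `gamma`. The model, names and values are assumptions, not a description of any paper in Table 1.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def iv_by_instrument(gamma, beta=0.5, delta=0.5, n=50_000, seed=3):
    """Just-identified IV estimates of beta, one per instrument, from a
    hypothetical system:
        exercise = z1 + z2 + delta * health + u
        health   = beta * exercise + gamma * z2 + v
    z1 is a valid instrument; z2 violates the exclusion restriction
    whenever gamma != 0."""
    rng = random.Random(seed)
    z1, z2, exercise, health = [], [], [], []
    for _ in range(n):
        z1i = 1.0 if rng.random() < 0.5 else 0.0
        z2i = rng.gauss(0, 1)
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        # Solve the two equations jointly for exercise.
        e = (z1i + (1 + delta * gamma) * z2i + u + delta * v) / (1 - beta * delta)
        z1.append(z1i)
        z2.append(z2i)
        exercise.append(e)
        health.append(beta * e + gamma * z2i + v)
    return (cov(z1, health) / cov(z1, exercise),
            cov(z2, health) / cov(z2, exercise))


for gamma in (0.0, 0.3):
    b1, b2 = iv_by_instrument(gamma)
    print(f"gamma={gamma}: IV with z1 = {b1:.3f}, IV with z2 = {b2:.3f}")
```

When `gamma = 0` both estimates are close to the true 0.5; when `gamma = 0.3` the estimate based on `z2` drifts to about 0.70 while the one based on `z1` stays near 0.5. Agreement across instruments does not prove validity (all instruments could be invalid in the same direction), but a marked disagreement signals that at least one exclusion restriction fails; this is the intuition behind the Sargan and Hansen overidentification tests.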
As discussed in the previous section on multiple response variables, identification becomes even more problematic when it concerns the simultaneity of health and well-being, as it is hard to imagine a variable that affects only one of them and not the other. Even in the absence of an instrument, it is still possible to achieve identification by using a class of triangular simultaneous equation models, as suggested by Klein and Vella (2010). Standard practice in economics is to study how results depend on the direction of causality imposed by the recursiveness assumption (Christiano et al., 1999). In our specific case, this amounts to analyzing the robustness of the estimation results to changes in the identifying assumptions about the direction of causality between health and well-being.
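The robustness exercise suggested for the health and well-being case can be sketched as estimating the system under both causal orderings and comparing the results. The data-generating process below is hypothetical: exercise `x` is exogenous, and health `h` and well-being `w` affect each other (both `g1` and `g2` are non-zero), so neither recursive ordering is correct. Each ordering is estimated equation by equation with OLS, which would be consistent only if the ordering were right and the structural shocks uncorrelated. All names and parameter values are illustrative assumptions.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def ols2(y, x1, x2):
    """Slopes of an OLS regression of y on x1 and x2 (with intercept),
    from the 2x2 normal equations written in covariance form."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(a1=0.4, a2=0.3, g1=0.2, g2=0.5, n=50_000, seed=2):
    """Hypothetical non-recursive system with exogenous exercise x:
        h = a1 * x + g1 * w + u1   (health)
        w = a2 * x + g2 * h + u2   (well-being)
    solved jointly for h and w (requires g1 * g2 != 1)."""
    rng = random.Random(seed)
    d = 1 - g1 * g2
    x, h, w = [], [], []
    for _ in range(n):
        xi, u1, u2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x.append(xi)
        h.append(((a1 + g1 * a2) * xi + u1 + g1 * u2) / d)
        w.append(((a2 + g2 * a1) * xi + u2 + g2 * u1) / d)
    return x, h, w


x, h, w = simulate()
# Ordering A: health -> well-being, so the health equation excludes w.
effect_on_h_A = cov(x, h) / cov(x, x)
# Ordering B: well-being -> health, so health is regressed on x and w.
effect_on_h_B, _ = ols2(h, x, w)
print(f"effect of exercise on health: A = {effect_on_h_A:.3f}, "
      f"B = {effect_on_h_B:.3f} (true direct effect = 0.4)")
```

With these values the estimated effect of exercise on health is about 0.51 under the health-first ordering and about 0.20 under the well-being-first ordering, while the true direct effect is 0.4. A gap of this size would indicate that conclusions about the health effect of exercise hinge on the untestable choice of ordering.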
Concluding Remarks
Dealing with simultaneity in cross-sectional databases is a difficult task that requires untestable identification assumptions, either in the choice of instruments or in the direction of causality. Therefore, when no exclusion restriction is available, the problem can only be solved by jointly estimating the equations for each of the simultaneously observed variables if there are strong arguments to accept that health behavior affects health outcomes but not the other way around. Simultaneity can also arise when estimating the effect of health behavior on several outcome variables. In this case, contrary to what some papers in the literature claim, if the response variables are simultaneously related and no exclusion restrictions are available, a SUR reduced-form specification produces biased estimates of the causal effects. A more sensible approach in this circumstance is to study the robustness of the results to changes in the direction of causality imposed by the recursiveness assumption.
Footnotes
Acknowledgments
The authors thank two anonymous referees for their helpful comments.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
