Abstract

Analyses of observational, health-related study data generally fall into one of three classes of tasks: (i) description of the distribution of risk factors or disease occurrence, (ii) prediction of specific current or future outcomes or identification of risk groups, or (iii) estimation of the causal effect of a specific exposure or intervention on a defined outcome (1). Authors often report their estimated measures as “associations,” which lack meaningful interpretation when presented without a clear specification of the task the research team aimed to address (2).
In this issue of Cephalalgia, Acarsoy and colleagues make their aim explicit: to estimate the causal effect of having migraine on future stroke occurrence in the Rotterdam Study, a prospective, population-based cohort study among middle-aged and elderly participants (mean age 65.7 years, 58% women) (3). After confounding adjustment, the authors found that having migraine was associated with a higher risk for incident stroke (hazard ratio of 1.44, 95% confidence interval: 0.96-2.15), a finding that was not statistically significant.
One strength of this study is the use of a directed acyclic graph (DAG), a graphical tool that presents the authors’ causal assumptions transparently and encodes the complex (mathematical) relationships of the underlying data-generating process (4). As not all readers of Cephalalgia may be familiar with DAGs, we briefly introduce this tool, which is commonly used in epidemiology to inform design and analytical strategies that mitigate biases.
A DAG uses directed arrows pointing from cause to effect to illustrate the assumed causal relationships between individual variables (nodes). Decisions surrounding which nodes to include in a DAG and how to connect them are made based on prior knowledge and reasonable assumptions. For instance, causes must always precede effects in time.
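As a minimal sketch of these ideas, a DAG can be written down as a mapping from each node to its assumed direct causes. The node names below (confounder, exposure, intermediate, outcome) are hypothetical placeholders and do not reproduce the DAG by Acarsoy and colleagues. The sketch uses Python's standard-library graphlib module (Python 3.9 or later) to check that the graph contains no cycles and to list the nodes so that every cause precedes its effects.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical DAG for illustration only: each node maps to the set of
# its assumed direct causes (the nodes with an arrow pointing into it).
dag = {
    "confounder": set(),
    "exposure": {"confounder"},
    "intermediate": {"exposure"},
    "outcome": {"confounder", "exposure", "intermediate"},
}


def causal_order(graph):
    """Return the nodes ordered so that every cause precedes its effects."""
    return list(TopologicalSorter(graph).static_order())


def is_acyclic(graph):
    """Return True if the graph has no directed cycle, i.e. it is a valid DAG."""
    try:
        causal_order(graph)
        return True
    except CycleError:
        return False


print(causal_order(dag))
print(is_acyclic(dag))
```

A graph in which two variables are drawn as causes of each other fails this check, which mirrors the requirement that causes precede effects in time.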
DAGs provide a means to quickly identify potential biases, such as confounding, which is a distortion in the effect estimate of interest introduced by common causes of both the exposure and outcome. Visually, confounding is evident in a DAG when there are open paths connecting exposure and outcome that are not in the forward, causal direction (so-called “open backdoor paths”) (4). In such cases, to obtain a valid causal effect estimate, all non-causal paths must be “blocked” (e.g., by adjustment using a multivariable regression model, as in the study by Acarsoy and colleagues), leaving open only the causal path(s) of interest.
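The effect of an open backdoor path can be illustrated with a small simulation. This is a sketch under invented assumptions: the variable names and all probabilities are hypothetical, the exposure is given no effect on the outcome at all, and adjustment is done by simple stratification rather than by the regression model used by Acarsoy and colleagues.

```python
import random


def simulate_confounding(n=50000, seed=0):
    """Simulate rows (c, x, y) in which c causes both x and y, and x has NO effect on y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.random() < 0.5                     # hypothetical common cause
        x = rng.random() < (0.7 if c else 0.2)     # exposure depends on c
        y = rng.random() < (0.30 if c else 0.10)   # outcome depends on c only
        rows.append((c, x, y))
    return rows


def risk_difference(rows):
    """Outcome risk among the exposed minus outcome risk among the unexposed."""
    exposed = [y for _, x, y in rows if x]
    unexposed = [y for _, x, y in rows if not x]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def adjusted_risk_difference(rows):
    """Average the c-specific risk differences, weighted by stratum size."""
    total = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[0] == level]
        total += risk_difference(stratum) * len(stratum) / len(rows)
    return total


rows = simulate_confounding()
print("crude risk difference:   ", round(risk_difference(rows), 3))
print("adjusted risk difference:", round(adjusted_risk_difference(rows), 3))
```

Under these assumptions the crude comparison suggests an effect that does not exist, whereas the estimate that conditions on the common cause is close to zero, because the only open path between exposure and outcome ran through that common cause.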
Using DAGs can also prevent so-called “overadjustment bias” introduced by conditioning on “intermediates” on the causal path from exposure to outcome. Such variables are quickly detectable in a DAG: if an arrow points from the exposure into an intermediate variable, which, in turn, causes the outcome, adjustment for this intermediate will remove part of the total average causal effect of interest, a practice that should generally be avoided (5). DAGs can also shed light on selection biases, such as a spurious statistical association that can be induced by conditioning on a node at which two arrows on a path collide (i.e., a common consequence). This bias, the so-called “collider stratification bias,” can be introduced in either the study design or the analysis phase (6). P-value-based, stepwise (i.e., forward selection or backward elimination), or change-in-estimate approaches to variable selection are agnostic to these biases and are considered inferior variable selection techniques when aiming to answer causal questions (7).
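Both biases can be made concrete with a simulation sketch. All names and probabilities below are hypothetical. In the first scenario the exposure affects the outcome only through an intermediate, so conditioning on the intermediate removes the effect. In the second scenario exposure and outcome are independent, but both influence selection into the analyzed sample, so restricting to selected participants induces an association.

```python
import random


def risk_difference(rows):
    """Outcome risk among exposed minus unexposed; rows are (x, y, z) tuples."""
    exposed = [y for x, y, _ in rows if x]
    unexposed = [y for x, y, _ in rows if not x]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def stratified_risk_difference(rows):
    """Average the z-specific risk differences, weighted by stratum size."""
    total = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[2] == level]
        total += risk_difference(stratum) * len(stratum) / len(rows)
    return total


def simulate_mediator(n=50000, seed=1):
    """x -> m -> y: the whole effect of x on y runs through the intermediate m."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random() < 0.5
        m = rng.random() < (0.8 if x else 0.2)   # intermediate depends on x
        y = rng.random() < (0.4 if m else 0.1)   # outcome depends on m only
        rows.append((x, y, m))
    return rows


def simulate_collider(n=50000, seed=2):
    """x -> s <- y: x and y are independent, but both make selection s more likely."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random() < 0.3
        y = rng.random() < 0.2
        s = rng.random() < (0.9 if (x or y) else 0.1)  # common consequence
        rows.append((x, y, s))
    return rows


mediator_rows = simulate_mediator()
print("total effect:             ", round(risk_difference(mediator_rows), 3))
print("adjusted for intermediate:", round(stratified_risk_difference(mediator_rows), 3))

collider_rows = simulate_collider()
selected = [r for r in collider_rows if r[2]]
print("full sample:              ", round(risk_difference(collider_rows), 3))
print("selected participants:    ", round(risk_difference(selected), 3))
```

In the first scenario the unadjusted contrast recovers the total effect and the stratified one is close to zero. In the second, the full-sample contrast is close to zero and the contrast among selected participants is clearly negative, although no causal effect was simulated.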
In applied research, constructing a DAG can prove challenging as some potentially relevant nodes and the direction of some arrows may not be known. Indeed, in-depth subject matter knowledge is necessary to build a DAG, and with time, a given DAG will change as the evidence base evolves. One may not agree with all elements of the DAG constructed by Acarsoy and colleagues (3). For example, migraine may be considered a cause of hypertension or a consequence of it; the direction also depends on the operationalization and timing of the variables’ assessment. However, presenting underlying assumptions transparently in a DAG facilitates more meaningful scientific discourse about the analytical setup (e.g., disagreement over a specific arrow) and, as such, can identify specific gaps in the existing knowledge base where further research is needed.
The use of DAGs remains rare in applied biomedical research, although they are becoming increasingly common among observational studies with causal aims (8). The way in which DAGs are presented in the literature varies widely (8). For instance, Acarsoy and colleagues summarize multiple covariates into aggregated “super-nodes” (3). Although this improves readability and reduces complexity, collapsing these nodes oversimplifies the structure because it leaves unstated how the aggregated variables affect each other. Future work could consider presenting variables with finer granularity as individual nodes to depict the underlying causal structure in greater detail, especially in terms of temporality. This could have implications for the adjustment strategy.
Readers should be aware of another bias that can complicate studying causal relationships: the “prevalent user” (or prevalent condition) bias, which can arise when participants in the exposed group already had the exposure or condition of interest for some time. Such individuals are likely different from those with a new onset condition (or new users of an intervention) (9). In migraine research, however, restricting analyses to new-onset cases is challenging, as cohort studies with detailed information about incident migraine (or historical information on migraine onset) across the lifecourse remain uncommon, a situation that urgently needs to improve. Acarsoy and colleagues investigate prevalent (existing) migraine as their primary exposure, likely since few would be expected to develop new migraine in this middle-to-older-aged cohort (3). Participants with prevalent migraine have had the disease for different lengths of time, which may differentially influence stroke risk. If only prevalent migraine information is available, researchers could also consider a predictive task; for example, does adding information about prevalent migraine improve the ability of a risk score to predict future stroke?
In summary, we encourage researchers to make their causal aims and framework explicit when addressing a causal inference task using observational data, as Acarsoy and colleagues did. For such tasks, DAGs are useful tools to inform study design and analysis strategies, help detect potential biases, and facilitate transparency in reporting that is needed to further the scientific discourse.
Footnotes
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: TK and JR report no conflicts of interest directly related to this work.
Outside of the submitted work, TK reports having received research grants from the Gemeinsamer Bundesausschuss (G-BA – Federal Joint Committee, Germany) and the Bundesministerium für Gesundheit (BMG – Federal Ministry of Health, Germany). He has further received personal compensation from Eli Lilly and Company, Teva Pharmaceuticals, TotalEnergies S.E., The BMJ, and Frontiers. JLR reports having received a grant from Novartis Pharma for conducting a self-initiated research project on migraine across the lifecourse.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
