Sage Journals: Discover world-class research

Abstract

There is considerable interest in studying the impact of major life events (e.g., marriage, job loss) on people’s lives. This line of research is inherently causal: Its goal is to study whether life events cause changes in the examined outcomes. However, because major life events cannot be randomly assigned, studies in this area necessarily rely on longitudinal observational data. In this article, we provide guidelines for researchers interested in studying life events in an explicitly causal framework. Although focused on life-event studies for substantive context, many recommendations also apply to longitudinal observational studies more broadly. We begin by emphasizing the importance of clearly specifying the causal estimand and describe conditions in which the defined causal estimand can be identified. Then, we discuss the features and challenges of the two main analytical approaches to causal inference in life-event studies: difference-in-difference designs with a (matched) comparison group that attempt to separate event-related changes from normative changes, and within-person designs that control for all time-invariant person-level confounders. We describe how the desired causal effect can be estimated in these designs and provide recommendations for when to apply each modeling strategy. In addition, we present methods for conducting sensitivity analyses that probe the robustness of the estimated causal effects and for evaluating the generalizability of the results. We conclude by describing how new specialized panel studies can be designed to examine the impact of various life events in more controlled settings.

Keywords

causality, observational studies, life events, generalizability, best practices

People get married, become (grand)parents, and start their dream job, but they also experience breakups, the death of loved ones, and involuntary job losses. Such major life events can be defined as “time-discrete transitions that mark the beginning or the end of a specific status” (Luhmann, Hofmann, et al., 2012, p. 594). Life events can be primarily exogenous, for example, when a child is killed in an automobile accident (Lehman et al., 1987). Or life events can have a large endogenous component, such as when individuals get divorced (van Scheppingen & Leopold, 2020). In recent years, how life events affect people has been an active topic of research in diverse areas of psychology. For example, researchers have examined how life events affect subjective well-being (for a meta-analysis, see Luhmann, Hofmann, et al., 2012), personality traits (for a meta-analysis, see Bühler et al., 2024), mental and physical health (e.g., Asselmann, Garthus-Niegel, Knappe, & Martini, 2022; Lawes et al., 2022), loneliness (Buecker et al., 2021), optimism (Chopik et al., 2020), spirituality (Trutzenberg & Eid, 2024), and perceived social support (Asselmann, Garthus-Niegel, & Martini, 2022). The goal of life-events research is inherently causal: It studies whether life events cause changes in the examined outcomes. Yet the traditional gold standard method of estimating causal effects, the randomized experiment, is not feasible in the context of life-event studies because ethical and practical issues preclude random assignment of individuals to specific life events (West et al., 2008). Consequently, life-event research has to rely on observational data in which individuals are exposed or not exposed to specific life events.

Most life-event studies are prospective in nature, meaning that the outcome is assessed at least once before the life event occurs. In contrast, in retrospective life-event studies, the first assessment takes place after the life event has occurred. In this article, we cover only prospective life-event studies (for a discussion of issues in retrospective studies, see Holland & Rubin, 1988). Many prospective life-event studies are based on existing panel studies, such as the German Socio-Economic Panel (Wagner et al., 2007; for an example, see Krämer & Rodgers, 2020), the Swiss Household Panel (Tillmann et al., 2022; for an example, see Anusic et al., 2014), the Household, Income, and Labour Dynamics in Australia panel (Watson & Wooden, 2004; for an example, see Hentschel et al., 2017), the British Household Panel Survey (University of Essex, Institute for Social and Economic Research, 2018; for an example, see Yap et al., 2012), or the Dutch LISS panel (Scherpenzeel & Das, 2010; for an example, see Reitz et al., 2022). All of these studies repeatedly interview many individuals using a common set of items over multiple years. These studies typically assess a wide range of psychological constructs (e.g., personality, well-being) and ask their respondents whether they experienced any major life events since the last interview. Based on these data, researchers can track how various constructs change before and after major life events. Event-related changes in the focal outcome that occur after the life event are generally called socialization effects (e.g., Reitz et al., 2022). However, changes can also occur in advance of a life event. Such prospective effects can result from anticipation and selection (e.g., Luhmann et al., 2013). Anticipatory effects arise when individuals change because they already anticipate or prepare for an expected future major life event. For example, people might be happier in the months before the birth of their first child because they look forward to being parents. Selection effects occur when preevent characteristics are related to the likelihood of experiencing a certain life event. For example, happy individuals might have a higher likelihood of finding a suitable partner with whom to have children. Researchers generally attempt to statistically control for selection effects to derive unbiased estimates of anticipatory and socialization effects.

Historically in psychology, there have been strong admonitions against the use of causal analysis in observational studies, with advice to substitute causal language with scientific euphemisms such as the life event is “associated with” or “predicts” the outcome. Over the past few decades, major advances have occurred in the understanding of causal inference in computer science (e.g., Pearl, 2009), econometrics (e.g., Imbens, 2024), epidemiology (e.g., Hernán & Robins, 2024), and statistics (e.g., Rosenbaum, 2020; Rubin, 2007). Comparisons of randomized experiments and well-designed observational studies sharing the same treatment and control groups have shown that they frequently lead to comparable estimates of causal effects (e.g., Cook et al., 2020; Keller et al., 2024). Given these developments, calls for the use of causal thinking in the design and causal language in the write-up of observational studies have occurred in public health (Hernán, 2018), medicine (Dahabreh & Bibbins-Domingo, 2024), and psychology (Grosz et al., 2020). In this article, we review these new developments in design and analysis with the goal of presenting a guide to strengthen causal inference in longitudinal research on the effects of life events. As a running example, we often use the research question, “How does the birth of the first child affect life satisfaction?” (Dyrdal & Lucas, 2013; Krämer & Rodgers, 2020; Yap et al., 2012) to clearly convey the ideas presented. Although we focus on life-event studies to provide a specific substantive context, most of our recommendations will also apply to longitudinal observational studies more broadly.

We commence this article by stressing the importance of clearly defining the causal estimand. Then, we describe conditions in which the defined causal estimand can be identified. We proceed by discussing the features and challenges of the two main design approaches to estimate causal effects in longitudinal observational studies: difference-in-difference designs with a (matched) comparison group that attempt to separate event-related changes from normative changes, and within-person designs that control for all time-invariant person-level confounders. We then describe methods for conducting sensitivity analyses that probe the robustness of the estimated causal effects and methods for evaluating the generalizability of the results. We conclude by describing how new panel studies could be designed to deepen understanding of how various life events affect individuals.

Step 1: Defining the Causal Estimand

The first step in a life-event study should be to clearly define the causal estimand—the specific quantity or parameter that represents the causal effect of interest in the study. A clear definition of the causal estimand enables researchers to pose well-defined research questions without reference to any specific statistical model (Lundberg et al., 2021; Rohrer & Murayama, 2023). Only after specifying the estimand should researchers think about developing a suitable method to estimate their defined target quantity. The definition of the causal estimand in life-event studies requires specification of the following four entities: the causal contrast of interest, the focal outcome, the time lag of the causal effect, and the target population (Diener et al., 2022; Hernán & Robins, 2024; Lundberg et al., 2021). We consider each of these four entities in turn below. Moreover, in Box 1, we present guiding questions for the definition of the causal estimand that researchers should try to consider and transparently answer in their life-event studies.

Box 1.

Guiding Questions to Define the Causal Estimand in Life-Event Studies

Causal contrast
- What is the randomized experiment that you would have conducted if no practical or ethical concerns were present?
- How is the life event of interest defined?
  • What eligibility criteria must individuals meet to be included in the event group?
  • On which dimensions might the experience of the focal life event differ between individuals (e.g., first vs. subsequent occurrences, normative vs. nonnormative experiences)?
- What is the comparison condition to which the experience of the life event is contrasted?
  • If using a between-person contrast, what eligibility criteria must individuals meet to be included in the comparison group?
  • If using a within-person contrast, how is the baseline level defined to ensure accurate comparison?

Outcome
- Does the outcome measure adequately assess the focal type of change (i.e., state change, trait change, or change in trajectory)?
- If the outcome relies on self-reports, is it feasible and beneficial to complement these with other data sources (e.g., informant ratings, mobile sensing)?
- Does transforming the outcome scores into a different metric (e.g., percentage of maximum possible scores) improve the clarity and comparability of the results?

Time lag
- What is the theorized temporal structure for how the effects of the focal life event unfold?
  • How well does the time lag in the available data align with this theorized time frame? Can a timeline be used to visualize how the time frame of the study matches the timing of the theorized processes?
- Are the results consistently contextualized within the time frame examined in the study to avoid misinterpretation?

Target population
- What is the target population to which the results are expected to generalize?
  • If the study aims to estimate conditional average causal effects for specific subgroups, are the defining characteristics of these subgroups properly measured with covariates?

Defining the Causal Contrast

To clearly define the causal contrast, it can be helpful to think about the randomized experiment the researcher would have conducted if no practical or ethical concerns were present. This hypothetical experiment is sometimes referred to as the target trial (Hernán & Robins, 2016, 2024). For life-event studies, setting the causal contrast therefore involves clearly defining the life event of interest (i.e., the exposure) and the comparison condition with which the experience of the life event should be contrasted.

Defining the life event of interest

On the surface, defining the life event of interest seems straightforward: The occurrence of the focal life event needs to be observed and coded (e.g., by using event checklists or observing a status change over time).¹ What is often overlooked in this context, however, is that the same life events can refer to distinctly different exposures. For example, a repeatedly occurring life event can have different effects depending on whether it was the first, second, or third occurrence (Luhmann & Eid, 2009): For parents, the birth of their first child is likely a different event than the birth of their third child. Furthermore, the effects of life events can differ depending on whether their occurrence is normative or nonnormative (Luhmann et al., 2014): Becoming a parent in one’s 20s or 30s is normative in Western societies and will likely represent a different event than becoming a parent at age 15. Individuals may further differ in how they perceive different life events (Haehner et al., 2023; Luhmann et al., 2021): A pregnancy can be expected or unexpected, desired or undesired.

To identify the causal effects of interest, it is crucial that there is only a single well-defined version of the exposure (see section “Step 2: Identifying the Causal Effect”). Careful consideration is therefore needed to determine potential dimensions on which the experience of the focal life event might differ between individuals. To minimize unwanted variation in the exposure (i.e., the experience of the focal life event), researchers can specify eligibility criteria for their study (Hernán & Robins, 2016; Moreno-Betancur, 2021). As in experimental studies, these eligibility criteria need to be defined in terms of characteristics that are measured before the exposure could have had an effect. Life-event researchers should always make transparent how they define the exposure and the eligibility criteria. For example, researchers could define the exposure as “voluntarily becoming first-time biological parents between the ages of 20 and 35” and consider only individuals who fulfill these criteria.

Defining the comparison condition

After defining the life event of interest (i.e., the exposure), the comparison condition to which this exposure should be contrasted needs to be specified. In life-event studies, individuals who experienced the life event (i.e., event group) are commonly compared with a comparison group of individuals who could have potentially been exposed but were not exposed to the life event. We term the contrast between the event group and the comparison group a between-person contrast because different individuals are compared with each other. There are generally multiple ways to define comparison groups in life-event studies. For example, when studying the effects of the birth of the first child, first-time biological parents (i.e., event group) could be compared with a comparison group of individuals who (a) remained voluntarily childless, (b) remained childless even though they wanted to become parents, or (c) did not become biological parents but adopted their first child. Each of these causal contrasts addresses a substantially different research question and will likely yield a different causal effect estimate. The first contrast focuses on the effects of intentional family-life choices, the second focuses on fulfilling the desire to become biological parents, and the third focuses on the effects of different pathways to parenthood (biological vs. adoption). This example underscores that researchers have to carefully consider and transparently describe the causal contrast in which they are interested and define appropriate eligibility criteria for the comparison group.
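The eligibility criteria and the three comparison groups described above can be made explicit as coding rules. The following Python sketch is purely illustrative: the `Person` record and all of its fields (e.g., `wanted_children` as a preevent measure of the desire for children) are hypothetical stand-ins for variables a real panel study may or may not provide, and age-based eligibility criteria for the comparison groups are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Hypothetical record; apart from the event indicators, fields are assumed
    # to be measured before the exposure could have had an effect.
    pid: int
    had_first_biological_child: bool
    age_at_first_birth: Optional[float]
    wanted_children: bool  # hypothetical preevent item: desire for children
    adopted_first_child: bool


def in_event_group(p: Person) -> bool:
    """Voluntarily becoming a first-time biological parent between ages 20 and 35."""
    return (
        p.had_first_biological_child
        and p.wanted_children
        and p.age_at_first_birth is not None
        and 20 <= p.age_at_first_birth <= 35
    )


def in_comparison_group(p: Person, contrast: str) -> bool:
    """Comparison-group membership for contrasts (a), (b), and (c) in the text."""
    childless = not p.had_first_biological_child and not p.adopted_first_child
    if contrast == "a":  # remained voluntarily childless
        return childless and not p.wanted_children
    if contrast == "b":  # remained childless despite wanting children
        return childless and p.wanted_children
    if contrast == "c":  # no biological child, but adopted a first child
        return not p.had_first_biological_child and p.adopted_first_child
    raise ValueError(f"unknown contrast: {contrast!r}")


# Hypothetical example records.
people = [
    Person(1, True, 28.0, True, False),
    Person(2, False, None, False, False),
    Person(3, False, None, True, False),
    Person(4, False, None, True, True),
    Person(5, True, 16.0, False, False),
]
print("event group:", [p.pid for p in people if in_event_group(p)])
for c in ("a", "b", "c"):
    print(f"comparison ({c}):", [p.pid for p in people if in_comparison_group(p, c)])
```

Writing the rules as explicit functions makes it easy to report them transparently and to rerun an analysis under a different contrast by changing a single argument.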

An alternative approach is to define a within-person contrast by examining how an individual’s outcomes change after experiencing the focal life event compared with the individual’s own outcome levels before the event occurred. A key assumption of this approach is that the level of the outcome at the pretest assessment (i.e., before the life event occurred) resembles the level of the outcome that would have been observed at the posttest assessment if that person had not experienced the life event. This assumption may be violated in life-event studies, for example, when the life event of interest is predictable and known to have anticipatory effects, such as the birth of the first child. To provide a proper baseline comparison, researchers interested in the causal effect of the life event should therefore ideally include only preevent observations that were made before the potential onset of anticipatory effects. Researchers specifically interested in distinguishing between the anticipatory effects and the immediate effects of the event that occur over and above anticipatory effects should additionally include observations made during the preevent period in which anticipatory effects are expected to occur.
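The role of the anticipation window can be made concrete with a small descriptive computation for a single person. This is a minimal sketch under stated assumptions: the function name `within_person_change`, the yearly scores, and the 1-year anticipation window are all hypothetical, and the resulting difference has a causal interpretation only if the baseline is a valid stand-in for the counterfactual outcome, as discussed above.

```python
import numpy as np


def within_person_change(times, scores, event_time, anticipation_window):
    """Mean postevent score minus mean baseline score for one person.

    The baseline uses only observations made strictly before the assumed onset
    of anticipatory effects (event_time - anticipation_window). Setting
    anticipation_window = 0 uses all preevent observations instead.
    """
    times = np.asarray(times, dtype=float)
    scores = np.asarray(scores, dtype=float)
    rel = times - event_time  # time relative to the event (negative = before)
    baseline = scores[rel < -anticipation_window]
    post = scores[rel >= 0]
    if baseline.size == 0 or post.size == 0:
        raise ValueError("need at least one baseline and one postevent observation")
    return post.mean() - baseline.mean()


# Hypothetical yearly life-satisfaction scores; the first birth occurs at year 4.
years = [0, 1, 2, 3, 4, 5, 6]
life_sat = [7.0, 7.2, 7.1, 7.8, 8.0, 7.6, 7.4]

print(within_person_change(years, life_sat, event_time=4, anticipation_window=1))
print(within_person_change(years, life_sat, event_time=4, anticipation_window=0))
```

With a 1-year anticipation window, the elevated score at year 3 is excluded from the baseline and the change is about 0.57 points; using all preevent observations shrinks it to about 0.39 because the anticipatory increase is absorbed into the baseline.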

Defining the Outcome

After specifying the causal contrast of interest, researchers have to decide on a measure of the outcome. The outcome measure has to be carefully chosen and should be sensitive to the theoretically expected pattern of change over time. In general, three concepts of change can be distinguished. State change refers to a temporary short-term change after the life event (e.g., a temporary increase in positive mood after the birth of the first child). Trait change characterizes an enduring change in the mean level of the outcome (e.g., a permanent change in positive mood after the birth of the first child). Change in trajectory occurs when a developmental process is changed as a function of the life event. For example, following the birth of the first child, the parent’s conscientiousness may increase more rapidly toward a higher asymptotic level over time than in the comparison group (for an empirical example, see Moser et al., 2012). Life-event researchers are typically more interested in enduring changes in level and trajectory than in temporary short-term state changes, and appropriate measures need to be able to capture these types of enduring change. Unfortunately, because most existing life-event studies have relied on data from existing panel studies, the variables researchers could use as an outcome have often been limited in practice. Still, researchers should always consider and critically discuss the extent to which the focal outcome is of theoretical and practical interest. Furthermore, researchers should reflect on how the outcome is operationalized in a given study and whether it provides an appropriate measure of the theoretically hypothesized effects.
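The three concepts of change imply different expected event-related deviations depending on when the outcome is measured. The sketch below writes each pattern as a simple function of time since the event; the functional forms and parameter values are illustrative assumptions, and the trajectory pattern is simplified to a linear slope change rather than the asymptotic approach described in the conscientiousness example.

```python
import math

# Each function returns a hypothetical event-related deviation of the outcome
# from the level expected without the event, at time t (years; event at t = 0).


def state_change(t: float, effect: float = 1.0, decay: float = 1.0) -> float:
    """Temporary change that fades back toward zero after the event."""
    return 0.0 if t < 0 else effect * math.exp(-decay * t)


def trait_change(t: float, effect: float = 1.0) -> float:
    """Enduring shift in the mean level that persists after the event."""
    return 0.0 if t < 0 else effect


def trajectory_change(t: float, slope_shift: float = 0.25) -> float:
    """Change in the rate of development: the deviation grows with time since the event."""
    return 0.0 if t < 0 else slope_shift * t


for t in (-1, 0, 1, 3, 5):
    print(t, round(state_change(t), 2), trait_change(t), trajectory_change(t))
```

A measure taken 5 years after the event would barely register the state change, fully register the trait change, and show a deviation that is still growing for the trajectory change, which is why the outcome and its timing should be chosen with the expected pattern in mind.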

Researchers should also carefully think about the method of measurement. Self-reports are typically the most important method of assessing the participant’s personal viewpoint. In Box 2, we present four key factors that should be considered when working with self-reports in life-event studies. However, self-reports also have well-known limitations: Depending on the construct of interest, expert ratings or ratings of knowledgeable informants might be a more appropriate or an important supplemental measure of the construct (Eid et al., 2025; Funder & West, 1993; Letzring & Spain, 2021). Moreover, assessing changes in physiological, behavioral, and social indicators (e.g., activity, health, social functioning, mobility) might also offer valid insights into the effects of life events. Modern measurement technologies, such as mobile sensing, allow less intrusive and more frequent recording of many important physiological and social variables that can be used in life-event research (Mehl et al., 2024). Multimethod assessment strategies not only allow researchers to measure diverse facets of the effects of life events but also can enhance the validity of change measurements (e.g., Eid & Diener, 2006).

Box 2.

Key Factors to Consider When Using Self-Report Measures in Life-Event Studies

Time frames: Self-report measures can vary in the time frames to which they refer (e.g., Luhmann, Hawkley, et al., 2012; Scharbert et al., 2024). Items with longer time frames (e.g., “How do you feel in general?”) tend to be less sensitive to event-induced trait changes than items with shorter time frames (e.g., “How did you feel in the last 3 months?”). However, items with very short time frames (e.g., “How do you feel right now?”) are more likely to be influenced by situational factors unrelated to the focal life event (e.g., rainy vs. sunny weather), which could bias observed changes. Researchers should carefully consider which time frame is appropriate for their research question. In addition, items from established scales may vary in their sensitivity to change. For instance, in the Satisfaction With Life Scale (Diener et al., 1985), items such as “So far, I have gotten the important things I want in life” and “If I could live my life over, I would change almost nothing” are likely less sensitive to life-event-related changes than the item “I am satisfied with my life.” Eid and Hoffmann (1998) presented methods for estimating trait change by aggregating state measures that were assessed repeatedly before and after an event.

Social comparisons: Some self-report questionnaires rely on social comparisons. For instance, the Subjective Happiness Scale (Lyubomirsky & Lepper, 1999) includes the item “Compared to most of my peers, I consider myself . . . ,” and response options range from 1 (less happy) to 7 (more happy). A clear specification of the social-comparison group can improve reliability and validity (Mabe & West, 1982). However, the meaning of the reference group (e.g., “peers”) can also vary across individuals and contexts (Chen & West, 2008). In addition, the reference group with which individuals compare themselves may change over time, particularly through the experience of life events. Therefore, items containing social comparisons should generally be avoided in life-event research because it is often unclear which reference group respondents are using for comparison.

Multifaceted constructs: When examining constructs with multiple facets, it is important to ensure that all relevant facets are included (e.g., relationship satisfaction, income satisfaction, sleep satisfaction, and leisure satisfaction in the context of the birth of a child). For hierarchically structured outcome measures, it might further be valuable to conduct the analysis at the narrower facet level instead of the broader (second-order) dimensional level to obtain a more differentiated picture of the effects. For instance, when studying the effects on extraversion, one could present analyses in which the different extraversion facets (e.g., warmth, excitement-seeking) are modeled separately (see Seifert et al., 2024). Crucially, analyzing the effects at the facet level increases the number of effect estimates, so researchers need to correct for multiple testing, for example, by controlling the false discovery rate with the Benjamini and Hochberg (1995) procedure.

Socratic effect: Participants often need to familiarize themselves with the measurement instruments. In longitudinal studies, self-reports typically become more consistent and reliable over time, a phenomenon known as the Socratic effect (Jagodzinski et al., 1987). Particularly strong differences in scale usage have been observed between the first and second measurement occasions. Therefore, it may be advisable to exclude the first measurement from substantive data analysis.
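The Benjamini and Hochberg (1995) step-up procedure mentioned in Box 2 is short enough to write out directly. The sketch below is a plain-Python illustration with hypothetical facet-level p-values; in practice, established implementations such as statsmodels' `multipletests(..., method="fdr_bh")` or R's `p.adjust(p, method = "BH")` would normally be used.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate at q.

    Returns a list of booleans (True = reject the null hypothesis) in input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank  # largest rank whose p-value meets its threshold
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Hypothetical facet-level p-values for the effect of a life event on extraversion.
facets = {
    "warmth": 0.020,
    "gregariousness": 0.35,
    "assertiveness": 0.001,
    "activity": 0.60,
    "excitement_seeking": 0.030,
    "positive_emotions": 0.012,
}
names = list(facets)
decisions = benjamini_hochberg([facets[n] for n in names], q=0.05)
print([n for n, r in zip(names, decisions) if r])
```

Here four facets are flagged, whereas a familywise Bonferroni threshold of 0.05 / 6 ≈ 0.0083 would flag only assertiveness; the procedure trades strict familywise control for higher power while bounding the expected share of false discoveries.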

Finally, because questionnaire measures do not have a natural scaling, it can be useful to rescale them to allow comparison across constructs and studies. For example, when response scales with a varying number of response categories are used (e.g., 5-point and 7-point scales), responses can be transformed into percentage of maximum possible (POMP) scores (Cohen et al., 1999): POMP = [(observed score − minimum possible score) / (maximum possible score − minimum possible score)] × 100. POMP scores range from 0 to 100, making them interpretable as percentage scores, which can simplify the interpretation of the magnitude of causal effects. Furthermore, unlike standardized metrics (e.g., z scores or Cohen’s d), POMP scores do not depend on the distribution of scores in a particular sample or population, in particular on the observed variance. When questionnaires assess similar content with different items but include some overlapping common items, integrative data analysis (Hussong et al., 2013) can be used to harmonize the responses into a single scale.
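The POMP transformation amounts to a one-line linear rescaling. In this minimal sketch, the function name `pomp` is our own; the key point, noted in the docstring, is that the minimum and maximum are the possible endpoints of the response scale rather than the values observed in a sample.

```python
def pomp(score: float, scale_min: float, scale_max: float) -> float:
    """Percentage of maximum possible (POMP) score.

    scale_min and scale_max are the lowest and highest *possible* scores of the
    response scale, not the minimum and maximum observed in the sample.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    return (score - scale_min) / (scale_max - scale_min) * 100.0


# The scale midpoint maps to 50 on both a 1-5 and a 1-7 response scale.
print(pomp(3, 1, 5), pomp(4, 1, 7))
# A 1-point increase corresponds to 25 POMP points on a 1-5 scale
# but only about 16.7 POMP points on a 1-7 scale.
print(pomp(4, 1, 5) - pomp(3, 1, 5), pomp(5, 1, 7) - pomp(4, 1, 7))
```

Because the rescaling is linear, a causal effect estimated in raw units can equally be converted by multiplying it by 100 / (scale_max − scale_min).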

Defining the Time Lag

Another integral part of defining the causal estimand is to specify the time lag of interest (Gollob & Reichardt, 1987; Voelkle et al., 2018). In the case of first-time parenthood, researchers might be interested in (a) the immediate effects occurring in the days and weeks after the birth or (b) the longer-term effects several years after becoming a parent. Researchers should ideally match their choice of time lags to those proposed by theories describing the timing of how a certain life event affects a certain outcome (Hopwood et al., 2022; Luhmann et al., 2014). In practice, this will often be challenging because psychological theories rarely describe the timing of psychological processes precisely. Furthermore, life-event studies often rely on preexisting panel data, so researchers are bound to the temporal resolution provided by the measurement schedule of these studies. Specifically, most panel studies use yearly measurements, which are well suited to examining long-term effects of life events but are too coarse to capture finer-grained changes occurring in close proximity to life events. One way to increase the temporal resolution of life-event studies is to collect information on the exact timing of the occurrence of the event (Hudde & Jacob, 2022). An even better way is to collect new prospective-panel data with shorter time intervals between measurement occasions (e.g., Lawes et al., 2023). We recommend that researchers always clearly state the time lag when interpreting the effects and elaborate on how well the study’s time lags align with the theorized temporal processes (Hopwood et al., 2022). One effective approach is to create a timeline that maps both the theoretically relevant periods associated with a life event and the study’s time frame. This timeline can also guide the selection of covariates, ensuring that the event and comparison groups are appropriately defined and balanced (see section “Step 2: Identifying the Causal Effect”).
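A timeline of this kind can also be checked programmatically by mapping each interview to a theorized period using the exact event date. The sketch below assumes hypothetical period boundaries (in months relative to a first birth), a hypothetical birth date, and yearly March interviews; month differences are computed on calendar months and ignore the day of the month.

```python
from collections import Counter
from datetime import date

# Hypothetical theorized periods around a first birth, in months relative to the
# birth; each period covers [start, end). The bounds are illustrative only.
PERIODS = [
    ("baseline", float("-inf"), -9),
    ("anticipation", -9, 0),
    ("immediate", 0, 6),
    ("longer_term", 6, 61),
]


def months_from_event(event: date, interview: date) -> int:
    """Whole calendar months from the event to the interview (negative = before)."""
    return (interview.year - event.year) * 12 + (interview.month - event.month)


def period_of(months: int) -> str:
    """Label of the theorized period containing the given event time."""
    for label, start, end in PERIODS:
        if start <= months < end:
            return label
    return "outside"


birth = date(2015, 5, 10)  # hypothetical exact birth date
interviews = [date(year, 3, 15) for year in range(2012, 2019)]  # yearly, each March
labels = [period_of(months_from_event(birth, d)) for d in interviews]
print(labels)
print(Counter(labels))
```

In this hypothetical schedule, no interview falls into the immediate postbirth window, which illustrates why yearly panels can speak to longer-term effects but say little about changes in close proximity to the event.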

Defining the Target Population

The last component of the causal estimand is the target population about which researchers wish to make inferences. Causal effects can be defined at the individual level, at the subgroup level (e.g., males), or for a population of individuals. Rubin (1974; see also Holland, 1986) originally defined the individual causal effect for a person i as the difference between the potential outcome following exposure (Y_i¹) and the potential outcome following exposure to the comparison condition (Y_i⁰). Rubin noted that individual causal effects are generally not estimable because only one of the potential outcomes can be observed for a given person, an issue that Holland (1986) termed the “fundamental problem of causal inference” (p. 947). The same holds in life-event research: One can measure an individual’s score either after the individual has experienced the focal life event (Y_i¹) or when the individual has not experienced it (Y_i⁰); both potential outcomes can never be observed at the same time in the same person. Thus, researchers often aim to estimate average causal effects for a specific population of individuals. Such average causal effects can be identified in observational studies, but only under strong assumptions (see the next section, “Step 2: Identifying the Causal Effect”).
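In the notation above, and introducing an exposure indicator D_i (1 = person i experienced the life event, 0 = did not; this symbol is our addition for illustration), the individual causal effect, the observed outcome, and the population average causal effect can be written as follows. The second line makes the fundamental problem explicit: the observed score reveals only one of the two potential outcomes for each person.

```latex
% D_i is an exposure indicator introduced here for illustration.
\begin{aligned}
\tau_i &= Y_i^{1} - Y_i^{0} \\
Y_i &= D_i \, Y_i^{1} + (1 - D_i) \, Y_i^{0} \\
\mathrm{ATE} &= \mathbb{E}\left[Y^{1} - Y^{0}\right] = \mathbb{E}\left[Y^{1}\right] - \mathbb{E}\left[Y^{0}\right]
\end{aligned}
```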

When researchers aim at estimating average causal effects, they should clearly conceptualize and transparently describe the target population to which they wish to generalize. Clearly defining the target population enables researchers to make well-informed decisions concerning which data they should use and which analyses they should run (Greifer & Stuart, 2023; Lundberg et al., 2021). Often, researchers are interested in the average effect of the life event in a whole population; this effect is called the average treatment effect (ATE). Sometimes, researchers are also interested in conditional average causal effects, such as the average effect for (a) individuals who actually experienced the life event versus highly similar control individuals who could potentially experience the life event but did not (i.e., the average treatment effect on the treated [ATT]) or (b) a specific subgroup of individuals (e.g., effects for low-income single mothers). These conditional average effects can be estimated if the defining features of the subgroup are measured with covariates (e.g., Mayer et al., 2016).
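Why the choice between ATE and ATT matters can be seen in a small simulation in which, unlike in real data, both potential outcomes are known for everyone. The sketch below is a hypothetical data-generating process: individual effects are assumed to be larger for people with a stronger (simulated) desire for children, and those same people are assumed to be more likely to experience the event.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 100_000

# Hypothetical simulation: both potential outcomes are known here, which is
# never the case with real data (the fundamental problem of causal inference).
desire = rng.normal(size=n)              # simulated preevent desire for children
y0 = rng.normal(7.0, 1.0, size=n)        # life satisfaction without the event
individual_effect = 0.2 + 0.3 * desire   # heterogeneous individual causal effects
y1 = y0 + individual_effect              # life satisfaction with the event

# Individuals with a stronger desire are more likely to experience the event.
p_event = 1.0 / (1.0 + np.exp(-1.5 * desire))
event = rng.random(n) < p_event

ate = (y1 - y0).mean()         # average over the whole population
att = (y1 - y0)[event].mean()  # average over those who experienced the event
print(f"ATE = {ate:.2f}, ATT = {att:.2f}")
```

Under these assumptions, the ATT is noticeably larger than the ATE because individuals who select into the event are those who benefit most; both are legitimate estimands, but they answer different questions.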

Step 2: Identifying the Causal Effect

Conditions That Need to Be Met

Once the causal estimand is conceptually defined, researchers need to identify the causal estimand using observable variables (Lundberg et al., 2021). Average causal effects can be identified if the following three conditions are met (Hernán & Robins, 2024): consistency, positivity, and exchangeability. Consistency requires that (a) the exposure is well defined and (b) only a single version of the exposure is realized. Whether the consistency condition holds cannot be tested empirically; it can be made plausible only by reasoning. As discussed above, this can be achieved by minimizing unwanted variation in the exposure (i.e., the life event), for example, by specifying eligibility criteria. Furthermore, consistency implies that there is no interference, meaning that the potential outcome of any individual should not depend on the exposure versus nonexposure of other individuals. In the context of becoming first-time parents, this would mean that the effect of becoming a first-time parent on the parents’ life satisfaction has to be independent of whether other couples in the study (e.g., friends or neighbors) also become first-time parents. Rubin (1980) introduced the term stable-unit-treatment-value assumption (SUTVA) to collectively describe the assumption of noninterference and the existence of only a single version of the treatment. Positivity means that every person in the sample has a probability greater than 0 and smaller than 1 of receiving the exposure, given their values on the measured covariates. In the context of life-event studies, this implies that every individual needs to have some nonzero probability of both potentially experiencing and potentially not experiencing the focal life event. This assumption would, for example, be violated in studies on the effects of the birth of the first child if there are participants who are known to be infertile. If the positivity assumption is violated, researchers need to redefine the target population. One way to empirically examine whether the positivity assumption is met is to inspect whether the distributions of the measured covariates for the individuals in the event and comparison groups have overlapping regions of support. Graphically, positivity implies that the groups can be compared only over the region(s) in which measured values of each covariate occur in both the group exposed and the group not exposed to the life event. Finally, exchangeability, undoubtedly the most prominent concern in causal analyses, means that the conditional probability of receiving the exposure depends only on measured covariates. This assumption is also termed unconfoundedness (Imbens & Rubin, 2015) or ignorability (Rosenbaum & Rubin, 1983) and implies for life-event studies that, conditional on a set of covariates, it is random whether individuals experience the focal life event. In the absence of exchangeability, observed differences between the event group and the comparison group may be confounded by factors other than the life event. For instance, parents and nonparents might differ in their career focus and, consequently, in their income levels (see Fig. 1), so that differences in life satisfaction observed between these groups after childbirth might not accurately reflect the causal effect of having a first child but rather reflect these preexisting differences. To make the exchangeability condition plausible, researchers need to make assumptions about the causal system in their study to identify confounding variables that have to be controlled in the analysis. These assumptions are based on expert knowledge, prior empirical research, and theoretical reasoning but cannot be tested empirically.
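The interplay of positivity and exchangeability can be illustrated with a simulated confounder. This is a minimal sketch under strong simplifying assumptions: a single binary, hypothetical career-focus variable is the only confounder (so exchangeability holds given it by construction), the effect size of 0.3 is invented, and positivity is checked as the share exposed within each covariate stratum. With continuous or many covariates, overlap is more commonly inspected via estimated propensity scores.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
true_effect = 0.3  # hypothetical causal effect of a first birth on life satisfaction

# Hypothetical confounder: high career focus lowers the chance of a first birth
# and raises life satisfaction.
career = rng.random(n) < 0.4
birth = rng.random(n) < np.where(career, 0.2, 0.6)
life_sat = 7.0 + 0.8 * career + true_effect * birth + rng.normal(0.0, 1.0, n)

# Naive between-group contrast that ignores the confounder.
naive = life_sat[birth].mean() - life_sat[~birth].mean()


def check_positivity(exposure, strata):
    """Share exposed per stratum; each share must lie strictly between 0 and 1."""
    shares = {}
    for s in np.unique(strata):
        share = exposure[strata == s].mean()
        if not 0.0 < share < 1.0:
            raise ValueError(f"positivity violated in stratum {s!r}: share exposed = {share}")
        shares[s] = share
    return shares


def standardized_effect(outcome, exposure, strata):
    """Within-stratum contrasts weighted by stratum size (targets the ATE)."""
    check_positivity(exposure, strata)
    effect = 0.0
    for s in np.unique(strata):
        m = strata == s
        diff = outcome[m & exposure].mean() - outcome[m & ~exposure].mean()
        effect += m.mean() * diff
    return effect


adjusted = standardized_effect(life_sat, birth, career)
print(f"naive: {naive:.3f}  adjusted: {adjusted:.3f}  true: {true_effect}")
```

In this setup, the naive contrast is close to zero because high career focus both lowers the chance of a first birth and raises life satisfaction, masking the effect; stratifying on the confounder recovers an estimate close to the simulated value.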

Fig. 1.

Hypothetical directed acyclic graph for the effects of birth of first child on life satisfaction.

Using Directed Acyclic Graphs to Specify the Hypothesized Causal Structure

The hypothesized causal structure can be specified using directed acyclic graphs (DAGs; for a nontechnical introduction, see Rohrer, 2018). A DAG consists of nodes (observed or unobserved variables) that are connected by arrows. These arrows encode theory, prior empirical research, and assumptions to represent the hypothesized causal relationships between nodes. Typically, DAGs are nonparametric so that no particular functional form (e.g., linear, quadratic) of the causal relationships is assumed. DAGs should include all theoretically relevant variables regardless of whether they were measured in a given study. As an example, Figure 1 depicts a simple hypothetical DAG for the effects of becoming first-time biological parents on life satisfaction.

Based on the hypothesized DAG, researchers can determine which variables need to be controlled for and which variables should not be controlled to identify the causal effect of interest (Cinelli et al., 2024; Wysocki et al., 2022). Of particular relevance is how the different covariates are assumed to be related to (a) the exposure (e.g., birth of first child) and (b) the outcome of interest (e.g., life satisfaction). Three types of covariates can be differentiated: confounders, mediators, and colliders. Confounders are common causes of the exposure and the outcome. In our example DAG, career focus is a common cause of the likelihood of becoming first-time biological parents and of life satisfaction. To identify the causal effects of interest, it is essential to control for confounders. Confounders can be either time-invariant, meaning that their levels and effects do not change over time (e.g., biological sex, ethnicity), or time-varying, meaning that their levels or effects can change over time (e.g., social support, career focus, income).

Mediators are variables that are causally affected by the exposure and that, in turn, have a causal effect on the outcome. In our illustrative DAG (Fig. 1), parental role identity is a mediator because it is causally affected by the birth of first child while it, in turn, causally affects life satisfaction. Optimally, mediators are assessed after the exposure and before the outcome is assessed (see Rosenbaum, 1984).² Colliders are variables that are caused by both the exposure and the outcome. In our example, relationship satisfaction after the birth of the first child is a collider because it is causally affected by both becoming a first-time parent and life satisfaction. In contrast to confounders, mediator and collider variables should not be controlled for because they are causally affected by the exposure itself; they are posttreatment variables (Elwert & Winship, 2014; Pearl, 2009; Rohrer, 2018; Rosenbaum, 1984; Rubin, 1974).
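Once a DAG is written down, these role labels can be derived mechanically. The sketch below encodes a hypothetical DAG in the spirit of Figure 1 as a Python dictionary (the edge list is our assumption based on the text, not a reproduction of the figure) and classifies each remaining node using the simple definitions given above. It is only an illustration; dedicated tools such as DAGitty apply the full back-door criterion and should be preferred for real analyses.

```python
# Hypothetical DAG: each key points to the list of nodes it causally affects.
DAG = {
    "career_focus": ["first_child", "life_satisfaction"],
    "first_child": ["parental_identity", "life_satisfaction", "relationship_sat_post"],
    "parental_identity": ["life_satisfaction"],
    "life_satisfaction": ["relationship_sat_post"],
}

def descendants(dag, node):
    """All nodes reachable from `node` by following arrows."""
    seen, stack = set(), list(dag.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag.get(current, []))
    return seen

def ancestors(dag, node):
    """All nodes from which `node` can be reached (only keys can have children)."""
    return {v for v in dag if node in descendants(dag, v)}

def classify(dag, exposure, outcome):
    """Label nodes as confounder, mediator, or collider relative to exposure/outcome."""
    nodes = set(dag) | {child for children in dag.values() for child in children}
    desc_x, desc_y = descendants(dag, exposure), descendants(dag, outcome)
    anc_x, anc_y = ancestors(dag, exposure), ancestors(dag, outcome)
    roles = {}
    for v in sorted(nodes - {exposure, outcome}):
        if v in anc_x and v in anc_y:        # common cause of exposure and outcome
            roles[v] = "confounder (control for)"
        elif v in desc_x and v in anc_y:     # on a causal path exposure -> outcome
            roles[v] = "mediator (do not control for)"
        elif v in desc_x and v in desc_y:    # caused by exposure and by outcome
            roles[v] = "collider (do not control for)"
        else:
            roles[v] = "other"
    return roles

roles = classify(DAG, "first_child", "life_satisfaction")
for name, role in roles.items():
    print(f"{name}: {role}")
```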

In longitudinal studies, DAGs quickly become complex because the same construct can play different roles in the same causal chain depending on its (temporal) location. For example, prebirth relationship satisfaction could be a positive cause of both the likelihood of becoming first-time parents and postbirth life satisfaction, making it a confounder. In contrast, postbirth relationship satisfaction may be changed by the birth of the first child and, in turn, affect life satisfaction, making it a mediator. Thus, the timing of the measurements needs to be considered when identifying variables that need to be controlled for. Another challenge arises when an earlier life event affects the occurrence of a later life event (e.g., marriage increases the probability of the birth of the first child) and researchers want to examine the effects of both events in a single analysis. In these cases, the same posttreatment variable can serve (a) as a confounder on one path in the DAG that should be controlled for and (b) as a collider on another path that should not be controlled for. For these situations, Robins (1986) developed an approach known as parametric g-estimation that partitions the analysis into a series of separate regression equations that decouple the covariate adjustment processes and produce unbiased estimates of the causal effect of each exposure. When there are several waves of measurement, the g-estimation procedure can be augmented with an approach known as marginal structural modeling (MSM) that concisely summarizes the estimates of causal effects of each of the exposures. Loh et al. (2024) presented a tutorial that explains parametric g-estimation and MSM, including lavaan syntax to implement these procedures.
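The logic of separate regression equations can be illustrated with a small simulation. The sketch below uses the parametric g-formula (g-computation), one member of Robins's family of g-methods; it is a simplified illustration under a hypothetical linear data-generating process, not the procedure or syntax of Loh et al. (2024). A0 stands for marriage, L1 for relationship satisfaction measured afterward, A1 for the birth of a first child, and Y for later life satisfaction; all variable names and coefficients are assumptions. L1 is affected by A0 and confounds A1, so simply controlling for L1 in one regression removes part of the effect of A0.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 200_000

# Hypothetical data-generating process (all coefficients are assumptions)
a0 = rng.binomial(1, 0.5, n)                          # marriage at wave 0
l1 = 0.5 * a0 + rng.normal(size=n)                    # relationship satisfaction, wave 1
p_a1 = 1 / (1 + np.exp(-(-0.5 + 1.0 * l1 + 0.8 * a0)))
a1 = rng.binomial(1, p_a1)                            # first child, depends on L1 and A0
y = 1.0 * a0 + 1.0 * a1 + 1.0 * l1 + rng.normal(size=n)  # life satisfaction, wave 2

def ols(X, target):
    """OLS coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(target)), X])
    return np.linalg.lstsq(X, target, rcond=None)[0]

# Equation 1: time-varying confounder given earlier exposure, E[L1 | A0]
g = ols(a0, l1)
# Equation 2: outcome given both exposures and the confounder, E[Y | A0, A1, L1]
b = ols(np.column_stack([a0, a1, l1]), y)

def g_formula_mean(a0_val, a1_val):
    """Mean outcome under a fixed exposure regime. Because the outcome model is
    linear in L1, plugging in the modeled mean of L1 is sufficient here."""
    l1_mean = g[0] + g[1] * a0_val
    return b[0] + b[1] * a0_val + b[2] * a1_val + b[3] * l1_mean

effect = g_formula_mean(1, 1) - g_formula_mean(0, 0)  # "both events" vs "neither"
naive = b[1] + b[2]                                   # single regression controlling L1
print(f"g-formula: {effect:.2f}, single regression: {naive:.2f}")
```

Under this simulation, the true contrast is 1.0 + 1.0 + 0.5 × 1.0 = 2.5; the single regression recovers only about 2.0 because conditioning on L1 blocks the path from A0 through L1.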

In sum, specifying DAGs in longitudinal studies can be challenging. Yet it is an integral part of the research process to make the exchangeability assumption plausible. Furthermore, the process of specifying the DAG makes the assumed causal structures and limitations of a given study transparent. Finally, the DAG informs future research about which confounders should be measured in order to estimate the effects of interest.

Step 3: Estimating the Causal Effect

After the causal effect of interest has been identified, it can be estimated using empirical data. In theory, unbiased estimates of a causally identified effect can be obtained if the following two conditions are met (Hernán & Robins, 2024). First, all variables included in the analysis are reliably measured so that measurement error does not bias the estimates. Second, the parametric models are correctly specified. This means that the functional form (e.g., linear, polynomial, spline; see Suk et al., 2019) of the relationships between variables needs to be correctly specified. In practice, there are two main strategies for estimating causal effects of life events: difference-in-difference designs that use a comparison group to separate event-related changes from normative changes (e.g., Lawes et al., 2023; Yap et al., 2012) and within-person designs that control for all time-invariant person-level confounders (e.g., Clark et al., 2008; Krämer et al., 2024). We describe below the main features and most frequent challenges of these two modeling approaches. To address the two key statistical conditions for estimating causally identified effects described above (i.e., no measurement error, no model misspecification), we also discuss latent-variable models and methods of representing different functional forms.

Difference-in-Difference Designs

Difference-in-difference designs (for a review, see Wing et al., 2018) can be used to distinguish event-related changes from changes that would occur irrespective of the life event (e.g., normative changes or aging). In prototypical life-event studies, difference-in-difference designs compare individuals who experience the focal life event (i.e., the event group) with individuals who do not experience it (i.e., the comparison group). Causal effects are estimated based on group differences in within-person changes, meaning that time-invariant confounders that affect only the outcome levels in both groups do not confound the effect estimates. However, confounders that have group-specific effects on outcome changes over time can introduce bias.

The central assumption of difference-in-difference designs is that the changes in the comparison group equal the counterfactual changes of individuals in the event group if the event group (contrary to fact) had not experienced the life event. This assumption is often called the common-trends assumption and implies that the trajectories of the individuals in the event and comparison groups would be parallel if the effects of the life event had been removed. This assumption further implies that in the absence of the life event, the differences between the event and comparison groups would be constant over time. In life-event studies, the plausibility of this common-trends assumption generally cannot be empirically tested and instead needs to be evaluated based on prior empirical research and theory.³ The common-trends assumption is more likely to hold if the participants in the event and comparison groups are exchangeable before the life event (Hernán & Robins, 2024; Wing et al., 2018). However, this condition is rarely met in observational data, especially if the experience of the life event is at least partly endogenous (e.g., the birth of a child). Often, individuals in the event and comparison groups differ with respect to relevant covariates that may confound the estimate of the causal effect. Thus, procedures that balance the event and comparison groups on measured covariates are generally needed so that the groups become exchangeable (Hernán & Robins, 2024).
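In its simplest two-group, two-occasion form, the difference-in-difference estimate is the difference between the groups' mean within-person changes. The sketch below, with hypothetical scores, computes it directly and shows that the same number is the interaction coefficient in a regression with group, period, and group × period terms. Valid standard errors would additionally require accounting for the repeated observations per person (e.g., cluster-robust standard errors), which this sketch omits.

```python
import numpy as np

# Hypothetical life-satisfaction scores; rows = persons, columns = (pre, post)
event = np.array([[7.0, 7.8], [6.5, 7.0], [7.5, 8.0]])
comp = np.array([[6.0, 6.1], [7.0, 7.2], [6.8, 6.8]])

def did_from_means(event, comp):
    """Mean within-person change in the event group minus that in the comparison group."""
    return np.mean(event[:, 1] - event[:, 0]) - np.mean(comp[:, 1] - comp[:, 0])

# Same estimate from a regression with a group x period interaction (long format)
y = np.concatenate([event.ravel(), comp.ravel()])        # pre, post, pre, post, ...
group = np.repeat([1, 0], [event.size, comp.size])       # 1 = event group
post = np.tile([0, 1], event.shape[0] + comp.shape[0])   # 1 = postevent occasion
X = np.column_stack([np.ones_like(y), group, post, group * post])
coef = np.linalg.lstsq(X, y, rcond=None)[0]

print(round(did_from_means(event, comp), 3), round(coef[3], 3))
```

The event group changes by 0.6 points on average and the comparison group by 0.1, so both computations yield 0.5; the estimate is causal only if the common-trends assumption holds.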

Creating covariate balance between the event and comparison groups

The first step to achieve balance between the event and comparison groups is to identify and measure all baseline covariates that are potential confounders. DAGs facilitate this process. In general, researchers should aim to balance the groups with respect to the last measurement occasion before the first (anticipatory) effects of the life event occurred. Researchers can then choose between three main approaches to achieve balance: matching, weighting, and covariate adjustment in outcome regression (for an overview, see Schafer & Kang, 2008). These approaches aim at optimizing the trade-off between internal validity (i.e., ruling out confounding by creating event and comparison groups whose covariate distributions are highly similar), precision (i.e., obtaining a large sample size for estimating the effect with minimal standard error), and external validity (i.e., estimating effects that can be generalized to the target population; Greifer, 2020). Current research suggests that the choice of covariates used for balancing plays a much more important role for estimating unbiased causal effects than the method used to create balance (Cook et al., 2009; Pohl et al., 2009).

Matching

Matching achieves covariate balance by identifying individuals in the event and comparison groups who are highly similar to each other on the set of identified confounders. Most existing life-event studies have used the propensity score in their matching algorithms (e.g., Buecker et al., 2021; Golle et al., 2019; P. L. Hill et al., 2021; Jackson et al., 2012; Krämer & Rodgers, 2020; Lawes et al., 2023). The propensity score is defined as the probability of treatment exposure (e.g., experiencing the life event) given a set of observed baseline covariates (for an accessible introduction to propensity-score methods, see West et al., 2014). Most often, the propensity scores are estimated using logistic-regression models; however, alternative machine-learning methods may also be used (for an overview, see Westreich et al., 2010). The propensity score (if correctly estimated) has a highly valuable property: Conditioning on the propensity score will balance all measured covariates used to construct the propensity score between the event and comparison groups (Austin, 2011). Matching individuals from the event and comparison groups with highly similar propensity scores is therefore expected to yield unbiased estimates of the treatment effect. Algorithms for a variety of different matching approaches are implemented in many software packages (e.g., MatchIt in R; see Ho et al., 2011). Trustworthy standard errors for the effect estimates can be obtained by using cluster-robust standard errors or bootstrapping (Abadie & Spiess, 2022).
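To make the propensity-score step concrete, the sketch below fits a logistic regression by Newton-Raphson in plain NumPy (rather than calling a dedicated package) on simulated data and computes standardized mean differences (SMDs), a common balance diagnostic. The covariates (career focus and age) and their coefficients are hypothetical; in practice, packages such as MatchIt estimate the scores and report balance statistics.

```python
import numpy as np

def fit_logistic(X, z, n_iter=25):
    """Logistic regression via Newton-Raphson; returns coefficients (intercept first)."""
    X1 = np.column_stack([np.ones(len(z)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        gradient = X1.T @ (z - p)
        hessian = X1.T @ (X1 * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def smd(x, z):
    """Standardized mean difference (event minus comparison) using the pooled SD."""
    x1, x0 = x[z == 1], x[z == 0]
    pooled_sd = np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2)
    return (x1.mean() - x0.mean()) / pooled_sd

# Hypothetical baseline covariates and event indicator (1 = experiences the event)
rng = np.random.default_rng(1)
n = 2000
career = rng.normal(size=n)
age = rng.normal(30, 5, size=n)
z = rng.binomial(1, 1 / (1 + np.exp(-(-0.3 - 0.8 * career + 0.05 * (age - 30)))))

X = np.column_stack([career, age])
beta = fit_logistic(X, z)
ps = 1 / (1 + np.exp(-np.column_stack([np.ones(n), X]) @ beta))  # propensity scores

print("coefficients:", np.round(beta, 2))
print("SMD before balancing:", round(smd(career, z), 2), round(smd(age, z), 2))
```

The large SMD for career focus signals the imbalance that matching or weighting on `ps` is meant to remove; the same diagnostic is recomputed in the matched or weighted sample.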

Matching approaches have many valuable features: They make it easy to inspect whether each of the covariates is balanced between the event and comparison groups, they separate the covariate-balancing step from the effect-estimation step, and the effect estimation does not rely on functional form assumptions. However, matching approaches also have several drawbacks. Most importantly, matching often involves discarding individuals with poor matches from the analysis so that the effect of the exposure is not estimated based on the full sample. Specifically, in most matching applications, some individuals from the comparison group will be discarded because they are too different from the individuals in the event group. Furthermore, individuals from the event group may also be discarded from the effect estimation, for example, when a caliper is specified (i.e., calipers restrict the differences on propensity scores between matched individuals to be below a certain threshold) or matches are restricted to the region of common support (i.e., the region in which the covariate distributions of the control and event groups overlap). This exclusion of individuals from the analysis sample has two consequences. First, the sample size is reduced, resulting in less precision (i.e., larger standard errors, lower statistical power) in the estimate of the causal effect. Second, depending on which individuals are discarded, the analysis may not estimate the ATE but, rather, another estimand that might not generalize to the population from which the sample was drawn. Indeed, it may even be unclear whether the effects estimated in the matched sample generalize to any broader population (Greifer & Stuart, 2023). Finally, matching is unable to control for unmeasured confounders and selective attrition.
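The discarding mechanics described above can be seen in a toy example. The sketch below implements greedy 1:1 nearest-neighbor matching on the propensity score without replacement and with a caliper. The propensity scores and the caliper width are hypothetical, and the greedy ordering (highest scores first) is one simple choice among many; MatchIt offers this and more refined algorithms, such as optimal matching.

```python
import numpy as np

def greedy_caliper_match(ps_event, ps_comp, caliper):
    """Greedy 1:1 nearest-neighbor matching without replacement.
    Returns (event_index, comparison_index) pairs; event cases with no available
    comparison case within the caliper stay unmatched (i.e., are discarded)."""
    available = set(range(len(ps_comp)))
    pairs = []
    for i in np.argsort(-ps_event):          # start with the highest propensity scores
        if not available:
            break
        cand = np.array(sorted(available))
        dist = np.abs(ps_comp[cand] - ps_event[i])
        best = int(np.argmin(dist))
        if dist[best] <= caliper:
            pairs.append((int(i), int(cand[best])))
            available.remove(int(cand[best]))
    return pairs

# Hypothetical propensity scores
ps_event = np.array([0.62, 0.40, 0.91])
ps_comp = np.array([0.10, 0.38, 0.45, 0.60, 0.65])

pairs = greedy_caliper_match(ps_event, ps_comp, caliper=0.05)
unmatched_event = sorted(set(range(len(ps_event))) - {i for i, _ in pairs})
discarded_comp = sorted(set(range(len(ps_comp))) - {j for _, j in pairs})
print("pairs:", pairs)
print("event cases discarded:", unmatched_event)
print("comparison cases discarded:", discarded_comp)
```

The event individual with a score of 0.91 has no comparison case within the caliper and is dropped, so the matched-sample estimate no longer refers to the full event group.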

Weighting

Covariate balance between the event and comparison groups can also be achieved by weighting the observations in a way that creates a pseudopopulation in which the confounding variables are not associated with the likelihood of exposure to the event (Hernán & Robins, 2024). If weighting is successful, unbiased causal effects can be estimated based on the weighted sample (e.g., using weighted linear regression). The most common weighting method is inverse probability of treatment weighting, which is based on the propensity score (for an introduction, see Thoemmes & Ong, 2016). However, numerous other weighting methods have been proposed (see, e.g., Chan et al., 2016; Hainmueller, 2012; Huling & Mak, 2022; Wang & Zubizarreta, 2019); most of these methods are implemented in software packages (e.g., WeightIt in R; see Greifer, 2023). Nonparametric bootstrapping or robust variance estimators can be used to compute the standard errors of the estimated causal effects after weighting (Austin & Stuart, 2015; Hernán & Robins, 2024).
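Inverse probability of treatment weighting for the ATE gives event-group members the weight 1/e(x) and comparison-group members the weight 1/(1 − e(x)), where e(x) is the propensity score. The sketch below applies these weights to simulated within-person change scores, so the weighted difference in mean change is a weighted difference-in-difference estimate. For brevity, it uses the true propensity score of the hypothetical simulation; in practice the score is estimated, and standard errors come from bootstrapping or robust variance estimators.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
career = rng.normal(size=n)                          # hypothetical confounder
p = 1 / (1 + np.exp(-(-0.2 - 0.8 * career)))         # true propensity score (assumed known)
z = rng.binomial(1, p)                               # 1 = experiences the life event
change = 0.5 * z - 0.4 * career + rng.normal(size=n) # within-person change; true effect 0.5

w = np.where(z == 1, 1 / p, 1 / (1 - p))             # ATE weights

def wmean(x, w):
    """Weighted mean normalized by the sum of weights."""
    return np.sum(w * x) / np.sum(w)

naive = change[z == 1].mean() - change[z == 0].mean()
iptw = wmean(change[z == 1], w[z == 1]) - wmean(change[z == 0], w[z == 0])

# Balance check: mean difference in the confounder before and after weighting
balance_before = career[z == 1].mean() - career[z == 0].mean()
balance_after = wmean(career[z == 1], w[z == 1]) - wmean(career[z == 0], w[z == 0])
print(f"naive {naive:.2f}, IPTW {iptw:.2f}")
print(f"career gap before {balance_before:.2f}, after {balance_after:.2f}")
```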

Weighting approaches share many desirable features with matching: Balance of each covariate can be easily investigated by inspecting the covariate distributions in the weighted samples, the covariate-balancing step is separated from the effect estimation, and the effect estimation does not rely on any functional form assumptions. Weighting approaches also have their own unique advantages. Weighting is less likely to require discarding individuals, so the causal effects can be estimated with higher statistical power (Desai & Franklin, 2019). Furthermore, when no individuals are discarded, weighting allows the estimation of causal effects that can be generalized to different target populations (Greifer & Stuart, 2023). Finally, the balancing weights can be combined with survey weights (Dong et al., 2020) and censoring weights (Cole & Hernán, 2008; Hernán & Robins, 2024, Chapter 12.6) to minimize bias due to selective participation and selective dropout, respectively. However, weighting approaches also have important limitations. First, weighting approaches may result in extreme weights for some individuals, in which case causal-effect estimates are primarily driven by those individuals with large weights. To avoid this problem, researchers can either use a different method of estimating the weights or trim weights that are more extreme than a certain threshold (e.g., the 1st and 99th percentiles; Austin & Stuart, 2015). Second, even though weighting tends to yield more statistical power than matching, the effects will still be estimated with less precision compared with the unweighted sample (i.e., lower effective sample size; Shook-Sa & Hudgens, 2022). In other words, weighting approaches accept some loss of statistical power in exchange for unbiased estimates of the causal effect that can be generalized to a specific target population. Finally, like matching, weighting approaches do not control for unmeasured confounding variables.
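Both weight trimming and the loss of precision can be quantified in a few lines. The sketch below trims weights at the 1st and 99th percentiles and computes Kish's approximation of the effective sample size, (Σw)² / Σw², which is one common definition (other definitions exist). The weights are a deliberately artificial example: 98 individuals with weight 1 and two with extreme weights.

```python
import numpy as np

def trim_weights(w, lower_pct=1, upper_pct=99):
    """Set weights below/above the given percentiles to those percentile values."""
    lo, hi = np.percentile(w, [lower_pct, upper_pct])
    return np.clip(w, lo, hi)

def kish_ess(w):
    """Kish's approximate effective sample size of a weighted sample."""
    return np.sum(w) ** 2 / np.sum(w ** 2)

# Artificial weights: 98 ordinary cases and two extreme ones
w = np.concatenate([np.ones(98), [50.0, 100.0]])
w_trim = trim_weights(w)

print(f"ESS untrimmed: {kish_ess(w):.2f} of {len(w)}")
print(f"ESS trimmed:   {kish_ess(w_trim):.2f}, max weight now {w_trim.max():.1f}")
```

Two extreme weights shrink the effective sample size of 100 observations to below 5. Trimming raises it, but the trimmed weights no longer fully balance the covariates, so trimming trades some bias for lower variance.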

Covariate adjustment in outcome regression

Measured covariates can also be included in parametric-regression models used to estimate the effects of the life event on the outcome variable. One important advantage of this approach is that these models can account for the measurement error in the confounders and the outcome (Mayer et al., 2016; Sengewald & Mayer, 2024).⁴ A second advantage is that no individuals are excluded from the effect estimation so that various (conditional) causal effects can be estimated with high precision. However, outcome regression also has disadvantages. First, researchers have to make the correct assumptions about the functional form of the relationship between the covariates, the exposure, and the outcome variable to obtain unbiased estimates of the causal effects. Functional forms can be challenging to specify, especially when many covariates are included in the outcome regression. In addition, outcome-regression models rely on the parametric model to extrapolate beyond the range of the observed data, which can produce implausible estimates in regions with few or no comparable individuals. Furthermore, outcome regression is a one-step approach: The covariate-balancing step is not separated from the effect-estimation step. Thus, balance in the covariate distributions between the event and comparison groups cannot be easily inspected, and there is greater risk that researchers might (unknowingly) select an outcome-regression model that yields the desired effects, limiting the objectivity of a study (Rubin, 2007). Finally, as with the other balancing approaches, outcome regression can control only for measured confounders but not for unmeasured confounders.
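A minimal sketch of covariate adjustment in outcome regression, using simulated change scores and a single hypothetical confounder, is shown below. Because the simulated relationship is linear and the effect is constant, the adjusted model is correctly specified and recovers the true effect of 0.5; with a nonlinear confounder-outcome relationship, the same linear specification would be biased, which is the functional-form risk noted above.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 5000
career = rng.normal(size=n)                           # hypothetical confounder
z = rng.binomial(1, 1 / (1 + np.exp(0.8 * career)))   # event less likely with high career focus
change = 0.5 * z - 0.4 * career + rng.normal(size=n)  # true effect = 0.5

def ols(X, y):
    """OLS coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

unadjusted = ols(z, change)[1]                            # event indicator only
adjusted = ols(np.column_stack([z, career]), change)[1]   # plus the confounder
print(f"unadjusted: {unadjusted:.2f}, adjusted: {adjusted:.2f}")
```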

How to choose an appropriate balancing method

The choice of balancing method should primarily depend on the analysis goal. For example, if researchers are interested in the ATE in the population from which the sample was drawn, they should not use matching approaches because these typically involve discarding unmatched individuals from the estimation of the effect. In our opinion, weighting methods are the most versatile and would generally be our preferred choice. Furthermore, matching or weighting can be combined with outcome-regression methods in doubly robust methods that aim at combining the advantages of both approaches. For example, matching and outcome regression can be combined by including the covariates that are still insufficiently balanced after propensity-score matching in a regression model fitted to the matched sample. The advantage of doubly robust methods is that only one of the two component models (e.g., the propensity-score model or the outcome-regression model) needs to be correctly specified to estimate causal effects without bias (Funk et al., 2011; Kang & Schafer, 2007).
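The double-robustness property can be demonstrated with the augmented inverse-probability-weighted (AIPW) estimator, a weighting-plus-regression combination that differs from the matching-plus-regression example in the text. In the simulated sketch below (hypothetical data, true effect 0.5), the estimate stays close to the truth when the outcome model omits the confounder but the propensity score is correct, and also when the propensity model is wrong (a constant) but the outcome model is correct.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
career = rng.normal(size=n)                              # hypothetical confounder
e_true = 1 / (1 + np.exp(-(-0.2 - 0.8 * career)))        # true propensity score
z = rng.binomial(1, e_true)
change = 0.5 * z - 0.4 * career + rng.normal(size=n)     # true effect = 0.5

def outcome_predictions(X, y, z):
    """Fit a linear outcome model separately per group; predict for everyone."""
    preds = []
    for group in (1, 0):
        beta = np.linalg.lstsq(X[z == group], y[z == group], rcond=None)[0]
        preds.append(X @ beta)
    return preds                                         # [m1, m0]

def aipw(y, z, e, m1, m0):
    """Augmented inverse-probability-weighted estimate of the ATE."""
    mu1 = np.mean(m1 + z * (y - m1) / e)
    mu0 = np.mean(m0 + (1 - z) * (y - m0) / (1 - e))
    return mu1 - mu0

X_right = np.column_stack([np.ones(n), career])          # includes the confounder
X_wrong = np.ones((n, 1))                                # omits the confounder

# Scenario A: misspecified outcome model, correct propensity score
m1, m0 = outcome_predictions(X_wrong, change, z)
est_a = aipw(change, z, e_true, m1, m0)
# Scenario B: correct outcome model, misspecified (constant) propensity score
m1, m0 = outcome_predictions(X_right, change, z)
est_b = aipw(change, z, np.full(n, z.mean()), m1, m0)
print(f"A: {est_a:.2f}, B: {est_b:.2f}")
```

If both component models were wrong, the estimator would generally be biased; double robustness protects against misspecifying one model, not both.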

Analysis models

Once the event and comparison groups have been balanced with respect to all relevant confounders, difference-in-difference estimation entails estimating the causal effects by comparing the event group and the comparison group regarding their changes in the outcome(s). To specify the trajectories of the event and comparison groups, researchers have to make assumptions about the timing and the shape of the changes that occur over time in each group.

Modeling the trajectory of the event group

In life-event research, change patterns are often nonlinear and discontinuous because life events tend to have stronger effects shortly after the event and weaker effects the further a life event is in the past (Baltes et al., 1980; Luhmann et al., 2014). For example, the birth of the first child probably has large detrimental effects on environmental mastery in the weeks after birth but substantially smaller effects several years later. To model such nonlinear and discontinuous changes, distinct phases (e.g., preevent and postevent phases) with phase-specific trajectories can be defined (for details, see Hosoya et al., 2020; Luhmann & Eid, 2013; Singer & Willett, 2003). Such phase-specific trajectories can be estimated using multilevel approaches with piecewise regression or spline models (e.g., Denissen et al., 2019; Krämer & Rodgers, 2020; Luhmann & Eid, 2009; Suk et al., 2019). Figure 2 depicts three examples of models that differ in the number of phases (two vs. five phases) and/or the parametric functional form of the phase-specific trajectories (linear vs. quadratic). To control for measurement error, multiple-indicator models can be used in multilevel structural equation models (SEMs). Alternatively, single-level SEMs can be specified that are equivalent to these multilevel specifications (e.g., Castro-Alvarez, Tendeiro, Meijer, & Bringmann, 2022). Single-level SEMs tend to be more flexible than their multilevel counterparts and allow for testing of the central assumptions of the analysis (e.g., measurement invariance over time). However, single-level SEMs also tend to be more tedious to specify. The supplementary materials of Dugan et al. (2024), who examined changes in personality-development trajectories around life events (without an explicitly causal focus), further provide open-source code and data for specifying single-level SEMs to model phase-specific trajectories using the latent-growth-curve framework (see McArdle, 2009).
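Piecewise models are typically implemented by recoding time relative to the event into phase-specific predictors. The sketch below shows one possible coding for two linear phases with an intercept shift at the event (an assumed variant; whether a shift is included is a modeling choice): a preevent slope term that is 0 after the event, a shift dummy, and a postevent slope term that is 0 before the event. It fits a single noise-free hypothetical trajectory by OLS to show that the coefficients are recovered; in practice, these terms enter a multilevel model or SEM with person-specific random effects.

```python
import numpy as np

def piecewise_design(t):
    """Design matrix for two linear phases with an intercept shift at t = 0.
    t: time relative to the event (negative = before the event)."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([
        np.ones_like(t),            # level just before the event (preevent line at t = 0)
        np.minimum(t, 0),           # preevent slope (0 from the event onward)
        (t >= 0).astype(float),     # intercept shift at the event
        np.maximum(t, 0),           # postevent slope (0 before the event)
    ])

t = np.array([-3, -2, -1, 0, 1, 2, 3])              # years relative to the event
true_coef = np.array([7.0, 0.05, -0.6, 0.15])       # hypothetical trajectory
y = piecewise_design(t) @ true_coef                  # noise-free for illustration
est = np.linalg.lstsq(piecewise_design(t), y, rcond=None)[0]

print(piecewise_design(t))
print(np.round(est, 3))
```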

Fig. 2.

Examples of piecewise regression models. (a) A model with two phase-specific linear trajectories. (b) A model with five phase-specific linear trajectories in which an intercept shift is modeled only for the transition from the “anticipation phase” to the “short-term effects phase.” (c) A model with two phase-specific quadratic trajectories.

Modeling the trajectory of the comparison group and effect estimation

In life-event research, two approaches to modeling the trajectory of the comparison group can be differentiated. The first approach (see Table 1, first model) is to model the general shape of the trajectory of the comparison group (dotted line) analogously to the event group (solid line). This approach assumes that the general shape of the phase-specific trajectories applies to all individuals regardless of whether the life event is experienced. A central challenge of this approach is establishing a clear counterfactual because it is unknown when the comparison group would have experienced the life event of interest (e.g., when childless women in the comparison group would have given birth). As a consequence, it is not clear how to temporally align the measurement occasions of the comparison group to those of the event group. One way to solve this issue is to first match individuals from the event group to individuals from the comparison group and then use the measurement occasion in which the individuals in the event group experienced the life event as the “artificial time” for when their matched counterparts in the comparison group would have experienced the life event. For detailed analysis scripts for this approach, see, for example, the supplementary materials of Krämer and Rodgers (2020) and van Scheppingen and Leopold (2020). After the trajectories of the event and comparison groups have been temporally aligned, the causal effects can be estimated by interacting an exposure indicator (i.e., event group vs. comparison group) with the parameters specifying the phase-specific trajectories (e.g., intercept, linear slope; for example analysis code, see Krämer & Rodgers, 2020). Alternatively, multigroup SEMs can be applied to test whether the phase-specific trajectories of the control and event groups differ from each other (for example analysis code, see van Scheppingen & Leopold, 2020).
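The temporal-alignment step and the exposure × trajectory interactions can be sketched as follows. Each matched comparison person inherits the event wave of their event-group partner as an "artificial" event time, time is recoded relative to that wave, and the piecewise terms are interacted with the event-group indicator; the interaction terms are the ones carrying the effect estimates. The person IDs, waves, and two-phase coding are hypothetical and only illustrate the bookkeeping.

```python
# Hypothetical matched pairs (event person -> comparison person) and event waves
pairs = {"E1": "C7", "E2": "C3"}
event_wave = {"E1": 4, "E2": 6}

def artificial_event_waves(pairs, event_wave):
    """Give each matched comparison person the event wave of their event partner."""
    waves = dict(event_wave)
    for event_id, comp_id in pairs.items():
        waves[comp_id] = event_wave[event_id]
    return waves

def design_row(person_id, wave, waves, event_ids):
    """Piecewise terms relative to the (artificial) event and their interactions
    with the event-group indicator."""
    t = wave - waves[person_id]
    g = 1 if person_id in event_ids else 0
    pre, shift, post = min(t, 0), int(t >= 0), max(t, 0)
    return {"t": t, "group": g, "pre": pre, "shift": shift, "post": post,
            "group_x_pre": g * pre, "group_x_shift": g * shift,
            "group_x_post": g * post}

waves = artificial_event_waves(pairs, event_wave)
observations = [("E1", 3), ("E1", 5), ("C7", 3), ("C7", 5), ("E2", 7), ("C3", 7)]
rows = [design_row(pid, w, waves, set(event_wave)) for pid, w in observations]
for (pid, w), row in zip(observations, rows):
    print(pid, w, row)
```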

Table 1.

Overview of Modeling Approaches

Analysis model	Challenges	Counterfactual	Control for confounders	Analytical models	Examples with analysis code
Difference-in-difference design: comparison group with trajectory	Temporal alignment of groups + modeling the trajectory for event group (number of phases, functional form)	Trajectory of comparison group. Assumption: Trajectory of comparison group resembles the trajectory of individuals in the event group if they had not experienced the event.	Balancing of the groups through matching, weighting, or the inclusion of covariates	Piecewise multilevel models with interactions between event indicator and parameters specifying the trajectories or multigroup SEMs	Krämer and Rodgers (2020), van Scheppingen and Leopold (2020)
Difference-in-difference design: constant comparison group (“iron bar” model)	Modeling the trajectory for event group (number of phases, functional form)	Model intercept (i.e., overall mean in the occasions in which all phase-specific dummy variables are zero). Assumption: All changes in the event group are caused by the event.	Balancing of the groups through matching, weighting, or the inclusion of covariates	Piecewise multilevel models or multigroup SEMs	Denissen et al. (2019), Reitz et al. (2022)
Within-person model	Selection of reference period to compute the reference level	Person-specific intercepts (i.e., person mean in baseline period). Assumption: All deviations from a person’s baseline level are caused by the event.	Time-invariant confounders are controlled by design. Time-varying confounders need to be included in the model.	Fixed-effects regression or multilevel models with person-mean-centered predictors	Clark et al. (2008), Lawes et al. (2023), Krämer et al. (2024)

Note: SEMs = structural equation models. For the two difference-in-difference designs, we depicted piecewise models with a linear preevent and postevent slope. For the within-person design, we depicted a model with occasion-specific effects. Although these are popular choices, other specifications are, of course, possible (see also Fig. 2).

The second approach (Table 1, second model) is to not model a trajectory for the comparison group at all (dotted line). The assumption underlying this so-called “iron bar” model is that the outcome levels are stable in the absence of exposure to the life event (e.g., childless comparison individuals do not experience changes in life satisfaction over time). This approach can be implemented by setting the indicator (dummy) variables referencing the various time phases for individuals in the comparison group to zero. The causal effects in these models then correspond to the coefficients of the dummy variables referencing the various time phases for individuals following the life event in the event group. Each of the dummy variables estimates the mean difference between the time-invariant intercept in the comparison group and the estimated level of the trajectory at the beginning of each time phase for the event group. Because it is unlikely that the outcome levels in the comparison group are completely stable over time, additional covariates can be included to adjust the causal effect for normative changes that occur regardless of the life event. For example, by adding age as a covariate, one can control for normative age-related changes. For detailed analysis scripts for this approach, see, for example, the supplementary materials of Reitz et al. (2022) and Denissen et al. (2019).
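The iron-bar coding can be sketched with a noise-free toy data set. Phase dummies are coded only for the event group and are set to zero for every comparison observation, so the intercept (adjusted here for age as a covariate) serves as the constant counterfactual and each dummy coefficient is a phase-specific effect. The phase boundaries, ages, and effect sizes below are hypothetical assumptions chosen only to illustrate the coding.

```python
import numpy as np

# Hypothetical phase boundaries in years relative to the event (inclusive)
PHASES = {"anticipation": (-1, -1), "short_term": (0, 1), "long_term": (2, 50)}

def phase_dummies(t, in_event_group):
    """Phase dummies for event-group observations; all zero for comparison persons,
    whose observations therefore define the constant ('iron bar') reference level."""
    if not in_event_group:
        return [0] * len(PHASES)
    return [int(lo <= t <= hi) for lo, hi in PHASES.values()]

effects = np.array([-0.1, 0.4, 0.1])     # hypothetical phase effects
rows = []
# Two event persons and two comparison persons, each observed for 7 years
for start_age, in_event in [(28, True), (35, True), (25, False), (40, False)]:
    for k, t in enumerate(range(-3, 4)):  # t is ignored for comparison persons
        age = start_age + k
        d = phase_dummies(t, in_event)
        y_it = 6.5 + 0.02 * age + effects @ np.array(d)   # noise-free outcome
        rows.append([1.0, age, *d, y_it])

data = np.array(rows, dtype=float)
X, y = data[:, :-1], data[:, -1]
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.round(coef, 3))   # intercept, age, anticipation, short_term, long_term
```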

Both approaches rely on strong assumptions. The first approach heavily relies on the assumption that the artificial time in which individuals in the comparison group would have experienced the life event is correctly specified and that the trajectories of the event and comparison groups have the same functional form. The second approach relies on the assumption that all changes occurring in the comparison group can be adequately modeled through the inclusion of covariates. However, whether these conditions are actually met in a given study cannot be tested.

Limitations of difference-in-difference designs

Even though difference-in-difference designs attempt to mimic randomized experiments and are viewed as relatively strong designs for estimating causal effects in observational studies (Rubin, 2007), they also have important drawbacks for the study of life events. For one, developing a clear definition of the event and comparison groups becomes difficult when life events are reversible (e.g., marriage → divorce; job loss → reemployment). Consider a study that examines the effects of job loss over an extended time period. If the event group is defined based on whether a person experienced a transition from employment to unemployment, the event group will contain subgroups of individuals who become long-term unemployed and individuals who find a new job quickly. Although it may be possible to estimate the short-term effects of a job loss with this event group, estimation of longer-term effects will conflate the results of adaptation processes in the long-term unemployed group with the (positive) effects of reemployment in the latter group. One approach to this problem is to define different event subgroups depending on how long individuals stay unemployed (e.g., Event Group 1: reemployed after 1 month of unemployment; Event Group 2: reemployed after 2 months of unemployment). This approach, however, may become infeasible because of low sample sizes in the different event subgroups. Another, more straightforward approach is to use g-estimation (Loh et al., 2024; Robins, 1986), briefly described in the above section “Using Directed Acyclic Graphs to Specify the Hypothesized Causal Structure.” G-estimation allows for the examination of the effects of different sequences of reversible life events (e.g., employed → unemployed → unemployed → employed). A further limitation of difference-in-difference designs is that all confounders that are related to changes in the outcomes over time need to be measured and adequately accounted for. 
Within-person designs, to which we now turn, can (at least in part) address the latter of these limitations because they allow researchers to account for all time-invariant confounders regardless of whether they are measured or unmeasured.

Within-Person Designs

Within-person designs use each individual as his or her own control. Estimates of causal effects in within-person designs are obtained by examining how strongly individuals deviate from their “baseline levels” before and after the exposure (see Table 1, third model). By focusing on within-person changes, these models can rule out confounding resulting from any observed or unobserved time-invariant covariates so that only time-varying confounders need to be controlled for (Allison, 2009; Kim & Steiner, 2021; Rohrer & Murayama, 2023; Wysocki et al., 2022).

A key challenge in within-person models is to define the baseline level of each individual. This baseline level should represent the outcome level in the absence of the exposure. In life-event studies, the baseline level is often defined as the mean outcome level of a person across all observations for which it is plausible that the life event has not (yet) affected the outcome measure. For example, in studies examining the effects of the birth of a first child, it may be sensible to define the baseline level as the mean of all observations collected at least 9 months before the birth. In practice, the data are usually organized in a long format (i.e., one row per measurement occasion per individual), and occasion-specific dummy variables are created for each of the focal occasions relative to the occurrence of the life event (e.g., the first, second, or third observations after the child is born). The baseline level then corresponds to the mean outcome levels across all nonfocal observations, that is, those for which each of the occasion-specific dummy variables is 0 (for example analysis code, see Krämer et al., 2024). Individuals who were not exposed to the event are often also included in within-person models with the justification that (a) the baseline levels (model intercept) and (b) the regression coefficients for time-varying covariates can be estimated with higher precision. To implement this approach, the occasion-specific dummy variables are set to 0 for all individuals who were not exposed to the event. However, this approach assumes that regression coefficients are the same for individuals who experience the life event and individuals who do not. Because this assumption is often unrealistic, we advise against this practice.
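The long-format coding just described can be sketched for a single person. In the example below, observations at least 9 months before the birth form the baseline, observations within the 9 months before the birth receive an anticipation dummy, and the first, second, and third postbirth observations receive their own dummies. Letting the last dummy also absorb any later observations is our assumption (otherwise those observations would silently enter the baseline). The months and scores are hypothetical.

```python
import numpy as np

def occasion_dummies(months_rel, n_post=3, anticipation_months=9):
    """Occasion-specific dummies for one person's time-ordered observations.
    months_rel: months relative to the birth (negative = before the birth).
    Columns: anticipation, post_1, ..., post_n (post_n also absorbs later ones).
    Rows with all zeros are the nonfocal (baseline) observations."""
    months_rel = np.asarray(months_rel)
    D = np.zeros((len(months_rel), 1 + n_post), dtype=int)
    D[:, 0] = (months_rel > -anticipation_months) & (months_rel < 0)
    post_idx = np.flatnonzero(months_rel >= 0)
    for k, row in enumerate(post_idx):
        D[row, 1 + min(k, n_post - 1)] = 1
    return D

months = np.array([-30, -18, -10, -4, 2, 14, 26, 38])
life_sat = np.array([7.2, 7.0, 7.4, 7.6, 7.9, 7.1, 7.0, 7.1])

D = occasion_dummies(months)
baseline_mask = D.sum(axis=1) == 0
baseline = life_sat[baseline_mask].mean()
print(D)
print("baseline level:", round(baseline, 2))
```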

For reversible life events (e.g., job loss), the reference period is often defined as all occasions in which individuals are not observed in the new status resulting from the life event (e.g., all occasions during which the individual is employed). However, this approach relies on the assumption that the focal life event does not have any persisting effects on the outcome after it has been reversed. Yet, for example, negative effects of unemployment on life satisfaction have sometimes persisted when individuals regain employment (“scarring effect”; Clark et al., 2001; Hetschko et al., 2019; for a contrasting finding, see Zhou et al., 2019). In such cases, postevent measurement occasions should not be included in the definition of the baseline levels to maintain a clear separation between the preexposure baseline level and the event’s effects.
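A toy calculation shows why postevent occasions can distort the baseline when a scarring effect exists. The values below are hypothetical: one person whose life satisfaction does not fully recover after reemployment. Pooling all employed occasions into the baseline pulls the baseline down and therefore understates the drop during unemployment.

```python
from statistics import mean

# One hypothetical person: (employment status, phase relative to the job loss,
# life satisfaction). Post-reemployment values are assumed to show a lasting scar.
occasions = [
    ("employed", "pre", 7.0), ("employed", "pre", 7.0), ("employed", "pre", 7.0),
    ("unemployed", "during", 5.0), ("unemployed", "during", 5.0),
    ("employed", "post", 6.0), ("employed", "post", 6.0),
]

def unemployment_effect(occasions, baseline_phases):
    """Mean outcome while unemployed minus the mean over the chosen baseline occasions."""
    baseline = mean(y for status, phase, y in occasions
                    if status == "employed" and phase in baseline_phases)
    during = mean(y for status, _, y in occasions if status == "unemployed")
    return during - baseline

print(unemployment_effect(occasions, {"pre", "post"}))  # baseline = all employed occasions
print(unemployment_effect(occasions, {"pre"}))          # baseline = preevent occasions only
```

With these toy numbers, the first definition yields roughly -1.6, whereas the preevent-only baseline yields -2.0.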

Analysis models

There are two general approaches that may be used to estimate within-person causal effects: multilevel modeling with person-mean-centered predictors (e.g., Hoffman, 2019; Raudenbush & Bryk, 2002) and fixed-effects models (Allison, 2009). Although these two modeling strategies have different mechanics and are rooted in different research traditions, they both aim at estimating the same within-person effect (Hamaker & Muthén, 2019). Both approaches are implemented in various statistical software packages, such as R, SAS, and Stata.

Multilevel models with person-mean centering

In the psychological literature, multilevel models with person-mean-centered predictors are generally used to simultaneously examine within-person changes and between-person differences (e.g., Hoffman, 2019; Raudenbush & Bryk, 2002). These flexible models allow for the inclusion of time-invariant person characteristics (stable traits; e.g., biological sex) to explain between-person differences. Moreover, they aim at making population-based inferences by treating the observed individuals as a random sample from the target population and assuming a specific distribution (typically normal) of the between-person effects across individuals in the population (McNeish, 2023). When multiple indicators of the outcome variable are available, the influence of measurement error can also be addressed. Castro-Alvarez, Tendeiro, de Jonge, et al. (2022) proposed a particularly relevant model for this purpose: the mixed-effects trait-state-occasion (ME-TSO) model, which is rooted in latent-state-trait theory (Steyer et al., 1992, 1999, 2015). The ME-TSO model can be used to contrast the true (i.e., free of measurement error) trait-outcome levels before and after a life event occurred (for example analysis code, see Lawes et al., 2024).
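Person-mean centering splits a time-varying predictor into a between-person part (the person's mean) and a within-person part (the deviation from that mean). The sketch below shows only this preprocessing step. The variable names are hypothetical, and the multilevel model itself would then be fitted with a mixed-model routine (e.g., `lmer()` from the R package lme4), entering both parts as predictors with a random intercept.

```python
from collections import defaultdict
from statistics import mean

def person_mean_center(rows, var):
    """Add '<var>_between' (person mean) and '<var>_within' (deviation) to each row."""
    by_person = defaultdict(list)
    for r in rows:
        by_person[r["id"]].append(r[var])
    pmeans = {pid: mean(vals) for pid, vals in by_person.items()}
    out = []
    for r in rows:
        new = dict(r)
        new[f"{var}_between"] = pmeans[r["id"]]
        new[f"{var}_within"] = r[var] - pmeans[r["id"]]
        out.append(new)
    return out

# Hypothetical event-status dummy (0 = before, 1 = after the event), long format.
rows = [{"id": 1, "post": p} for p in (0, 0, 1, 1)] + \
       [{"id": 2, "post": p} for p in (0, 1, 1, 1)]

for r in person_mean_center(rows, "post"):
    print(r)
```

By construction, the within-person part sums to zero for each person. Its coefficient therefore carries only within-person information, which is the effect of interest here.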

Fixed-effects models

In economics, sociology, and political science, fixed-effects models (Allison, 2009) are often applied to estimate within-person causal effects. Fixed-effects models account for all stable differences between individuals (Angrist & Pischke, 2009; Bell & Jones, 2015; Wooldridge, 2013).⁵ The goal of this approach is to remove all between-person variation so that only within-person variation remains. An advantage of this approach is that no assumptions are made about the distribution of the between-person effects in the population. However, this approach comes at the cost of being unable to quantify between-person differences. Fixed-effects models also aim at making inferences for the specific collection of individuals in the sample rather than for the target population (McNeish, 2023; McNeish & Kelley, 2019). Furthermore, to our knowledge, fixed-effects models cannot address the issue of measurement error (T. D. Hill et al., 2020). Given these limitations, fixed-effects models are less suited for life-event studies, which are typically interested in (a) quantifying between-person differences, (b) estimating causal effects that generalize to broader target populations, and (c) controlling for measurement error.
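The core of the fixed-effects approach is the within transformation: each person's own mean is subtracted from every variable before the slope is estimated. The toy data below are hypothetical and noise-free. The outcome is y = alpha_i + 2.0 * x, and the unobserved stable trait alpha_i is higher for the person who is exposed on more occasions. A pooled regression is therefore confounded, but the demeaned regression recovers the within-person effect.

```python
from collections import defaultdict
from statistics import mean

def demean(rows, var):
    """Within transformation: subtract each person's own mean of `var`."""
    by_person = defaultdict(list)
    for r in rows:
        by_person[r["id"]].append(r[var])
    pmeans = {pid: mean(vals) for pid, vals in by_person.items()}
    return [r[var] - pmeans[r["id"]] for r in rows]

def pooled_slope(rows):
    """Naive pooled OLS slope of y on x that ignores person identity."""
    x, y = [r["x"] for r in rows], [r["y"] for r in rows]
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def fixed_effects_slope(rows):
    """Within (fixed-effects) estimator: OLS slope on person-demeaned data."""
    xt, yt = demean(rows, "x"), demean(rows, "y")
    return sum(a * b for a, b in zip(xt, yt)) / sum(a * a for a in xt)

# Hypothetical data: alpha_i is a stable person-level confounder.
rows = [{"id": pid, "x": x, "y": alpha + 2.0 * x}
        for pid, alpha, xs in [(1, 0.0, (0, 0, 0, 1)), (2, 10.0, (0, 1, 1, 1))]
        for x in xs]

print(pooled_slope(rows))         # 7.0: confounded by alpha_i
print(fixed_effects_slope(rows))  # 2.0: within-person effect
```

Because alpha_i is constant within a person, demeaning removes it entirely. This is why the approach rules out time-invariant confounding but cannot say anything about between-person differences.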

Limitations of within-person models

The results of within-person designs can be biased by time-varying covariates (e.g., historical events) and maturational trends that may differ between the baseline and event periods (Holland, 1986; Shadish et al., 2002) and confound the estimate of the effect of the event. In life-event studies, many covariates are time-varying because either their levels or the magnitude of their effects may change over time. Of particular concern are other life events that co-occur with the focal event (Krämer et al., 2024). For example, the estimate of the effect of the birth of the first child might be confounded with the effects of other life events that occur immediately before the event (e.g., partners moving in together, getting married). Furthermore, characteristics that are typically conceptualized as being stable over time (e.g., residence, personality) might change over time. We therefore recommend that, as with difference-in-difference models, researchers critically discuss the hypothesized causal structures when they use within-person models and ideally measure and analytically control for time-varying confounders.

Another important limitation of within-person models is that only individuals who were exposed to the event actually contribute to the estimation of the causal effects. This limitation has two important implications. First, the statistical power to detect the causal effects may be lower than in difference-in-difference designs (Allison, 2009). Second, the causal effects of within-person models apply only to those individuals who were actually exposed to the event. This feature changes the estimand to the ATT, which generally differs from the ATE, the estimand that generalizes to the population from which the sample was drawn (Collischon & Eberl, 2020; T. D. Hill et al., 2020).

Recommendations for Deciding Between Difference-in-Difference Models and Within-Person Models

Whether difference-in-difference or within-person models should be used primarily depends on the assumed causal structure and the desired causal estimand. Figure 3 depicts a flowchart that provides a guide for choosing the appropriate analysis model in different contexts. The flowchart also shows that it is not possible to estimate unbiased causal effects if there are (a) unmeasured time-varying confounders or (b) unmeasured time-invariant confounders and the desired causal estimand is not the ATT.
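The two conditions stated above, together with the note to Fig. 3 that matching is generally inappropriate when the ATE is targeted, can be written as a small decision function. This is a simplified sketch and not a reproduction of Fig. 3. The boolean inputs and the returned labels are hypothetical, and any further branches of the flowchart are not encoded.

```python
def choose_design(unmeasured_time_varying: bool,
                  unmeasured_time_invariant: bool,
                  estimand: str) -> str:
    """Hypothetical simplification of the rules stated in the text; not the full Fig. 3."""
    if unmeasured_time_varying:
        # Neither design removes unmeasured time-varying confounding.
        return "not identifiable"
    if unmeasured_time_invariant:
        # Only within-person models remove stable unmeasured confounders,
        # and they identify the ATT rather than the ATE.
        return "within-person model" if estimand == "ATT" else "not identifiable"
    if estimand == "ATT":
        return "difference-in-difference or within-person model"
    # For the ATE, balance by weighting or covariate adjustment rather than matching.
    return "difference-in-difference model without matching"


for args in [(True, False, "ATE"), (False, True, "ATT"),
             (False, True, "ATE"), (False, False, "ATE")]:
    print(args, "->", choose_design(*args))
```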

Fig. 3.

Guide for choosing between difference-in-difference and within-person models. ᵃAs described in the section on matching, it is generally not appropriate to use matching when the target estimand is the average treatment effect (ATE) for the population from which the sample was drawn.

Step 4: Probing Credibility, Robustness, and Generalizability of Effects

The process of establishing the most credible causal inference in observational life-event studies does not end once the causal effect is estimated. After estimating the causal effects of interest, it is essential that researchers (a) probe whether their theoretical assumptions are plausible, (b) examine the robustness of the results when alternative analysis methods are used, and (c) critically discuss the generalizability of the results.

Plausibility of the Assumptions

As a first step, researchers should critically check whether they did everything to make the key assumptions of their analytical strategy plausible. Of particular importance is the assumption of exchangeability. Recall that this assumption states that, conditional on the set of controlled confounders, whether an individual is exposed to the event is as good as random. In life-event studies, two main issues challenge this assumption. First, there are often strong selection effects, meaning that individuals who do and do not experience the focal life event may differ considerably. Even after controlling for measured covariates (e.g., through covariate balancing), selection effects resulting from unobserved confounders potentially remain. Second, the true causal structure of life-event studies is often highly complex, and the relevant psychological theories are often incomplete. Thus, it is generally impossible to be sure that the identified confounders comprise a sufficient set to identify the causal effect. Accordingly, full (conditional) exchangeability will only rarely hold in life-event studies. Below, we present two indirect methods to probe the plausibility of the exchangeability assumption in studies using difference-in-difference designs.

Design elements and pattern matching

Shadish et al. (2002) and Rosenbaum (2020) emphasized the importance of including so-called design-element controls to strengthen causal inferences from observational studies. Design elements are used to capture the effect of plausible threats to internal validity in the absence of an actual treatment effect. For example, multiple pretests over time may be used to estimate maturational trends before exposure to a life event. The pattern of results predicted by the threat to internal validity (confounding variable) is then compared with the pattern of results predicted by the hypothesized treatment effect. In life-events research, multiple comparison groups that are differentially sensitive to the confounding variable are often used. These different comparison groups are sometimes termed negative control exposures (Lipsitch et al., 2010). For example, Elwert and Christakis (2008) investigated whether the death of a wife increases the mortality of the husband (the “widowhood effect”). To rule out the possibility that spousal similarity or a shared environment is the true cause of this effect, they used the death of an ex-wife as a negative control exposure.

Another common design element is the inclusion of additional outcome variables that are not expected to be affected by the exposure but would be expected to be affected by other threats to validity (nonequivalent dependent variables; Shadish et al., 2002). These additional outcomes are sometimes called negative control outcomes (Lipsitch et al., 2010). For example, the sustained illness of an English teacher would be expected to have a far greater effect on a measure of students’ performance in English than on a measure of students’ performance in math. However, in life-event studies, it can be difficult to rule out postevent effects on psychological variables. Thus, a sensible choice for negative control outcomes in life-event studies will often be the levels of the focal outcome before the onset of anticipatory effects (see also Imbens & Rubin, 2015). Specifically, researchers can test whether the event and comparison groups differ in terms of their preevent outcome levels conditional on all other identified confounders. If there are indeed such differences, the assumption of exchangeability becomes less plausible (Imbens, 2004; Imbens & Rubin, 2015).
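One simple way to quantify the preevent check is the standardized mean difference (SMD) in the focal outcome before the onset of anticipatory effects. The sketch assumes the groups have already been matched on the other identified confounders, so an unweighted comparison is conditional on them. Weighted samples would need weighted means and variances. The 0.1 flag threshold is a commonly used rule of thumb, applied here as an assumption, and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def standardized_mean_difference(event, comparison):
    """(mean_event - mean_comparison) / pooled SD, averaging the two group variances."""
    pooled_sd = sqrt((variance(event) + variance(comparison)) / 2)
    return (mean(event) - mean(comparison)) / pooled_sd

# Hypothetical preevent outcome levels in an already matched sample.
pre_event = [6.0, 7.0, 8.0]
pre_comparison = [5.0, 6.0, 7.0]

smd = standardized_mean_difference(pre_event, pre_comparison)
print(smd, "exchangeability questionable" if abs(smd) > 0.1 else "no clear preevent difference")
```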

Sensitivity analysis

A second method to examine the plausibility of the exchangeability assumption is sensitivity analysis (Rosenbaum, 1986, 2017). The basic idea of sensitivity analysis is to examine how large the effect of an unmeasured confounder would need to be to alter the study conclusions, either by making the effect no longer statistically significant or by making its magnitude no longer of practical interest (Rosenbaum, 1986, 2017). For example, Haviland et al. (2007) investigated the effects of adolescent boys joining a youth gang (i.e., exposure) on subsequent self-reported violence (i.e., outcome). Their sensitivity analysis showed that only an unmeasured confounder that increased the odds of joining a gang by more than 50% could reduce the effect of gang membership on violence to the point that it was no longer statistically significant.
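One simple version of Rosenbaum's approach applies to matched pairs analyzed with a sign test. The sensitivity parameter Γ bounds how much two individuals who are matched on measured covariates may differ in their odds of exposure. Γ = 1.5 therefore loosely corresponds to the "50% increase in the odds" described in the example above. Under Γ, the worst-case one-sided p value comes from a binomial distribution with success probability Γ/(1 + Γ). The pair counts below are hypothetical, and the search step and upper limit in `critical_gamma` are illustrative assumptions.

```python
from math import comb

def sign_test_upper_p(n_pairs: int, n_positive: int, gamma: float) -> float:
    """Worst-case one-sided sign-test p value when hidden bias is at most gamma."""
    p_plus = gamma / (1 + gamma)
    return sum(comb(n_pairs, k) * p_plus**k * (1 - p_plus)**(n_pairs - k)
               for k in range(n_positive, n_pairs + 1))

def critical_gamma(n_pairs, n_positive, alpha=0.05, step=0.01, max_gamma=10.0):
    """Smallest gamma (on a grid) at which the result is no longer significant."""
    gamma = 1.0
    while gamma <= max_gamma:
        if sign_test_upper_p(n_pairs, n_positive, gamma) > alpha:
            return gamma
        gamma = round(gamma + step, 10)  # avoid floating-point drift on the grid
    return None

# Hypothetical: in 65 of 100 discordant matched pairs, the exposed member has the higher outcome.
for g in (1.0, 1.25, 1.5):
    print(g, round(sign_test_upper_p(100, 65, g), 4))
print("critical gamma:", critical_gamma(100, 65))
```

In this toy case, the effect is significant at Γ = 1 but not at Γ = 1.5. The critical value lies in between, at roughly 1.3.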

In a related approach, researchers can use measured covariates to gauge whether unmeasured confounders could plausibly alter the study’s conclusions. For example, the covariate with the largest standardized mean difference can be used as a benchmark for a large imbalance on an unmeasured covariate between the event and comparison groups. Furthermore, the largest correlation between any of the covariates and the outcome can be taken as the benchmark for a large confounder–outcome relationship. Researchers can then compute how much the original estimate of the causal effect would be reduced if an unmeasured confounder with the properties of these benchmarks existed. Although this procedure cannot prove that no stronger unmeasured confounder exists, it provides a plausible, empirically based upper limit for the effect of an unmeasured confounder (Hong, 2004; Rosenbaum, 1986). If even the presence of this large hypothetical confounder would not alter the conclusion of the study, the estimated causal effect becomes more credible.
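A back-of-the-envelope version of this benchmark computation uses the omitted-variable logic of a linear model. The bias from leaving out a confounder U is roughly its group imbalance (in SD units) times its association with the outcome. Approximating that association by the benchmark correlation is a simplifying assumption, not the exact procedure of Hong (2004). All inputs below are hypothetical. The bias is applied in the direction that moves the effect toward zero (worst case).

```python
from math import copysign

def benchmark_adjusted_effect(effect, smd_benchmark, r_benchmark):
    """Shrink a standardized effect toward 0 by the bias of a benchmark confounder.

    Assumed linear approximation: bias ~= (imbalance in SD units) * (correlation with outcome).
    """
    bias = abs(smd_benchmark * r_benchmark)
    return effect - copysign(bias, effect)

# Hypothetical inputs: estimated effect of -0.40 outcome SDs; the largest covariate
# SMD is 0.25, and the largest covariate-outcome correlation is 0.30.
adjusted = benchmark_adjusted_effect(-0.40, 0.25, 0.30)
print(round(adjusted, 3), "sign unchanged" if adjusted * -0.40 > 0 else "conclusion could change")
```

Researchers would then judge whether the adjusted value (here -0.325) is still of practical interest. If the bias exceeded the effect, the sign would flip, and the conclusion would no longer be robust.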

Robustness of the Analysis

Researchers should also test whether their causal effects are robust when different analytical methods to estimate them are used. For example, Lawes et al. (2023) found highly similar effects of unemployment on well-being in (a) an unmatched analysis, (b) a propensity-score-matched analysis, and (c) an analysis that included key potential confounders as covariates in the outcome-regression model. Optimally, a robustness check would involve comparing all potentially appropriate analytical approaches to each other. This can, for example, be achieved in a specification-curve analysis (Simonsohn et al., 2020). Of importance, only those analytic methods that estimate the same causal estimand should be considered in such comparisons. When analytic methods that yield different causal estimands are compared with each other, it is unclear whether potential differences in the effect estimates result from a lack of robustness or from differences in the causal estimands. For example, when the causal estimand is the ATE, matching procedures that estimate the ATT should not be included in the robustness check.
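The logic of a specification curve restricted to a single estimand can be sketched as follows. Every combination of analytic choices is estimated. Combinations that target a different estimand are dropped, and the remaining estimates are sorted. The option names, the mapping from balancing method to estimand, and `toy_estimate` (which returns made-up placeholder numbers instead of refitting a model) are all hypothetical.

```python
from itertools import product
from statistics import median

# Hypothetical mapping from balancing strategy to the estimand it targets.
ESTIMAND_OF = {"iptw_ate": "ATE", "covariate_adjustment": "ATE", "matching": "ATT"}

OPTIONS = {
    "balancing": ["iptw_ate", "covariate_adjustment", "matching"],
    "model": ["multilevel", "multigroup_sem"],
}

def specification_curve(estimate_fn, options, target):
    """Estimate every combination of analytic choices that targets `target`, sorted."""
    keys = list(options)
    results = []
    for combo in product(*(options[k] for k in keys)):
        spec = dict(zip(keys, combo))
        if ESTIMAND_OF[spec["balancing"]] != target:
            continue  # different estimand: differences would not reflect robustness
        results.append((estimate_fn(spec), spec))
    results.sort(key=lambda pair: pair[0])
    return results

def toy_estimate(spec):
    """Hypothetical placeholder standing in for refitting the real model under `spec`."""
    base = {"iptw_ate": -0.31, "covariate_adjustment": -0.28, "matching": -0.35}
    return base[spec["balancing"]] + (0.01 if spec["model"] == "multigroup_sem" else 0.0)

curve = specification_curve(toy_estimate, OPTIONS, "ATE")
for est, spec in curve:
    print(f"{est:+.2f}", spec)
print("median:", median(e for e, _ in curve))
```

Because matching targets the ATT here, its specifications are excluded when the ATE is the target, as the text recommends.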

Generalizability, Effect Heterogeneity, and Effect Moderation

So far in this article, we have primarily focused on how to identify and estimate average causal effects in longitudinal life-event studies. However, life-event researchers have stressed that people generally have heterogeneous reactions to life events (e.g., Bleidorn et al., 2020; Luhmann et al., 2021). For example, there are strong individual differences in how unemployment affects well-being (e.g., Lawes et al., 2024): For most individuals, becoming unemployed negatively affects life satisfaction; yet some individuals also report higher life satisfaction during unemployment. An important step in life-event research is to document this heterogeneity of responses to different life events on various outcomes and to understand its sources. Future research needs to search for individual-difference variables that lead to different magnitudes or even different directions of the causal effects of life events. These effect modifiers may be measured variables that are categorical (e.g., biological sex, marital status) or numeric (e.g., age, years employed). Or they may be latent continuous variables (e.g., socioeconomic status created from measures of education, income, and occupation) or latent groups created from measured variables. To identify homogeneous latent groups with similar response trajectories to life events, researchers have proposed latent-trajectory analysis (e.g., Galatzer-Levy et al., 2010, 2011; Infurna & Luthar, 2017). This method can determine the proportion of individuals who exhibit different patterns of change after the event (e.g., declines, increases, no change). If properly specified, latent-trajectory models provide a reliable summary of the different patterns of change based on multiple measurement waves.

An important issue with effect moderation is the theoretical distinction between causal moderation, which implies that intervening on the moderator variable would cause changes in the effects, and noncausal moderation, which implies that the moderators are associated with, but do not cause, heterogeneity in the effects of life events (see Hernán & Robins, 2024; VanderWeele, 2015). To date in life-event research, nearly all moderation analyses have been noncausal. Thus, it is important to carefully report and interpret these moderation analyses so that readers do not misinterpret noncausal moderation effects as causal (Rohrer et al., 2022).

The issues of effect heterogeneity and effect moderation further provide a segue into another important topic that often receives less attention in causal analyses: the external validity of study results (Diener et al., 2022; Rothwell, 2005). To directly generalize the estimated causal effects to the target population, a random sample of the defined target population should be drawn. In practice, however, random samples can never be achieved in life-event studies, for example, because not all individuals selected for the sample will agree to participate in the study. Thus, the causal effect obtained in most life-event studies will not directly generalize to the population from which the individuals were drawn. When information on the characteristics of the population is available (e.g., through administrative records), methods such as the inclusion of sampling weights (Winship & Radbill, 1994) may still permit the researcher to estimate effects that are directly generalizable to the target population of interest. It might even be possible to provide an estimate of the causal effect in a different target population, a concept known as transportability. For example, one might attempt to transport the results of a sample that was collected in one country to the target population of another country. For more details on generalizability and transportability as well as references to computer code for implementing these procedures, see Box 3.
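The simplest form of such reweighting is post-stratification. Stratum-specific effects are averaged using the population's composition instead of the sample's. The sketch assumes that the strata capture all relevant effect modifiers (exchangeability) and that every stratum appears in the sample (positivity), in line with the assumptions listed in Box 3. The strata, effects, and shares are hypothetical.

```python
def standardized_effect(stratum_effects, shares):
    """Average stratum-specific effects using the given stratum shares as weights."""
    total = sum(shares.values())
    return sum(stratum_effects[s] * shares[s] / total for s in stratum_effects)

# Hypothetical stratum-specific effects of an event (outcome SD units) by age group.
effects = {"younger": -0.2, "older": -0.6}
sample_shares = {"younger": 0.7, "older": 0.3}      # composition of the achieved sample
population_shares = {"younger": 0.4, "older": 0.6}  # e.g., from administrative records

print(standardized_effect(effects, sample_shares))      # sample-average effect (about -0.32)
print(standardized_effect(effects, population_shares))  # reweighted to the population (about -0.44)

# Equivalent person-level weights: population share / sample share for each stratum.
weights = {s: population_shares[s] / sample_shares[s] for s in effects}
print(weights)
```

Because older individuals are underrepresented in this toy sample and react more strongly, the sample average understates the population effect.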

Box 3.

Generalizability and Transportability of the Results

Whenever the distribution of effect modifiers differs between the achieved sample and the target population, the estimate of the causal effect will be biased, possibly by a substantial amount. Statistical corrections of the causal effect obtained in the sample have been developed to properly estimate the causal effect in the target population (for a review of these methods, see Degtiar & Rose, 2023). These corrections require measurement of all potential effect modifiers in both the sample and the population (or information about their distributions). The correction methods rely on assumptions that are highly similar to those described in the section “Step 2: Identifying the Causal Effect.” The difference is that here, the assumptions apply to the achieved sample and the population rather than to the event and comparison groups (Degtiar & Rose, 2023):
1. Conditional on the effect modifiers, individuals in the sample and in the target population must be exchangeable (i.e., exchangeability).
2. For all values of the effect modifiers, the probability that an individual in the population is included in the sample must be greater than 0 (i.e., positivity).
3. The measurement and nature of the life event and the measurement of the outcome need to be identical in the sample and the population (i.e., consistency).
Using data on the potential effect modifiers in both the sample and the population, matching, weighting, and outcome-regression procedures may be used to provide an estimate of the causal effect in the population. For detailed procedures for these analyses, see Dahabreh et al. (2020); R scripts for implementation are provided in the article’s supplement.
The same procedures may potentially also be used to provide an estimate of the causal effect in a different target population (e.g., collecting the sample data in one state, province, or country and attempting to transport the results to the target population of another state, province, or country). In these cases, identifying and measuring the full set of possible effect modifiers becomes even more critical. Degtiar and Rose (2023) described methods of assessing the similarity of the original and the new target populations and identifying potential treatment heterogeneity. Dahabreh et al. (2020) described sensitivity analyses that probe whether potential unmeasured effect modifiers could substantially bias the results and provided R scripts in their supplement to perform these analyses.

Summary: Causal Inference in Life-Events Studies

Approaching life-event research within an explicitly causal framework is challenging. We have presented the four central steps involved in making credible causal inferences in life-event studies: (a) conceptually defining the causal estimand, (b) identifying the causal estimand, (c) estimating the causal effect, and (d) probing the credibility, robustness, and generalizability of the estimated effect. For a checklist summarizing these steps, see Box 4.

Box 4.

Checklist for Steps Involved in Causal Inference for Life-Event Studies

Step 1: Definition of Causal Estimand
Are all four entities of the causal estimand (i.e., causal contrast, outcome, time lag, target population) clearly defined (see Box 1)? □
Step 2: Identification of Causal Effect
Can consistency be assumed? □
- Is the life event a well-defined exposure?
- Is there interference between individuals?
Can positivity be assumed? □
- Do the distributions of the measured covariates for individuals in the event and comparison groups have overlapping regions of support?
Can exchangeability be assumed? □
- Does the directed acyclic graph appropriately depict the hypothesized causal structure?
- Are all confounders that need to be controlled correctly identified?
Step 3: Estimate Causal Effect
Is the correct class of models chosen to estimate the identified effect (see Fig. 3)? □
If difference-in-difference models are chosen: □
- Are all identified confounders sufficiently balanced between the event and comparison groups through matching, weighting, or the inclusion of control variables in the outcome-regression model?
- Is the chosen estimation method (e.g., multilevel model or multigroup structural equation model) adequate for the research question?
- Is the trajectory of the event group adequately modeled (see Fig. 2)?
- Is the trajectory of the comparison group adequately modeled (i.e., same trajectory shape as event group vs. “iron bar” model; see Table 1)?
If within-person models are chosen: □
- Is the baseline period clearly defined and correctly encoded in the data through the inclusion of dummy variables referencing the temporal order of the measurements relative to the event?
- Is the chosen estimation method (e.g., fixed-effects models or multilevel models with person-mean centering) adequate for the research question?
Step 4: Probe Plausibility, Robustness, and Generalizability of Estimated Effects
Plausibility of the assumptions □
- Can design elements be included?
- Can sensitivity analyses be used to probe how large an unmeasured confounder would need to be to alter the study conclusions?
Analytical robustness □
- Are the results robust if other appropriate analytical approaches are used (e.g., different balancing method or different analysis model)?
- Can specification-curve analyses be run with all appropriate models?
Generalizability □
- Is effect heterogeneity sufficiently discussed? Can effect modifiers be identified?
- Do the results directly generalize to the target population, or are statistical corrections needed (see Box 3)?

Future Directions

Toward More Specialized Observational Studies

Currently, most life-events studies rely on preexisting data from large national panel studies. These panel studies have many valuable features, including large, nationally representative samples and long study durations, but they also have several important limitations. First, they often have rather long time lags between measurement occasions (typically 1 year), so the temporal resolution is generally too coarse to study shorter-term effects. Second, they generally include a very limited number of outcome measures, and relevant covariates are often not assessed. Third, single-item measures or (less reliable) short scales are often used, so the variables of interest cannot be well operationalized. To overcome these shortcomings, additional panel data need to be collected. In an ideal world, intensive longitudinal data would be collected from many individuals over an extended period of time before and after the life event of interest (for a comprehensive discussion about desirable design features, see Bleidorn et al., 2020). However, such intensive studies are generally not feasible because of both budget and time constraints and the high response burden on participants. Participant reactivity to repeated measurements may further alter the causal systems of interest (Rohrer & Murayama, 2023) and decrease the signal-to-noise ratio (Boker & Nesselroade, 2002). Thus, a useful strategy is for researchers to determine the minimum number of measurement occasions needed to appropriately capture the causal effects of interest. For some outcome variables, participant reactivity can further be minimized through the use of modern assessment tools, such as wearable devices and content analysis of social media (Mehl et al., 2024). 
Knowledge is needed about the timing and functional form of the change processes so that the number and spacing of the measurements are adequate to model the event-related changes in the outcome variables (Collins, 2006; Hopwood et al., 2022; Luhmann et al., 2014). A valuable approach could be to supplement the panel data with measurement-burst designs (Nesselroade, 1991) in which highly frequent assessments occur in the months before and after the life event, with less frequent routine assessments otherwise. When it cannot be anticipated when the life event might occur, online tools could be used to ask individuals to indicate when they (expect to) experience the focal life event. Such event-contingent responding permits more frequent measurement, ideally both immediately before and immediately after the life event (for a discussion of event-contingent methods, see Moskowitz & Sadikaj, 2012). In planning the total duration of the study, researchers can be guided by their choice of causal estimand (Rohrer & Murayama, 2023): If the researchers are interested in the immediate effects of the life event, shorter studies may be sufficient, whereas long-term effects require longer-duration studies.
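A measurement-burst schedule of this kind can be sketched as the union of sparse routine waves and a dense burst around the (reported or expected) event date. All parameters here are hypothetical: a 48-month study, annual routine waves, and monthly assessments within 3 months of an event in month 24. In an event-contingent design, `event_month` would only become known once the participant reports it.

```python
def burst_schedule(study_months, routine_every, event_month, burst_halfwidth, burst_every=1):
    """Assessment months: routine waves plus a dense burst around the event (hypothetical design)."""
    routine = set(range(0, study_months + 1, routine_every))
    start = max(0, event_month - burst_halfwidth)
    stop = min(study_months, event_month + burst_halfwidth)
    burst = set(range(start, stop + 1, burst_every))
    return sorted(routine | burst)

schedule = burst_schedule(study_months=48, routine_every=12, event_month=24, burst_halfwidth=3)
print(schedule)
print(len(schedule), "assessments instead of", 49, "with monthly measurement throughout")
```

The design choice trades temporal resolution far from the event for lower participant burden, which matches the rationale given above.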

In planning a study, researchers should also consider their measurement instruments. Specifying the hypothesized causal structure a priori can help identify potential confounders that need to be measured to identify the causal effects of interest (Rohrer & Murayama, 2023). Furthermore, self-reported data should ideally be complemented with other data sources (e.g., peer reports, biomarkers, wearable devices, administrative data) to obtain a detailed and differentiated picture of the effects. If possible, the constructs of interest should be measured with multiple indicators, permitting correction for measurement error. To make the inclusion of multiitem questionnaires in panel studies possible, more reliable short scales that are sensitive to change need to be developed and validated.

Natural Experiments

A promising opportunity for particularly credible causal inference in longitudinal observational studies is the increased use of natural experiments (Grosz et al., 2024). Natural experiments are quasi-experiments in which “some key elements of a randomized experiment occur on their own, even though the investigator neither creates nor assigns the treatments” (Rosenbaum, 2017, p. 100). An example of a natural experiment is the German Job Search Panel (GJSP; Hetschko et al., 2022), a study on the effects of unemployment on well-being and health. The GJSP exploited the German job-search registration process, in which employees have to register as jobseekers at least 3 months before they expect to lose their job because of plant closures or mass layoffs. Crucially, only some of the registered jobseekers actually enter unemployment; others manage to either keep their jobs (e.g., because the plant could be saved after all) or immediately start a new job without entering unemployment. By recruiting employed registered jobseekers for a smartphone panel study with monthly measurements, the GJSP permitted comparisons of individuals who entered unemployment (i.e., the event group) with highly similar individuals who remained employed (i.e., the comparison group). This kind of research design can easily be transferred to other life events. For example, the effects of retirement could be studied by collaborating with national pension funds to recruit individuals who are approaching retirement age, newly married couples can be contacted via local marriage bureaus to study the effects of the birth of the first child, or couples can be recruited through marriage counselors to study the effects of divorce. It will likely not be random which of these individuals “at risk” will experience the focal life event; therefore, confounding variables (e.g., expectations to experience the event) need to be assessed and controlled for. 
Another potential drawback of these natural experiments is that the identification of individuals who are at risk of experiencing the focal life event will be based on a prior event (e.g., registration as jobseeker because of expected job loss, marriage) or a normative process (e.g., reaching retirement age) that triggers recruitment. Thus, estimation of anticipation effects of the life event will often not be possible, and the number of preevent measurements may be limited. We recommend that the findings from such highly controlled natural experiments be combined with those from nationally representative panel studies to obtain credible and generalizable estimates of the effects.

Conclusion

Almost all research questions in life-events studies are inherently causal. Significant advances in research design and statistical analysis for causal inference in observational studies have emerged in fields such as computer science, econometrics, epidemiology, and statistics. Psychology, like other behavioral and health sciences, is just beginning to adopt these advances. In this article, we draw on these developments to present a guide for making credible causal inferences in life-event studies. We are convinced that carefully considering and detailing decisions at each step of the research process is essential: It helps researchers and readers understand a study’s assumptions and limitations, facilitates precise research-question formulation, guides design and analysis choices, and improves the interpretation of estimated effects. We believe that adopting an explicit causal framework for causal research questions will enhance transparency in psychological research, support constructive scientific critique, and ultimately bring the field closer to the “truth” of scientific claims (Campbell, 1988).

Footnotes

Acknowledgements

We thank Claudia Crayen and Ana Tomova for their valuable feedback throughout the writing process. We acknowledge support by the Open Access Publication Fund of Freie Universität Berlin.

Transparency

Action Editor: David A. Sbarra

Editor: David A. Sbarra

Author Contributions

Mario Lawes: Conceptualization; Investigation; Project administration; Visualization; Writing – original draft.

Stephen G. West: Supervision; Writing – review & editing.

Michael Eid: Resources; Supervision; Writing – review & editing.

ORCID iDs

Mario Lawes

Michael Eid

Notes

References

Abadie & Spiess (2022). Robust post-matching inference. Journal of the American Statistical Association, 117(538), 983–995. https://doi.org/10.1080/01621459.2020.1840383

Allison (2009). Fixed effects regression models. Sage. https://doi.org/10.4135/9781412993869

Angrist, J. D., & Pischke, J.-S. (2009). Mostly harmless econometrics: An empiricist’s companion. Princeton University Press.

Anusic, Yap, S. C. Y., & Lucas, R. E. (2014). Testing set-point theory in a Swiss national sample: Reaction and adaptation to major life events. Social Indicators Research, 119(3), 1265–1288. https://doi.org/10.1007/s11205-013-0541-2

Asselmann, Garthus-Niegel, Knappe, & Martini (2022). Physical and mental health changes in the five years before and five years after childbirth: A population-based panel study in first-time mothers and fathers from Germany. Journal of Affective Disorders, 301, 138–144. https://doi.org/10.1016/j.jad.2022.01.050

Asselmann, Garthus-Niegel, & Martini (2022). Personality and peripartum changes in perceived social support: Findings from two prospective-longitudinal studies in (expectant) mothers and fathers. Frontiers in Psychiatry, 12, Article 814152. https://doi.org/10.3389/fpsyt.2021.814152

Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3), 399–424. https://doi.org/10.1080/00273171.2011.568786

Austin, P. C., & Stuart, E. A. (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine, 34(28), 3661–3679. https://doi.org/10.1002/sim.6607

Baltes, P. B., Reese, H. W., & Lipsitt, L. P. (1980). Life-span developmental psychology. Annual Review of Psychology, 31(1), 65–110. https://doi.org/10.1146/annurev.ps.31.020180.000433

Bell & Jones (2015). Explaining fixed effects: Random effects modeling of time-series cross-sectional and panel data. Political Science Research and Methods, 3(1), 133–153. https://doi.org/10.1017/psrm.2014.7

Benjamini & Hochberg (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B: Methodological, 57(1), 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x

Bleidorn, Hopwood, C. J., Back, M. D., Denissen, J. J. A., Hennecke, Jokela, Kandler, Lucas, R. E., Luhmann, Orth, Roberts, B. W., Wagner, Wrzus, & Zimmermann (2020). Longitudinal experience–wide association studies—A framework for studying personality change. European Journal of Personality, 34(3), 285–300. https://doi.org/10.1002/per.2247

Boker, S. M., & Nesselroade, J. R. (2002). A method for modeling the intrinsic dynamics of intraindividual variability: Recovering the parameters of simulated oscillators in multi-wave panel data. Multivariate Behavioral Research, 37(1), 127–160. https://doi.org/10.1207/S15327906MBR3701_06

Buecker, Denissen, J. J. A., & Luhmann (2021). A propensity-score matched study of changes in loneliness surrounding major life events. Journal of Personality and Social Psychology, 121(3), 669–690. https://doi.org/10.1037/pspp0000373

Bühler, J. L., Orth, Bleidorn, Weber, Kretzschmar, Scheling, & Hopwood, C. J. (2024). Life events and personality change: A systematic review and meta-analysis. European Journal of Personality, 38(3), 544–568. https://doi.org/10.1177/08902070231190219

Campbell, D. T. (1988). Methodology and epistemology for social science: Selected papers (E. S. Overman, Ed.). University of Chicago Press.

Castro-Alvarez, Tendeiro, J. N., de Jonge, Meijer, R. R., & Bringmann, L. F. (2022). Mixed-effects trait-state-occasion model: Studying the psychometric properties and the person–situation interactions of psychological dynamics. Structural Equation Modeling: A Multidisciplinary Journal, 29(3), 438–451. https://doi.org/10.1080/10705511.2021.1961587

Castro-Alvarez, Tendeiro, J. N., Meijer, R. R., & Bringmann, L. F. (2022). Using structural equation modeling to study traits and states in intensive longitudinal data. Psychological Methods, 27(1), 17–43. https://doi.org/10.1037/met0000393

Chan, K. C. G., Yam, S. C. P., & Zhang (2016). Globally efficient non-parametric inference of average treatment effects by empirical balancing calibration weighting. Journal of the Royal Statistical Society B: Statistical Methodology, 78(3), 673–700. https://doi.org/10.1111/rssb.12129

Chen, F. F., & West, S. G. (2008). Measuring individualism and collectivism: The importance of considering differential components, reference groups, and measurement invariance. Journal of Research in Personality, 42(2), 259–294. https://doi.org/10.1016/j.jrp.2007.05.006

Chopik, W. J., Kim, E. S., Schwaba, Krämer, M. D., Richter, & Smith (2020). Changes in optimism and pessimism in response to life events: Evidence from three large panel studies. Journal of Research in Personality, 88, Article 103985. https://doi.org/10.1016/j.jrp.2020.103985

Cinelli, Forney, & Pearl (2024). A crash course in good and bad controls. Sociological Methods & Research, 53(3), 1071–1104. https://doi.org/10.1177/00491241221099552

Clark, A. E., Diener, Georgellis, & Lucas, R. E. (2008). Lags and leads in life satisfaction: A test of the baseline hypothesis. The Economic Journal, 118(529), 222–243.

Clark, A. E., Georgellis, & Sanfey (2001). Scarring: The psychological impact of past unemployment. Economica, 68(270), 221–241. https://doi.org/10.1111/1468-0335.00243

Cohen, Aiken, L. S., & West, S. G. (1999). The problem of units and the circumstance for POMP. Multivariate Behavioral Research, 34(3), 315–346. https://doi.org/10.1207/S15327906MBR3403_2

Cole, S. R., & Hernán, M. A. (2008). Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology, 168(6), 656–664. https://doi.org/10.1093/aje/kwn164

Collins, L. M. (2006). Analysis of longitudinal data: The integration of theoretical model, temporal design, and statistical model. Annual Review of Psychology, 57(1), 505–528. https://doi.org/10.1146/annurev.psych.57.102904.190146

Collischon & Eberl (2020). Let’s talk about fixed effects: Let’s talk about all the good things and the bad things. KZfSS Kölner Zeitschrift Für Soziologie Und Sozialpsychologie, 72(2), 289–299. https://doi.org/10.1007/s11577-020-00699-8

Cook, T. D., Steiner, P. M., & Pohl (2009). How bias reduction is affected by covariate choice, unreliability, and mode of data analysis: Results from two types of within-study comparisons. Multivariate Behavioral Research, 44(6), 828–847. https://doi.org/10.1080/00273170903333673

Cook, T. D., Zhu, Klein, Starkey, & Thomas (2020). How much bias results if a quasi-experimental design combines local comparison groups, a pretest outcome measure and other covariates? A within study comparison of preschool effects. Psychological Methods, 25(6), 726–746. https://doi.org/10.1037/met0000260

Dahabreh, I. J., & Bibbins-Domingo (2024). Causal inference about the effects of interventions from observational studies in medical journals. Journal of the American Medical Association, 331(21), 1845–1853. https://doi.org/10.1001/jama.2024.7741

Dahabreh, I. J., Robertson, S. E., Steingrimsson, J. A., Stuart, E. A., & Hernán, M. A. (2020). Extending inferences from a randomized trial to a new target population. Statistics in Medicine, 39(14), 1999–2014. https://doi.org/10.1002/sim.8426

Degtiar & Rose (2023). A review of generalizability and transportability. Annual Review of Statistics and Its Application, 10(1), 501–524. https://doi.org/10.1146/annurev-statistics-042522-103837

Denissen, J. J. A., Luhmann, Chung, J. M., & Bleidorn (2019). Transactions between life events and personality traits across the adult lifespan. Journal of Personality and Social Psychology, 116(4), 612–633. https://doi.org/10.1037/pspp0000196

Desai, R. J., & Franklin, J. M. (2019). Alternative approaches for confounding adjustment in observational studies using weighting based on the propensity score: A primer for practitioners. The BMJ, 367, Article l5657. https://doi.org/10.1136/bmj.l5657

Diener, Emmons, R. A., Larsen, R. J., & Griffin (1985). The satisfaction with life scale. Journal of Personality Assessment, 49(1), 71–75. https://doi.org/10.1207/s15327752jpa4901_13

Diener, Northcott, Zyphur, M. J., & West, S. G. (2022). Beyond experiments. Perspectives on Psychological Science, 17(4), 1101–1119. https://doi.org/10.1177/17456916211037670

Dong, Stuart, E. A., Lenis, & Quynh Nguyen (2020). Using propensity score analysis of survey data to estimate population average treatment effects: A case study comparing different methods. Evaluation Review, 44(1), 84–108. https://doi.org/10.1177/0193841X20938497

Dugan, K. A., Vogt, R. L., Zheng, Gillath, Deboeck, P. R., Fraley, R. C., & Briley, D. A. (2024). Life events sometimes alter the trajectory of personality development: Effect sizes for 25 life events estimated using a large, frequently assessed sample. Journal of Personality, 92(1), 130–146. https://doi.org/10.1111/jopy.12837

Dyrdal, G. M., & Lucas, R. E. (2013). Reaction and adaptation to the birth of a child: A couple-level analysis. Developmental Psychology, 49(4), 749–761. https://doi.org/10.1037/a0028335

Eid & Diener (Eds.). (2006). Handbook of multimethod measurement in psychology. American Psychological Association. https://doi.org/10.1037/11383-000

Eid, Geiser, & Koch (2025). Structural equation modeling of multiple rater data. The Guilford Press.

Eid & Hoffmann (1998). Measuring variability and change with an item response model for polytomous variables. Journal of Educational and Behavioral Statistics, 23(3), 193–215. https://doi.org/10.3102/10769986023003193

Elwert & Christakis, N. A. (2008). Wives and ex-wives: A new test for homogamy bias in the widowhood effect. Demography, 45(4), 851–873. https://doi.org/10.1353/dem.0.0029

Elwert & Winship (2014). Endogenous selection bias: The problem of conditioning on a collider variable. Annual Review of Sociology, 40(1), 31–53. https://doi.org/10.1146/annurev-soc-071913-043455

Funder, D. C., & West, S. G. (1993). Viewpoints on personality: Consensus, self-other agreement and accuracy in judgments of personality. Journal of Personality, 61(4), 457–476. https://doi.org/10.1111/j.1467-6494.1993.tb00778.x

Funk, M. J., Westreich, Wiesen, Stürmer, Brookhart, M. A., & Davidian (2011). Doubly robust estimation of causal effects. American Journal of Epidemiology, 173(7), 761–767. https://doi.org/10.1093/aje/kwq439

Galatzer-Levy, I. R., Bonanno, G. A., & Mancini, A. D. (2010). From Marianthal to latent growth mixture modeling: A return to the exploration of individual differences in response to unemployment. Journal of Neuroscience, Psychology, and Economics, 3(2), 116–125. https://doi.org/10.1037/a0020077

Galatzer-Levy, I. R., Mazursky, Mancini, A. D., & Bonanno, G. A. (2011). What we don’t expect when expecting: Evidence for heterogeneity in subjective well-being in response to parenthood. Journal of Family Psychology, 25(3), 384–392. https://doi.org/10.1037/a0023759

Golle, Rose, Göllner, Spengler, Stoll, Hübner, Rieger, Trautwein, Lüdtke, Roberts, B. W., & Nagengast (2019). School or work? The choice may change your personality. Psychological Science, 30(1), 32–42. https://doi.org/10.1177/0956797618806298

Gollob, H. F., & Reichardt, C. S. (1987). Taking account of time lags in causal models. Child Development, 58(1), 80–92. https://doi.org/10.2307/1130293

Greifer (2020, August 1). Propensity score matching—What is the problem? [Online post]. Cross Validated. https://stats.stackexchange.com/questions/481110/propensity-score-matching-what-is-the-problem/481130#481130

Greifer (2023). WeightIt: Weighting for covariate balance in observational studies [Computer software]. https://ngreifer.github.io/WeightIt/

Greifer & Stuart, E. A. (2023). Choosing the causal estimand for propensity score analysis of observational studies (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2106.10577

Grosz, M. P., Ayaita, Arslan, R. C., Buecker, Ebert, Hünermund, Müller, S. R., Rieger, Zapko-Willmes, & Rohrer, J. M. (2024). Natural experiments: Missed opportunities for causal inference in psychology. Advances in Methods and Practices in Psychological Science, 7(1). https://doi.org/10.1177/25152459231218610

Grosz, M. P., Rohrer, J. M., & Thoemmes (2020). The taboo against explicit causal inference in nonexperimental psychology. Perspectives on Psychological Science, 15(5), 1243–1255. https://doi.org/10.1177/1745691620921521

Haehner, Kritzler, & Luhmann (2023). Can perceived and objective-descriptive event characteristics explain individual differences in changes in subjective well-being after negative life events? A specification curve analysis. PsyArXiv. https://doi.org/10.31234/osf.io/3mjr7

Hainmueller (2012). Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 20(1), 25–46. https://doi.org/10.1093/pan/mpr025

Hamaker, E. L., & Muthén, B. O. (2019). The fixed versus random effects debate and how it relates to centering in multilevel modeling. Psychological Methods, 25(3), 365–379. https://doi.org/10.1037/met0000239

Haviland, Nagin, D. S., & Rosenbaum, P. R. (2007). Combining propensity score matching and group-based trajectory analysis in an observational study. Psychological Methods, 12(3), 247–267. https://doi.org/10.1037/1082-989X.12.3.247

Hentschel, Eid, & Kutscher (2017). The influence of major life events and personality traits on the stability of affective well-being. Journal of Happiness Studies, 18(3), 719–741. https://doi.org/10.1007/s10902-016-9744-y

Hernán, M. A. (2018). The C-word: Scientific euphemisms do not improve causal inference from observational data. American Journal of Public Health, 108(5), 616–619. https://doi.org/10.2105/AJPH.2018.304337

Hernán, M. A., & Robins, J. M. (2016). Using big data to emulate a target trial when a randomized trial is not available. American Journal of Epidemiology, 183(8), 758–764. https://doi.org/10.1093/aje/kwv254

Hernán, M. A., & Robins, J. M. (2024). Causal inference: What if. Chapman & Hall/CRC.

Hetschko, Knabe, & Schöb (2019). Looking back in anger? Retirement and unemployment scarring. Demography, 56(3), 1105–1129. https://doi.org/10.1007/s13524-019-00778-2

Hetschko, Schmidtke, Eid, Lawes, Schöb, & Stephan (2022). The German job search panel. OSF. https://doi.org/10.31219/osf.io/7jazr

Hill, P. L., Beck, E. D., & Jackson, J. J. (2021). Maintaining sense of purpose following health adversity in older adulthood: A propensity score matching examination. The Journals of Gerontology B: Psychological Sciences and Social Sciences, 76(8), 1574–1579. https://doi.org/10.1093/geronb/gbab002

Hill, T. D., Davis, A. P., Roos, J. M., & French, M. T. (2020). Limitations of fixed-effects models for panel data. Sociological Perspectives, 63(3), 357–369. https://doi.org/10.1177/0731121419863785

Ho, D. E., King, Stuart, E. A., & Imai (2011). MatchIt: Nonparametric preprocessing for parametric causal inference. Journal of Statistical Software, 42(8), 1–28. https://doi.org/10.18637/jss.v042.i08

Hoffman (2019). On the interpretation of parameters in multivariate multilevel models across different combinations of model specification and estimation. Advances in Methods and Practices in Psychological Science, 2(3), 288–311. https://doi.org/10.1177/2515245919842770

Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945–960. https://doi.org/10.1080/01621459.1986.10478354

Holland, P. W., & Rubin, D. B. (1988). Causal inference in retrospective studies. Evaluation Review, 12(3), 203–231. https://doi.org/10.1002/j.2330-8516.1987.tb00211.x

Hong (2004). Causal inference for multi-level observational data with application to kindergarten retention [Unpublished dissertation, University of Michigan]. https://deepblue.lib.umich.edu/handle/2027.42/124428

Hopwood, C. J., Bleidorn, & Wright, A. G. C. (2022). Connecting theory to methods in longitudinal research. Perspectives on Psychological Science, 17(3), 884–894. https://doi.org/10.1177/17456916211008407

Hosoya, Luhmann, & Eid (2020). Hierarchical linear models for discontinuous change. In Atkinson, Delamont, Cernat, J. W. Sakshaug, & Williams (Eds.), SAGE Research Methods Foundations. Sage. https://doi.org/10.4135/9781526421036831956

Hudde & Jacob (2022). There’s more in the data! Using month-specific information to estimate changes before and after major life events. SocArXiv. https://doi.org/10.31235/osf.io/vueas

Huling, J. D., & Mak (2022). Energy balancing of covariate distributions. arXiv. http://arxiv.org/abs/2004.13962

Hussong, A. M., Curran, P. J., & Bauer, D. J. (2013). Integrative data analysis in clinical psychology research. Annual Review of Clinical Psychology, 9(1), 61–89. https://doi.org/10.1146/annurev-clinpsy-050212-185522

Imbens, G. W. (2004). Nonparametric estimation of average treatment effects under exogeneity: A review. Review of Economics and Statistics, 86(1), 4–29. https://doi.org/10.1162/003465304323023651

Imbens, G. W. (2024). Causal inference in the social sciences. Annual Review of Statistics and Its Application, 11(1), 123–152. https://doi.org/10.1146/annurev-statistics-033121-114601

Imbens, G. W., & Rubin, D. B. (2015). Causal inference for statistics, social, and biomedical sciences: An introduction. Cambridge University Press. https://doi.org/10.1017/CBO9781139025751

Infurna, F. J., & Luthar, S. S. (2017). Parents’ adjustment following the death of their child: Resilience is multidimensional and differs across outcomes examined. Journal of Research in Personality, 68, 38–53. https://doi.org/10.1016/j.jrp.2017.04.004

Jackson, J. J., Thoemmes, Jonkmann, Lüdtke, & Trautwein (2012). Military training and personality trait development: Does the military make the man, or does the man make the military? Psychological Science, 23(3), 270–277. https://doi.org/10.1177/0956797611423545

Jagodzinski, Kühnel, S. M., & Schmidt (1987). Is there a “Socratic effect” in nonexperimental panel studies? Consistency of an attitude toward guestworkers. Sociological Methods & Research, 15(3), 259–302. https://doi.org/10.1177/0049124187015003004

Kang, J. D. Y., & Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, 22(4), 569–573. https://doi.org/10.1214/07-STS227

Keller, Wong, V. C., Park, Zhang, Sheehan, & Steiner, P. M. (2024). A new four-arm within-study comparison: Design, implementation, and data. OSF. https://doi.org/10.31219/osf.io/2gur9

Kim & Steiner, P. M. (2021). Causal graphical views of fixed effects and random effects models. British Journal of Mathematical and Statistical Psychology, 74(2), 165–183. https://doi.org/10.1111/bmsp.12217

Krämer, M. D., & Rodgers, J. L. (2020). The impact of having children on domain-specific life satisfaction: A quasi-experimental longitudinal investigation using the Socio-Economic Panel (SOEP) data. Journal of Personality and Social Psychology, 119(6), 1497–1514. https://doi.org/10.1037/pspp0000279

Krämer, M. D., Rohrer, J. M., Lucas, R. E., & Richter (2024). Life events and life satisfaction: Estimating effects of multiple life events in combined models. European Journal of Personality. Advance online publication. https://doi.org/10.1177/08902070241231017

Lawes, Hetschko, Schöb, Stephan, & Eid (2022). Unemployment and hair cortisol as a biomarker of chronic stress. Scientific Reports, 12, Article 21573. https://doi.org/10.1038/s41598-022-25775-1

Lawes, Hetschko, Schöb, Stephan, & Eid (2023). The impact of unemployment on cognitive, affective, and eudaimonic well-being facets: Investigating immediate effects and short-term adaptation. Journal of Personality and Social Psychology, 124(3), 659–681. https://doi.org/10.1037/pspp0000417

Lawes, Hetschko, Schöb, Stephan, & Eid (2024). Examining interindividual differences in unemployment-related changes in subjective well-being: The role of psychological well-being and re-employment expectations. European Journal of Personality. Advance online publication. https://doi.org/10.1177/08902070241231315

Lehman, D. R., Wortman, C. B., & Williams, A. F. (1987). Long-term effects of losing a spouse or child in a motor vehicle crash. Journal of Personality and Social Psychology, 52(1), 218–231. https://doi.org/10.1037/0022-3514.52.1.218

Letzring, T. D., & Spain, J. S. (Eds.). (2021). The Oxford handbook of accurate personality judgment. Oxford University Press.

Lipsitch, Tchetgen Tchetgen, & Cohen (2010). Negative controls: A tool for detecting confounding and bias in observational studies. Epidemiology, 21(3), 383–388. https://doi.org/10.1097/EDE.0b013e3181d61eeb

Loh, W. W., Ren, & West, S. G. (2024). Parametric g-formula for testing time-varying causal effects: What it is, why it matters, and how to implement it in lavaan. Multivariate Behavioral Research, 59(5), 995–1018. https://doi.org/10.1080/00273171.2024.2354228

Luhmann, Buecker, Kaiser, & Beermann (2020). Nothing going on? Exploring the role of missed events in changes in subjective well-being and the big five personality traits. Journal of Personality, 89(1), 113–131. https://doi.org/10.1111/jopy.12539

Luhmann & Eid (2009). Does it really feel the same? Changes in life satisfaction following repeated life events. Journal of Personality and Social Psychology, 97(2), 363–381. https://doi.org/10.1037/a0015809

Luhmann & Eid (2013). Studying reaction to repeated life events with discontinuous change models using HLM. In G. D. Garson (Ed.), Hierarchical linear modeling (pp. 273–298). Sage.

Luhmann, Fassbender, Alcock, & Haehner (2021). A dimensional taxonomy of perceived characteristics of major life events. Journal of Personality and Social Psychology, 121(3), 633–668. https://doi.org/10.1037/pspp0000291

Luhmann, Hawkley, L. C., Eid, & Cacioppo, J. T. (2012). Time frames and the distinction between affective and cognitive well-being. Journal of Research in Personality, 46(4), 431–441. https://doi.org/10.1016/j.jrp.2012.04.004

Luhmann, Hofmann, Eid, & Lucas, R. E. (2012). Subjective well-being and adaptation to life events: A meta-analysis. Journal of Personality and Social Psychology, 102(3), 592–615. https://doi.org/10.1037/a0025948

Luhmann, Lucas, R. E., Eid, & Diener (2013). The prospective effect of life satisfaction on life events. Social Psychological and Personality Science, 4(1), 39–45. https://doi.org/10.1177/1948550612440105

Luhmann, Orth, Specht, Kandler, & Lucas, R. E. (2014). Studying changes in life circumstances and personality: It’s about time. European Journal of Personality, 28(3), 256–266. https://doi.org/10.1002/per.1951

Lundberg, Johnson, & Stewart, B. M. (2021). What is your estimand? Defining the target quantity connects statistical evidence to theory. American Sociological Review, 86(3), 532–565. https://doi.org/10.1177/00031224211004187

Lyubomirsky & Lepper, H. S. (1999). A measure of subjective happiness: Preliminary reliability and construct validation. Social Indicators Research, 46(2), 137–155. https://doi.org/10.1023/A:1006824100041

Mabe, P. A., & West, S. G. (1982). Validity of self-evaluation of ability: A review and meta-analysis. Journal of Applied Psychology, 67(3), 280–296. https://doi.org/10.1037/0021-9010.67.3.280

Mayer, Dietzfelbinger, Rosseel, & Steyer (2016). The EffectLiteR approach for analyzing average and conditional effects. Multivariate Behavioral Research, 51(2–3), 374–391. https://doi.org/10.1080/00273171.2016.1151334

Mayer, Thoemmes, Rose, Steyer, & West, S. G. (2014). Theory and analysis of total, direct, and indirect causal effects. Multivariate Behavioral Research, 49(5), 425–442. https://doi.org/10.1080/00273171.2014.931797

McArdle, J. J. (2009). Latent variable modeling of differences and changes with longitudinal data. Annual Review of Psychology, 60(1), 577–605. https://doi.org/10.1146/annurev.psych.60.110707.163612

McNeish (2023). A practical guide to selecting and blending approaches for clustered data: Clustered errors, multilevel models, and fixed-effect models. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000620

McNeish & Kelley (2019). Fixed effects models versus mixed effects models for clustered data: Reviewing the approaches, disentangling the differences, and making recommendations. Psychological Methods, 24(1), 20–35. https://doi.org/10.1037/met0000182

Mehl, M. R., Eid, Wrzus, Harari, G. M., & Ebner-Priemer, U. W. (Eds.). (2024). Mobile sensing in psychology: Methods and applications. The Guilford Press.

Moreno-Betancur (2021). The target trial: A powerful device beyond well-defined interventions. Epidemiology, 32(2), 291–294. https://doi.org/10.1097/EDE.0000000000001318

Moser, S. E., West, S. G., & Hughes, J. N. (2012). Trajectories of math and reading achievement in low-achieving children in elementary school: Effects of early and later retention in grade. Journal of Educational Psychology, 104(3), 603–621. https://doi.org/10.1037/a0027571

Moskowitz, D. S., & Sadikaj (2012). Event-contingent recording. In M. R. Mehl & T. S. Conner (Eds.), Handbook of research methods for studying daily life (pp. 160–175). The Guilford Press.

Nesselroade, J. R. (1991). The warp and the woof of the developmental fabric. In R. M. Downs, L. S. Liben, & D. S. Palermo (Eds.), Visions of aesthetics, the environment & development: The legacy of Joachim F. Wohlwill (pp. 213–240). Lawrence Erlbaum Associates, Inc. https://doi.org/10.2307/3033464

Pearl (2009). Causality: Models, reasoning, and inference (2nd ed.). Cambridge University Press.

Pohl, Steiner, P. M., Eisermann, Soellner, & Cook, T. D. (2009). Unbiased causal inference from an observational study: Results of a within-study comparison. Educational Evaluation and Policy Analysis, 31(4), 463–479. https://doi.org/10.3102/0162373709343964

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Sage.

Raykov (2012). Propensity score analysis with fallible covariates: A note on a latent variable modeling approach. Educational and Psychological Measurement, 72(5), 715–733. https://doi.org/10.1177/0013164412440999

Reitz, A. K., Luhmann, Bleidorn, & Denissen, J. J. A. (2022). Unraveling the complex relationship between work transitions and self-esteem and life satisfaction. Journal of Personality and Social Psychology, 123(3), 597–620. https://doi.org/10.1037/pspp0000423

Robins (1986). A new approach to causal inference in mortality studies with a sustained exposure period—Application to control of the healthy worker survivor effect. Mathematical Modelling, 7(9–12), 1393–1512. https://doi.org/10.1016/0270-0255(86)90088-6

Rohrer, J. M. (2018). Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science, 1(1), 27–42. https://doi.org/10.1177/2515245917745629

Rohrer, J. M., Hünermund, Arslan, R. C., & Elson (2022). That’s a lot to process! Pitfalls of popular path models. Advances in Methods and Practices in Psychological Science, 5(2). https://doi.org/10.1177/25152459221095827

Rohrer, J. M., & Murayama (2023). These are not the effects you are looking for: Causality and the within-/between-persons distinction in longitudinal data analysis. Advances in Methods and Practices in Psychological Science, 6(1). https://doi.org/10.1177/25152459221140842

Rosenbaum, P. R. (1984). The consequences of adjustment for a concomitant variable that has been affected by the treatment. Journal of the Royal Statistical Society A: General, 147(5), 656–666. https://doi.org/10.2307/2981697

Rosenbaum, P. R. (1986). Dropping out of high school in the United States: An observational study. Journal of Educational Statistics, 11(3), 207–224. https://doi.org/10.2307/1165073

Rosenbaum, P. R. (2017). Observation and experiment: An introduction to causal inference. Harvard University Press. https://doi.org/10.4159/9780674982697

Rosenbaum, P. R. (2020). Design of observational studies (2nd ed.). Springer.

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55. https://doi.org/10.1093/biomet/70.1.41

Rothwell, P. M. (2005). External validity of randomised controlled trials: “To whom do the results of this trial apply?” The Lancet, 365(9453), 82–93. https://doi.org/10.1016/S0140-6736(04)17670-8

Rubin, D. B. (1974). Estimating causal effects of treatment in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701.

Rubin, D. B. (1980). Randomization analysis of experimental data: The Fisher randomization test comment. Journal of the American Statistical Association, 75(371), 591–593. https://doi.org/10.2307/2287653

Rubin, D. B. (2007). The design versus the analysis of observational studies for causal effects: Parallels with the design of randomized trials. Statistics in Medicine, 26(1), 20–36. https://doi.org/10.1002/sim.2739

Schafer, J. L., & Kang (2008). Average causal effects from nonrandomized studies: A practical guide and simulated example. Psychological Methods, 13(4), 279–313. https://doi.org/10.1037/a0014268

Scharbert, Utesch, Reiter, Ter Horst, Van Zalk, Back, M. D., & Rau (2024). If you were happy and you know it, clap your hands! Testing the peak-end rule for retrospective judgments of well-being in everyday life. European Journal of Personality. Advance online publication. https://doi.org/10.1177/08902070241235969

Scherpenzeel, A. C., & Das (2010). “True” longitudinal and probability-based internet panels: Evidence from the Netherlands. In Das, Ester, & Kaczmirek (Eds.), Social and behavioral research and the internet: Advances in applied methods and research strategies (pp. 77–104). Taylor & Francis.

Seifert, I. S., Rohrer, J. M., & Schmukle, S. C. (2024). Using within-person change in three large panel studies to estimate personality age trajectories. Journal of Personality and Social Psychology, 126(1), 150–174. https://doi.org/10.1037/pspp0000482

Sengewald, M.-A., & Mayer (2024). Causal effect analysis in nonrandomized data with latent variables and categorical indicators: The implementation and benefits of EffectLiteR. Psychological Methods, 29(2), 287–307. https://doi.org/10.1037/met0000489

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin Company.

Shook-Sa, B. E., & Hudgens, M. G. (2022). Power and sample size for observational studies of point exposure effects. Biometrics, 78(1), 388–398. https://doi.org/10.1111/biom.13405

Simonsohn, Simmons, J. P., & Nelson, L. D. (2020). Specification curve analysis. Nature Human Behaviour, 4(11), 1208–1214. https://doi.org/10.1038/s41562-020-0912-z

Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford University Press.

145.

Steyer

Ferring

Schmitt

M. J.

(1992). States and traits in psychological assessment. European Journal of Psychological Assessment, 8(2), 79–98.

146.

Steyer, Mayer, Geiser, & Cole, D. A. (2015). A theory of states and traits—Revised. Annual Review of Clinical Psychology, 11, 71–98. https://doi.org/10.1146/annurev-clinpsy-032813-153719

147.

Steyer, Schmitt, & Eid (1999). Latent state-trait theory and research in personality and individual differences. European Journal of Personality, 13(5), 389–408. https://doi.org/10.1002/(sici)1099-0984(199909/10)13:5<389::aid-per361>3.0.co;2-a

148.

Suk, H. W., West, S. G., Fine, K. L., & Grimm, K. J. (2019). Nonlinear growth curve modeling using penalized spline models: A gentle introduction. Psychological Methods, 24(3), 269–290. https://doi.org/10.1037/met0000193

149.

Thoemmes & Ong, A. D. (2016). A primer on inverse probability of treatment weighting and marginal structural models. Emerging Adulthood, 4(1), 40–59. https://doi.org/10.1177/2167696815621645

150.

Tillmann, Voorpostel, Antal, Dasoki, Klaas, Kuhn, Lebert, Monsch, G.-A., & Ryser, V.-A. (2022). The Swiss Household Panel (SHP). Jahrbücher für Nationalökonomie und Statistik, 242(3), 403–420. https://doi.org/10.1515/jbnst-2021-0039

151.

Trutzenberg & Eid (2024). Stability and change of spirituality following childbirth: Longitudinal evidence from data of the Swiss Household Panel using multiple propensity score matching analyses. PsyArXiv. https://doi.org/10.31234/osf.io/6jdms

152.

University of Essex, Institute for Social and Economic Research. (2018). British Household Panel Survey: Waves 1-18, 1991-2009. [Data collection] (8th ed.). UK Data Service. https://doi.org/10.5255/UKDA-SN-5151-2

153.

van Scheppingen, M. A., & Leopold (2020). Trajectories of life satisfaction before, upon, and after divorce: Evidence from a new matching approach. Journal of Personality and Social Psychology, 119(6), 1444–1458. https://doi.org/10.1037/pspp0000270

154.

VanderWeele, T. J. (2015). Explanation in causal inference: Methods for mediation and interaction. Oxford University Press.

155.

Voelkle, M. C., Gische, Driver, C. C., & Lindenberger (2018). The role of time in the quest for understanding psychological mechanisms. Multivariate Behavioral Research, 53(6), 782–805. https://doi.org/10.1080/00273171.2018.1496813

156.

Wagner, G. G., Frick, J. R., & Schupp (2007). The German Socio-Economic Panel Study (SOEP) – Scope, evolution and enhancements. Schmollers Jahrbuch, 127, 139–169.

157.

Wang & Zubizarreta, J. R. (2019). Minimal dispersion approximately balancing weights: Asymptotic properties and practical considerations. Biometrika, 107(1), 93–105. https://doi.org/10.1093/biomet/asz050

158.

Watson & Wooden (2004). The HILDA Survey: A summary. Australian Journal of Labour Economics, 7(2), 117–124. https://doi.org/10.1111/j.1467-8462.2004.00336.x

159.

West, S. G., Cham, Thoemmes, Renneberg, Schulze, & Weiler (2014). Propensity scores as a basis for equating groups: Basic principles and application in clinical treatment outcome research. Journal of Consulting and Clinical Psychology, 82(5), 906–919. https://doi.org/10.1037/a0036387

160.

West, S. G., Duan, Pequegnat, Gaist, Des Jarlais, D. C., Holtgrave, Szapocznik, Fishbein, Rapkin, Clatts, & Mullen, P. D. (2008). Alternatives to the randomized controlled trial. American Journal of Public Health, 98(8), 1359–1366. https://doi.org/10.2105/AJPH.2007.124446

161.

Westreich, Lessler, & Funk, M. J. (2010). Propensity score estimation: Neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. Journal of Clinical Epidemiology, 63(8), 826–833. https://doi.org/10.1016/j.jclinepi.2009.11.020

162.

Wing, Simon, & Bello-Gomez, R. A. (2018). Designing difference in difference studies: Best practices for public health policy research. Annual Review of Public Health, 39(1), 453–469. https://doi.org/10.1146/annurev-publhealth-040617-013507

163.

Winship & Radbill (1994). Sampling weights and regression analysis. Sociological Methods & Research, 23(2), 230–257. https://doi.org/10.1177/0049124194023002004

164.

Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data (2nd ed.). MIT Press.

165.

Wooldridge, J. M. (2013). Introductory econometrics: A modern approach (5th ed.). South-Western Cengage Learning.

166.

Wysocki, A. C., Lawson, K. M., & Rhemtulla (2022). Statistical control requires causal justification. Advances in Methods and Practices in Psychological Science, 19. https://doi.org/10.1177/2515245922109582

167.

Yap, S. C. Y., Anusic, & Lucas, R. E. (2012). Does personality moderate reaction and adaptation to major life events? Evidence from the British Household Panel Survey. Journal of Research in Personality, 46(5), 477–488. https://doi.org/10.1016/j.jrp.2012.05.005

168.

Zhou, Zou, Woods, S. A., & C. H. (2019). The restorative effect of work after unemployment: An intraindividual analysis of subjective well-being recovery through reemployment. Journal of Applied Psychology, 104(9), 1195–1206. https://doi.org/10.1037/apl0000393

A Guide to Causal Inference in Life-Event Studies
