Addressing Non-ignorable Panel Attrition Using External Population Data: Analysis of Demographic Events From Survey Data

Abstract

Empirical analysis of variation in demographic events within the population is facilitated by using longitudinal survey data because of the richness of covariate measures in such data, but there is wave-on-wave dropout. When attrition is related to the event, it precludes consistent estimation of the impacts of covariates on the event and on event probabilities in the absence of additional assumptions. The paper introduces an adjustment procedure based on Bayes Theorem that directly addresses the problem of nonignorable dropout. It uses population information external to the survey sample to convert estimates of event probabilities and marginal effects of covariates on them that are conditional on retention in the longitudinal data to unconditional estimates of these quantities. In many plausible and verifiable circumstances, it produces estimates of the marginal effect of covariates closer to the true unconditional quantities than the conditional estimates obtained from estimation using the survey data alone.

Keywords

event analysis, sample selection bias, panel attrition, non-ignorable attrition, demographic analysis, combining population data

Introduction

Survey data are rich in covariates that are useful for analysis of variation in demographic events within the population. Interest in “events” means that prospective longitudinal data are particularly useful and increasingly available in many countries. In all individual longitudinal or household panel surveys, there is wave-on-wave dropout (and re-joiners). For the purposes of this paper, the extent of attrition is not the main issue, but whether it is “ignorable” for the consistent estimation of the impacts of covariates on the event, on event probabilities, and on statistics based on these probabilities, such as the proportion childless. It may be ignorable for some parameters but not others (e.g., for the slope parameter but not the intercept parameter in a regression context). Nonignorable dropout is a particular problem when the event itself is associated with dropout, even after conditioning on other variables. A prime example is residential mobility. As Washbrook, Clarke, and Steele (2014) point out, “The problem is particularly relevant to residential mobility because it is plausible to believe that the act of moving house has a direct, or even causal, effect on dropping out.” Similar considerations apply to events such as leaving the parental home, partnership formation, and dissolution, which often entail residential mobility.

Bareinboim, Tian, and Pearl (2014) define the concept of “recoverability” of probability distributions. Their non-parametric, graphical approach has distinct advantages over traditional frameworks for the analysis of missing data or sample selection (Mohan and Pearl 2021). They show that when sample selection depends on an outcome Y, the probability distribution of Y conditional on covariates X is not recoverable from analysis of the selected sample without making additional assumptions. The recoverability at issue here is the recovery of probabilistic (non-causal) parameters, not causal relationships (i.e., the covariates X are all assumed to be exogenous). In other terminology, the combined selection and outcome model is not identified (cannot be estimated consistently) without further assumptions, such as in Washbrook, Clarke, and Steele (2014).

The contribution of this paper is the introduction of an adjustment procedure that directly addresses the problem of nonignorable dropout. It uses population information external to the survey sample (e.g., census data or registration statistics) to convert estimates of event probabilities and marginal effects of covariates on them that are conditional on retention in the longitudinal data (“panel retention” for short) to unconditional estimates of these quantities and provides a measure of how close the unconditional mean probability is to its conditional estimate. It does not require the estimation of specific sample selection models, such as Heckman's (1979) oft-used model, which require unverifiable identification assumptions.

The paper applies the proposed adjustment method to two events for which there are data on mean age-specific event probabilities from the population. One is residential mobility, for which the population data by age and gender come from the 2011 United Kingdom (UK) Census, and the other is marriage, for which the population data come from marriage registration statistics. In each case, the survey data come from the large UK household panel survey called Understanding Society, from which many covariates associated with the event can be measured.

In its use of external population data, the proposed method resembles a series of papers by Mark Handcock and Michael Rendall with a number of different additional co-authors (Handcock, Huovilainen, and Rendall 2000; Handcock, Rendall, and Cheadle 2005; Rendall, Handcock, and Jonsson 2009; Chaudhuri, Handcock, and Rendall 2008). These papers, based on ideas originally introduced by Imbens and Lancaster (1994), propose constrained maximum likelihood estimation of parameters in which non-linear constraints on moments of the unconditional distribution are obtained from population data. This method delivers large gains in efficiency (i.e., reduces standard errors substantially) when panel attrition is ignorable (observations are “missing at random”). However, when attrition is nonignorable, it does not produce consistent parameter estimates because they are conditioned on panel retention. As with the unconstrained estimates conditioned on panel retention, the way in which event probabilities computed from the estimated model vary with covariates then depends on how panel retention varies with those covariates. The constraints carry no information about the marginal effects of covariates that are not involved in calculating the marginal distribution from the population data.

The adjustment method proposed here does not ensure consistent estimates either, but it directly addresses the issue of nonignorable attrition and in doing so encourages researchers to consider explicitly the credibility of the assumption that missing observations are “missing at random.” It is transparent and computationally easier to implement than constrained maximization and provides a measure of how close the mean unconditional probability of the event is to its conditional estimate. Furthermore, the estimation of the impacts of covariates on panel retention provides information on whether the adjusted estimates of the marginal effects of covariates are closer to their true unconditional values than the corresponding conditional estimates obtained from estimation using the survey data alone. The estimated panel retention relationship also provides a way to reweight the conditional estimates in calculating mean probabilities and marginal effects.

The next section provides the statistical foundations for the proposed methods, including a non-parametric, graphical analysis of the “recovery” of the probability distribution not conditioned on panel retention from the conditional one and a discussion of the theoretical basis of the proposed adjustment procedure based on Bayes theorem. The section “Illustrative Structural Model” provides a brief theoretical analysis of a canonical parametric model composed of the main structural equation, the parameters of which are of central interest, and a panel retention equation in which retention is associated with the event of interest. It shows why the parameter estimates from the selected sample are biased when attrition is nonignorable and the direction of the bias. The section “Artificial Data” is an analysis of simulations of estimation of the canonical model, which vary the parameters of the retention equation to determine how they affect the adjusted “unconditional” estimates relative to the estimates that condition on panel retention. It derives conditions in terms of these panel retention parameters under which the adjusted estimates of marginal effects of covariates are superior to the conditional ones in terms of distance to the true unconditional quantities. The section “Examples” provides two examples, one on residential mobility, which uses population census data, and one on marriage, which uses marriage registration statistics. This is followed by a conclusions section.

Foundations

Unconditional and Conditional Event Probabilities

Define for person i with a vector of covariates $X_{i}$ the probability $P [M_{i} = j | X_{i}]$ , where $M_{i} = 1$ indicates an event, such as a residential move, between waves of the panel, and $M_{i} = 0$ if it does not occur. Also define the probability $P [R_{i} = j | X_{i}],$ where $R_{i} = 1$ if the person remains in the panel between consecutive waves (“retention” for short) and $R_{i} = 0$ if they do not. Because we do not know the value of $M_{i}$ if they drop out of the panel, studies have identified the distribution $P [M_{i} = 1 | R_{i} = 1, X_{i}]$ ; that is, how variables $X_{i}$ affect the probability that $M_{i} = 1$ conditional on remaining in the sample. We would have liked to estimate how the variables affected the unconditional probability of the event, $P [M_{i} = 1 | X_{i}]$ , or in the terminology of Bareinboim, Tian, and Pearl (2014), $P [M_{i} = 1 | X_{i}]$ is the distribution we would like to “recover” from the analysis of the selected sample.

This is a particular case of the generic problem of “missingness” addressed by Mohan and Pearl (2021). Following their approach, Figure 1 illustrates the problem of panel retention in analyzing demographic events using a directed acyclic causal graph when $X_{i}$ contains two variables: $X_{i} = [Z_{i} A_{i}] .$ When a person drops out of the panel, a value for $M_{i}$ is missing and we obtain a missing measurement designated by m. The panel retention process can be modeled using an observed proxy variable $M_{i}^{P}$ the values for which are determined by $M_{i}$ and the “missingness mechanism,” $R_{M}$ , which takes the following form:

M_{i}^{P} = f (R_{M}, M_{i}) = \begin{cases} M_{i} & \text{if } R_{M} = 1 \\ m & \text{if } R_{M} = 0 \end{cases}

Figure 1.

Causal graph with selection on event variable M.

When the person is not retained in the panel, whether an event took place or not is concealed and $R_{M} = 0$ . When we observe the occurrence or absence of the event, $R_{M} = 1$ .

The graph depicted in Figure 1 shows, in the conventional statistical terminology (Little and Rubin 2014), a “not missing at random (NMAR)” process because of the edges linking $M_{i}$, $R_{M}$, and $M_{i}^{P}$. In words, the process generating missing values on the demographic event variable depends on the event itself. Removing the edge between $M_{i}$ and $R_{M}$ would make the process “missing at random (MAR),” which is the assumption made in almost all the statistical literature on multivariate incomplete data. Inclusion of the edge means that $R_{M}$ is not independent of $M_{i}$ given $X_{i} = [Z_{i} A_{i}]$, and by Theorem 1 of Bareinboim, Tian, and Pearl (2014:2412) this means that “there exists no procedure that would be capable of recovering the distribution from selection bias (without adding assumptions).” Theorem 3 in Mohan and Pearl (2021) expresses the idea simply. It states that the distribution $P [M_{i} | Z_{i}, A_{i}]$ is not recoverable from the observed data because $M_{i}$ and $R_{M}$ are neighbors.

Let $A_{i}$ be a variable measured in both the survey data and in external population data, while $Z_{i}$ is only measured in the longitudinal survey data. In the examples later, $A_{i}$ is a person’s age. As explained below, the proposed adjustment procedure aims to approximate the recovery of moments of $P [M_{i} | Z_{i}, A_{i}]$ by using external population data on the marginal distribution $P [M_{i} | A_{i}]$ .¹

Application of Bayes Theorem

Returning to the general formulation, by Bayes Theorem, we can express the conditional probability as

\begin{aligned} P [M_{i} = 1 | R_{i} = 1, X_{i}] = \frac{P [R_{i} = 1 | M_{i} = 1, X_{i}] \cdot P [M_{i} = 1 | X_{i}]}{P [R_{i} = 1 | X_{i}]} \\ \text{if } P [R_{i} = 1 | X_{i}] \neq 0 \end{aligned}

(1)

Re-arranging terms, we obtain

P [M_{i} = 1 | X_{i}] = \frac{P [M_{i} = 1 | R_{i} = 1, X_{i}] \cdot P [R_{i} = 1 | X_{i}]}{P [R_{i} = 1 | M_{i} = 1, X_{i}]}

(2)

Define

q_{i} = \frac{P [R_{i} = 1 | M_{i} = 0, X_{i}]}{P [R_{i} = 1 | M_{i} = 1, X_{i}]}

and note that

P [R_{i} = 1 | X_{i}] = P [R_{i} = 1 | M_{i} = 1, X_{i}] \cdot P [M_{i} = 1 | X_{i}] + P [R_{i} = 1 | M_{i} = 0, X_{i}] \cdot (1 - P [M_{i} = 1 | X_{i}]) .

From these definitions, equation (2) entails the key equation for the analysis, which follows:

P [M_{i} = 1 | X_{i}] = \frac{P [M_{i} = 1 | R_{i} = 1, X_{i}] q_{i}}{1 - P [M_{i} = 1 | R_{i} = 1, X_{i}] (1 - q_{i})}

(3)

If, conditional on $X_{i}$, remaining in the panel to the subsequent wave is independent of the event, then $q_{i} = 1$ and the conditional and unconditional probabilities of the event coincide. It is, however, plausible that if the event is, for example, residential mobility, $q_{i} > 1$ because it is more difficult to follow people in the panel when they move.²

If $q_{i} > 1$ ($q_{i} < 1$), the denominator of equation (3) is smaller (larger) than $q_{i}$.³ This implies that the unconditional probability of the event given $X_{i}$ is a multiple (fraction) of the conditional probability given $X_{i}$. If we were able to obtain an estimate of $q_{i}$ from external data, then the right-hand side of (3) would involve only observable variables.
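The algebra linking equations (1) and (3) can be checked numerically. The following is a minimal sketch rather than part of the original analysis: the function names and the retention and event probabilities are hypothetical values chosen only for illustration.

```python
def unconditional_prob(p_cond: float, q: float) -> float:
    """Equation (3): convert P[M=1 | R=1, X] into P[M=1 | X] for a given q."""
    return p_cond * q / (1.0 - p_cond * (1.0 - q))


def conditional_prob(p_uncond: float, q: float) -> float:
    """Inverse of equation (3): P[M=1 | R=1, X] implied by P[M=1 | X] and q."""
    return p_uncond / (p_uncond + q * (1.0 - p_uncond))


# Hypothetical probabilities, for illustration only.
r_given_event = 0.4      # P[R=1 | M=1, X]
r_given_no_event = 0.8   # P[R=1 | M=0, X]
p_true = 0.25            # P[M=1 | X]

q = r_given_no_event / r_given_event
# Bayes Theorem, equation (1), computed directly from its ingredients.
p_retained = r_given_event * p_true + r_given_no_event * (1.0 - p_true)
p_cond_bayes = r_given_event * p_true / p_retained

# Equation (3) should return the unconditional probability we started from.
print(q, p_cond_bayes, unconditional_prob(p_cond_bayes, q))
```

With movers half as likely to be retained as non-movers in this hypothetical case, the conditional probability understates the unconditional one, and equation (3) undoes the distortion once $q_{i}$ is known.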

Figure 2 plots a hypothetical age profile of true event probabilities $(P [M_{i} = 1 | \text{age}])$, along with the probabilities conditional on retention $(P [M_{i} = 1 | R_{i} = 1, \text{age}])$, which are calculated using equation (3) under the assumption that $q_{i} = 2$ for everyone and denoted as “observed.” The observed profile clearly deviates substantially from $P [M_{i} = 1 | \text{age}]$.

Figure 2.

Event (M = 1) probabilities by age: Hypothetical true and observed (conditional on panel retention), q = 2.

The Adjustment Procedure

The adjustment of conditional probabilities to obtain unconditional ones amounts to choosing values for $q_{i}$ estimated from external population information. The adjustment procedure proposed obtains an estimate of a “weighted average” of the $q_{i}$, denoted $\hat{q}$, either globally or for discrete groups (e.g., gender or time period). Let $X_{i} = [Z_{i} A_{i}]$, where $A_{i}$ is a scalar measured in both the survey and the external population data and the marginal distribution $P [M_{i} | A_{i}]$ is observed in the external data. The estimate $\hat{q}$ is obtained by substituting $P [M_{i} | A_{i}]$ for $P [M_{i} | X_{i}]$ and $P [M_{i} | R_{i} = 1, A_{i}]$ for $P [M_{i} | R_{i} = 1, X_{i}]$ in equation (3) and applying non-linear least squares over discrete categories of $A_{i}$.⁴ $\hat{q}$ is a measure of how close the unconditional probability is to its conditional estimate. The estimate of the unconditional probability of the event is then obtained by substituting $\hat{q}$ in the original equation (3), and it is denoted as ${\hat{P}}_{M} [Z_{i}, A_{i}, \hat{q}]$ ($= {\hat{P}}_{M i}$ for short).
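The estimation step can be sketched as a one-parameter non-linear least squares problem. The code below is an illustration under stated assumptions, not the paper's implementation: it uses an unweighted Gauss-Newton iteration, the function names are hypothetical, and the age-group probabilities are made-up values constructed so that the answer is known.

```python
from typing import Sequence


def adjust(p_cond: float, q: float) -> float:
    """Right-hand side of equation (3)."""
    return p_cond * q / (1.0 - p_cond * (1.0 - q))


def estimate_q(p_pop: Sequence[float], p_cond: Sequence[float],
               q_start: float = 1.0, tol: float = 1e-12,
               max_iter: int = 200) -> float:
    """Gauss-Newton fit of one q minimising sum_a (p_pop_a - adjust(p_cond_a, q))^2."""
    q = q_start
    for _ in range(max_iter):
        num = 0.0
        den = 0.0
        for pp, pc in zip(p_pop, p_cond):
            resid = pp - adjust(pc, q)
            # Derivative of equation (3) with respect to q.
            slope = pc * (1.0 - pc) / (1.0 - pc * (1.0 - q)) ** 2
            num += slope * resid
            den += slope * slope
        step = num / den
        q = max(q + step, 1e-6)  # keep q strictly positive
        if abs(step) < tol:
            break
    return q


# Hypothetical age-group probabilities, for illustration only.
p_pop_by_age = [0.30, 0.25, 0.15, 0.08]   # external data: P[M=1 | A]
q_true = 1.5
# Survey-side probabilities P[M=1 | R=1, A] implied by q_true.
p_cond_by_age = [p / (p + q_true * (1.0 - p)) for p in p_pop_by_age]

q_hat = estimate_q(p_pop_by_age, p_cond_by_age)
print(round(q_hat, 6))
```

In real data the $q_{i}$ vary across age groups, so the fit returns a compromise value rather than reproducing every group exactly, which is why the text describes $\hat{q}$ as a “weighted average.”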

Marginal Effects of Covariates on Unconditional Probabilities

This section considers the adjustment of marginal effects in a parametric model with continuous covariates.⁵ The unconditional marginal effect for element j of $X_{i}$ is given by $\frac{d P [M = 1 | X_{i}]}{d X_{i j}} = U M E_{i j}$ and the conditional one as $\frac{d P [M = 1 | R = 1, X_{i}]}{d X_{i j}} = C M E_{i j}$ . If we assume $q_{i} = \hat{q}$ , then from equation (3), we obtain an estimate of $U M E_{i j}$ :

\hat{U M E_{i j}} = \frac{\hat{q} {\hat{C M E}}_{i j}}{{[1 - \hat{P} [M_{i} = 1 | R_{i} = 1, X_{i}] (1 - \hat{q})]}^{2}}

(4)

The ratio $\frac{{\hat{UME}}_{ij}}{{\hat{CME}}_{ij}}$ equals $\hat{q} / [1 - \hat{P} [M_{i} = 1 | R_{i} = 1, X_{i}] (1 - \hat{q})]^{2}$. It exceeds (is less than) unity for $\hat{q} > 1$ ($\hat{q} < 1$) provided that the estimated conditional probability is below $1 / (1 + \sqrt{\hat{q}})$, a threshold close to one half when $\hat{q}$ is near 1. In a parametric model of an event with normally distributed errors, the marginal effects are the product of the relevant probit slope parameter estimate and the Normal density function evaluated using all the parameter estimates (including the intercept) and an individual's $X_{i}$ values. The mean ${\hat{UME}}_{ij}$ in the selected data, $E [{\hat{UME}}_{ij} | R_{i} = 1]$, differs from the true mean unconditional marginal effect, $E [UME_{ij}]$, for three reasons: (1) because of errors in estimating $CME_{ij}$; (2) because of using $\hat{q}$ for everyone; and (3) because of the difference in the distribution of $X_{i}$ in the selected data from that in the entire population. The last reason is addressed, at least approximately, by reweighting the data, as explained in the next section and illustrated with artificial data in the section “Artificial Data.”
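Equation (4) is the chain rule applied to equation (3), and a short numerical check makes the scaling factor concrete. This is a sketch with hypothetical inputs (the function names, the value of $\hat{q}$, and the conditional marginal effect are illustrative assumptions). It also shows that the size of the factor depends on the level of the conditional probability.

```python
def adjusted_prob(p_cond: float, q_hat: float) -> float:
    """Equation (3) with q_i replaced by a common q_hat."""
    return p_cond * q_hat / (1.0 - p_cond * (1.0 - q_hat))


def ume_from_cme(cme: float, p_cond: float, q_hat: float) -> float:
    """Equation (4): adjusted (unconditional) marginal effect."""
    return q_hat * cme / (1.0 - p_cond * (1.0 - q_hat)) ** 2


# Hypothetical inputs, for illustration only.
q_hat = 2.0
cme = 0.05  # conditional marginal effect of one covariate

for p_cond in (0.2, 0.5):
    ume = ume_from_cme(cme, p_cond, q_hat)
    # Finite-difference slope of equation (3) in p_cond.
    # By the chain rule, UME = slope * CME.
    h = 1e-6
    slope = (adjusted_prob(p_cond + h, q_hat)
             - adjusted_prob(p_cond - h, q_hat)) / (2 * h)
    print(p_cond, ume / cme, slope)
```

For each conditional probability, the printed ratio and the finite-difference slope should agree closely, since both measure the same derivative.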

Illustrative Structural Model

Consider the following model, in which panel retention $(R_{i} = 1)$ is influenced by (a) the event occurring ( $M_{i} = 1$ ) and/or by (b) correlation between omitted factors influencing the event and panel retention ( $u_{i}$ and $ε_{i}$ respectively):

M_{i}^{*} = β_{0} + β X_{i} + u_{i}

(5a)

R_{i}^{*} = α_{0} + α M_{i} + γ X_{i} + ε_{i}

(5b)

where $u_{i}$ and $ε_{i}$ are distributed as joint standard Normal and $E [X_{i} u_{i}] = 0 = E [X_{i} ε_{i}]$. In the data, we only observe cases for which $R_{i} = 1$, which occurs when $R_{i}^{*} > 0$. The event occurs ($M_{i} = 1$) when $M_{i}^{*} > 0$.

The biases that arise because of conditioning on retention can be most easily illustrated when $α = 0$. Form the expectation for the latent index $M_{i}^{*}$ associated with the event conditional on panel retention:

E (M_{i}^{*} | X_{i}, R_{i} = 1) = β_{0} + β X_{i} + E (u_{i} | α_{0} + γ X_{i} + ε_{i} > 0) = β_{0} + β X_{i} + E (u_{i} | ε_{i} > - α_{0} - γ X_{i})

(6)

If $Φ (α_{0} + γ X_{i})$ is the standard Normal distribution function and $ϕ (α_{0} + γ X_{i})$ is the standard Normal density function, it follows from the statistics of truncated Normal distributions that

E (M_{i}^{*} | X_{i}, R_{i} = 1) = β_{0} + β X_{i} + ρ [\frac{ϕ (α_{0} + γ X_{i})}{Φ (α_{0} + γ X_{i})}]

(7)

where $ρ$ is the correlation coefficient between $u_{i}$ and $ε_{i}$.⁶ The ratio in brackets is often called the inverse Mills ratio or “Heckman's lambda” (Heckman 1979). It is a monotone decreasing function of the probability that an observation is selected into the sample, $Φ (α_{0} + γ X_{i})$. It follows that if $ρ$ is non-zero, then $E (M_{i}^{*} | X_{i}, R_{i} = 1) \neq β_{0} + β X_{i}$. If, for example, $ρ < 0$, then $E (M_{i}^{*} | X_{i}, R_{i} = 1) < β_{0} + β X_{i}$; that is, the event is less likely in the selected sample than in the general population. In terms of $q_{i}$, $q_{i} > 1$ when $ρ < 0$, and $q_{i} < 1$ when $ρ > 0$. Thus, the parameter estimates are a function of $γ$ when $ρ$ is non-zero.
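The behavior of the selection term in equation (7) can be illustrated with a few lines of code. This is a minimal sketch: the function names, the index values, and the correlation $ρ = -0.5$ are illustrative assumptions, not estimates.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Normal, mean 0 and s.d. 1


def inverse_mills(index: float) -> float:
    """phi(index) / Phi(index), the bracketed ratio in equation (7)."""
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


def selection_term(index: float, rho: float) -> float:
    """Last term of equation (7): rho times the inverse Mills ratio."""
    return rho * inverse_mills(index)


rho = -0.5  # hypothetical correlation between u and epsilon
for index in (-1.0, 0.0, 1.0):  # values of alpha_0 + gamma * X_i
    retention_prob = _STD_NORMAL.cdf(index)
    print(round(retention_prob, 3),
          round(inverse_mills(index), 3),
          round(selection_term(index, rho), 3))
```

As the retention probability rises, the inverse Mills ratio falls, so with negative $ρ$ the downward shift in the conditional expectation of $M_{i}^{*}$ shrinks toward zero.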

The estimated impact of the jth element of $X_{i}$, $X_{i j}$, on the latent index $M_{i}^{*}$ is also biased when $ρ \neq 0$ and $γ_{j} \neq 0$. It confuses the impact of $X_{i j}$ on $M_{i}^{*}$, which affects the probability of the event, with its impact on panel retention. To see this, consider the impact of $X_{i j}$ on the conditional expectation of $M_{i}^{*}$. From equation (7), if $γ_{j} > 0$ and $ρ < 0$, then in the limit the estimate of $β_{j}$ converges to the mean of $\frac{d E (M_{i}^{*} | X_{i}, R_{i} = 1)}{d X_{i j}} > β_{j}$. The inequality arises because higher $X_{i j}$ increases the probability of retention and lowers the inverse Mills ratio, which is multiplied by a negative $ρ$ in equation (7). The opposite inequality obtains if $γ_{j} < 0$ and $ρ < 0$. In other words, the estimate of $β_{j}$ from a probit model estimated on the selected sample will tend to be biased in the direction of the sign of $γ_{j}$ when $ρ < 0$. Similarly, if $γ_{j} < 0$ and $ρ > 0$, then the estimate of $β_{j}$ converges to the mean of $\frac{d E (M_{i}^{*} | X_{i}, R_{i} = 1)}{d X_{i j}} > β_{j}$, and the opposite inequality obtains when $γ_{j} > 0$ and $ρ > 0$. Thus, the estimate of $β_{j}$ from the selected sample will tend to be biased in the opposite direction to the sign of $γ_{j}$ when $ρ > 0$.

The estimate of $β_{j}$ is also correlated with the estimate of $β_{0}$, which is biased downward (upward) when $ρ < 0$ ($ρ > 0$). Because of this correlation, the estimate of $β_{j}$ can be biased even when $γ_{j} = 0$.

The marginal effect of $X_{i j}$ on the probability of the event is $β_{j} ϕ (β_{0} + β X_{i})$. The estimated average marginal effect from the selected sample will differ from its true value because of bias in estimating $β$ and $β_{0}$ and because of the difference in the distribution of $X_{i}$ in the selected data from that in the entire population, which depends on $γ$. The last issue can be addressed by re-weighting the data in the computation of the mean value of $β_{j} ϕ (β_{0} + β X_{i})$. In the illustrative model here with $α = 0$, we can obtain consistent estimates of $α_{0}$ and $γ$ by estimating a probit model for $P [R_{i} = 1 | X_{i}]$, and from these parameter estimates, we obtain an estimate of the “propensity score” for panel retention, $\hat{P} [R_{i} = 1 | X_{i}]$. The weights for the computation of the average marginal effects are then $1 / \hat{P} [R_{i} = 1 | X_{i}]$.
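The reweighting step can be illustrated in isolation. The sketch below makes several simplifying assumptions that are not in the original analysis: a single scalar covariate, the true values of $β_{0}$ and $β$ in the marginal effect (so that only the covariate distribution differs between the retained sample and the population), and the true propensity score in place of an estimated probit. All names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

_N = NormalDist()


def ipw_mean(values, propensities):
    """Weighted mean with weights 1 / P[R=1 | X]."""
    weights = [1.0 / p for p in propensities]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical parameters: scalar X ~ N(0, 1), alpha = 0, rho = 0.
beta0, beta = -1.28, 0.5     # event equation
alpha0, gamma = 0.5, 0.3     # retention equation

rng = random.Random(12345)
effects, scores = [], []
for _ in range(50_000):
    x = rng.gauss(0.0, 1.0)
    if alpha0 + gamma * x + rng.gauss(0.0, 1.0) > 0:      # retained
        effects.append(beta * _N.pdf(beta0 + beta * x))   # marginal effect
        scores.append(_N.cdf(alpha0 + gamma * x))         # propensity score

unweighted = sum(effects) / len(effects)
weighted = ipw_mean(effects, scores)
# Population mean of beta * phi(beta0 + beta * X) for X ~ N(0, 1).
s = math.sqrt(1.0 + beta ** 2)
truth = beta * _N.pdf(beta0 / s) / s
print(round(unweighted, 4), round(weighted, 4), round(truth, 4))
```

Because retention rises with the covariate in this setup, the unweighted mean over-represents high values of $X_{i}$. Inverse propensity weights restore the population distribution of $X_{i}$, but they do not correct any bias in the estimated $β$.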

Artificial Data

Discussion of Figure 1 indicated that there is no procedure that can produce consistent estimates of the parameters of equation (5a) when $α \neq 0$ or $ρ \neq 0$, because in either of these cases $M_{i}$ and $R_{M}$ are neighbors. That is, we cannot recover $P [M_{i} = 1 | X_{i}]$ from $P [M_{i} = 1 | X_{i}, R_{i} = 1]$. Simulations of parameter estimation, reweighting, and application of the adjustment procedure using $\hat{q}$ can help us understand better how well the adjustment works when there are “departures from recoverability.”

The simulations use the following variant of the structural model in the section “Illustrative Structural Model” in which $X_{i} = [Z_{i} A_{i}]$, with both $Z_{i}$ and $A_{i}$ being scalars:

M_{i}^{*} = β_{0} + β_{a} A_{i} + β_{z} Z_{i} + u_{i}

(5a′)

R_{i}^{*} = α_{0} + α M_{i} + γ_{a} A_{i} + γ_{z} Z_{i} + ε_{i}, \quad α \leq 0

(5b′)

The variable $A_{i}$ is assumed to be available in the external population data from which we obtain estimates of the marginal distribution $P [M_{i} = 1 | A_{i}]$ for the entire population. The focus of the discussion is on the marginal effect of $Z_{i}$ on $P [M_{i} = 1 | A_{i}, Z_{i}]$ because the most important reason for using the survey data is that they measure key variables that are absent from the external data and whose unconditional marginal effects on the event we would like to estimate. The marginal effect of $A_{i}$ conditional on $Z_{i}$ may also differ from its association with $M_{i}$ in the marginal distribution. In the simulations that follow, the true values of the parameters in equations (5a′) and (5b′) are taken to be $β_{a} = 1, β_{z} = 0.5, β_{0} = - 1.28$, and $α_{0} = 0.5$ in all analyses.

“Small Departures” from Recoverability

First consider the case in which $ρ = α = 0$ , which amounts to removing the edge between $M_{i}$ and $R_{M}$ in Figure 1. We know that $P [M_{i} | Z_{i}, A_{i}]$ is recoverable in this case. In particular, we can recover the mean $P [M_{i} = 1 | Z_{i}, A_{i}]$ and the average marginal effect of $Z_{i}$ on $P [M_{i} | Z_{i}, A_{i}]$ by reweighting the conditional sample (i.e., conditional on $R_{i} = 1$ ) using the inverse of $\hat{P} [R_{i} | Z_{i}, A_{i}]$ as the weight in the mean calculations (i.e., inverse propensity score weighting).

To illustrate the impact of departures from $ρ = α = 0$, consider simulations of parameter estimation in variants of the model in (5a′) and (5b′) with continuous $A_{i}$ and $Z_{i}$, which are drawn from a joint standard Normal distribution with correlation coefficient $r_{a z}$. The exposition creates large illustrative data sets of 100,000 observations under different parameter assumptions to see how the estimated mean probability of an event and the estimated mean effect of a covariate on it, both obtained from a sample conditioned on retention, differ from their true values and how well adjustment remedies the difference.⁷ The variants of the model are illustrated for different impacts of $A_{i}$ and $Z_{i}$ on the latent variable for panel retention ($γ_{a}$ and $γ_{z}$). Let mean[.] indicate the sample mean.
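The data-generating process can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used for the paper: the function name, seed, and implementation details are hypothetical, and the values of $α$, $ρ$, $γ_{a}$, and $γ_{z}$ in the example call are one configuration among many. It draws from equations (5a′) and (5b′) and computes the population event probability and the average marginal effect of $Z_{i}$, which can be compared with the true values quoted for Table 1.

```python
import math
import random
from statistics import NormalDist

_pdf = NormalDist().pdf


def simulate(n, alpha, rho, gamma_a, gamma_z, r_az=0.3, beta0=-1.28,
             beta_a=1.0, beta_z=0.5, alpha0=0.5, seed=2024):
    """Draw (M_i, R_i, true marginal effect of Z_i) from (5a') and (5b')."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        # Z is standard Normal with correlation r_az with A.
        z = r_az * a + math.sqrt(1.0 - r_az ** 2) * rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        # eps is standard Normal with correlation rho with u.
        eps = rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        index = beta0 + beta_a * a + beta_z * z
        m = 1 if index + u > 0 else 0
        r = 1 if alpha0 + alpha * m + gamma_a * a + gamma_z * z + eps > 0 else 0
        draws.append((m, r, beta_z * _pdf(index)))
    return draws


draws = simulate(100_000, alpha=-0.1, rho=0.0, gamma_a=0.3, gamma_z=0.3)
p_event = sum(m for m, _, _ in draws) / len(draws)
mean_ume_z = sum(me for _, _, me in draws) / len(draws)
retained = [(m, me) for m, r, me in draws if r == 1]
p_event_retained = sum(m for m, _ in retained) / len(retained)
print(round(p_event, 3), round(mean_ume_z, 3), round(p_event_retained, 3))
```

The first two printed values describe the full population. The third is the raw event share among retained cases, before any model estimation, reweighting, or adjustment.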

To illustrate how well reweighting and adjustment using an estimate of $\hat{q}$ work in this context, first assume that $α = - 0.1$, $ρ = 0$, and $r_{a z} = 0.3$. When $α \neq 0$ we must work with a mis-specified function for the probability of panel retention which omits $M_{i}$ and contains only $Z_{i}$ and $A_{i}$, and so the re-weighting can only be approximate. Comparison of the two columns for each of the two parameter configurations in panel A of Table 1 ($γ_{a} = γ_{z} = 0.3$ and $γ_{a} = γ_{z} = 0$) indicates that reweighting alone makes the mean estimated conditional probability of the event and the average conditional marginal effect of $Z_{i}$ much closer to their unconditional true counterparts (0.212 and 0.091, respectively), because reweighting takes account of the different distribution of $Z_{i}$ and $A_{i}$ in the sample which conditions on panel retention. As we would expect, the reweighting is much more important when $γ_{a} = γ_{z} = 0.3$.

Table 1.

Simulations of Model in Equations (5a′) and (5b′), $β_{a} = 1, β_{z} = 0.5, β_{0} = - 1.28$. True Values: $P [M_{i} = 1] = 0.212$, $E [UME_{i} (Z)] = 0.091$.

A. $α = - 0.1, ρ = 0$; $\text{Corr} (A, Z) = 0.3$
	$γ_{a} = 0.3; γ_{z} = 0.3$		$γ_{a} = 0; γ_{z} = 0$
	Weighted	Unweighted	Weighted	Unweighted
${\hat{β}}_{a}$	1.00	1.00	0.99	0.99
${\hat{β}}_{z}$	0.50	0.50	0.49	0.49
${\hat{β}}_{0}$	−1.30	−1.30	−1.31	−1.31
$\hat{P} [M_{i} = 1 | R_{i} = 1]$	0.208	0.254	0.206	0.203
mean[${\hat{P}}_{M i} | R_{i} = 1$]	0.211	0.258	0.210	0.208
mean[${\hat{CME}}_{i} (Z) | R_{i} = 1$]	0.090	0.103	0.089	0.088
mean[${\hat{UME}}_{i} (Z) | R_{i} = 1$]	0.091	0.104	0.090	0.090
$\hat{q}$	1.03	1.03	1.05	1.05
B. $α = 0, ρ = - 0.1$; $\text{Corr} (A, Z) = 0.3$
	$γ_{a} = 0.3; γ_{z} = 0.3$		$γ_{a} = 0.3; γ_{z} = 0$
	Weighted	Unweighted	Weighted	Unweighted
${\hat{β}}_{a}$	1.01	1.01	1.02	1.02
${\hat{β}}_{z}$	0.51	0.51	0.50	0.50
${\hat{β}}_{0}$	−1.34	−1.34	−1.34	−1.34
$\hat{P} [M_{i} = 1 | R_{i} = 1]$	0.205	0.252	0.204	0.234
mean[${\hat{P}}_{M i} | R_{i} = 1$]	0.210	0.258	0.210	0.240
mean[${\hat{CME}}_{i} (Z) | R_{i} = 1$]	0.090	0.104	0.088	0.096
mean[${\hat{UME}}_{i} (Z) | R_{i} = 1$]	0.091	0.105	0.090	0.098
$\hat{q}$	1.05	1.05	1.06	1.06

Assume that population data on the unconditional marginal probability $P [M_{i} = 1 | A_{i}]$ are available for deciles of $A_{i}$, labeled $A_{i}^{D}$. For example, with the parameter configuration in the left-hand side of panel A, $q_{i}$ decreases with $A_{i}$. In the lowest decile, $E [q_{i} | A_{i}^{D} = 1] = 1.10$, and in the highest, $E [q_{i} | A_{i}^{D} = 10] = 1.03$. We estimate $\hat{q}$ by relating $P [M_{i} = 1 | A_{i}^{D}]$ to $\hat{P} [M_{i} = 1 | R_{i} = 1, A_{i}^{D}]$ in equation (3) using non-linear least squares over deciles of $A_{i}$. Because the difference between these two quantities is larger for the higher deciles, these attract more weight in the non-linear least squares estimator, yielding an estimate of $\hat{q}$ of 1.03 (robust SE = 0.001). Using $\hat{q}$ to adjust the reweighted estimates of the mean probability of the event and of the average marginal effect of $Z_{i}$ moves the estimates closer to their true values.
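The way $q_{i}$ varies with $A_{i}$ in this configuration follows directly from equation (5b′): when $ρ = 0$, $P [R_{i} = 1 | M_{i} = m, X_{i}] = Φ (α_{0} + α m + γ_{a} A_{i} + γ_{z} Z_{i})$. The sketch below evaluates the implied ratio at a few points. The function name is hypothetical, and the evaluation points are a simplification: $A_{i} = \pm 1.75$ is roughly the mean of a standard Normal variable within its bottom and top deciles, and $Z_{i}$ is set to its conditional mean $r_{a z} A_{i}$ rather than averaged over its distribution.

```python
from statistics import NormalDist

_cdf = NormalDist().cdf


def q_ratio(a, z, alpha0=0.5, alpha=-0.1, gamma_a=0.3, gamma_z=0.3):
    """q_i = P[R=1 | M=0, X] / P[R=1 | M=1, X] implied by (5b') when rho = 0."""
    base = alpha0 + gamma_a * a + gamma_z * z
    return _cdf(base) / _cdf(base + alpha)


r_az = 0.3
for a in (-1.75, 0.0, 1.75):
    z = r_az * a  # conditional mean of Z given A
    print(a, round(q_ratio(a, z), 3))
```

Because retention is less likely at low values of $A_{i}$, the same shift $α$ in the retention index changes the retention probability proportionately more there, which is why $q_{i}$ is largest in the lowest decile.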

When $γ_{a} = γ_{z} = 0$ (right-hand side of panel A), the estimates of the impacts of $A_{i}$ and $Z_{i}$ in the mis-specified propensity score equation (not shown) are spuriously negative (rather than 0), reflecting the fact that $α < 0$ in the true retention equation.⁸ Yet the re-weighting and adjustment using $\hat{q}$ work relatively well in estimating the mean probability of the event and the average marginal effect of $Z_{i}$, although not as well as in the parameter configuration on the left-hand side of panel A.

Next consider the case in which $ρ = - 0.1$ and $α = 0$, which also represents a small departure from recoverability. In this case, the estimates of parameters of the propensity score equation are consistent. Similar conclusions concerning reweighting and adjustment using $\hat{q}$ apply (see panel B of Table 1). Thus, small departures from recoverability can be addressed well by re-weighting and the proposed adjustment procedure. The next section considers how well reweighting and adjustment using $\hat{q}$ work for large departures from recoverability.

“Large Departures” From Recoverability

As in the previous section, $A_{i}$ and $Z_{i}$ are continuous and are drawn from a joint standard Normal distribution with correlation coefficient $r_{a z}$. We consider simulations of parameter estimation in two variants of the model in (5a′) and (5b′), which represent relatively strong mechanisms of selection related to event outcomes. In one model the event directly affects retention (Model 1): $α = - 1$, $ρ = 0$. In the other, the association between retention and the event is assumed to be due solely to omitted variables that are associated with both processes (Model 2): $α = 0$, $ρ = - 0.5$.

First consider how parameter estimation on the conditional sample is affected by strong departures from recoverability in these two models. Monte Carlo simulations of the estimates of the probit parameters ($β_{0}$, $β_{a}$, and $β_{z}$) using samples of 8,000 cases are shown in Table 2. In these, the impact of $A_{i}$ on retention ($γ_{a}$) is set to zero, $r_{a z} = 0.3$, and $γ_{z}$ is varied between −0.3 and 0.3. What is most striking is that the 95% confidence interval of the estimate of the intercept $β_{0}$ never contains its true value (−1.28), creating substantial downward bias in an estimate of the mean probability of the event from the selected sample. Also, the 95% confidence intervals for the slope parameters $β_{a}$ and $β_{z}$ generally include their true values much less than 95% of the time, and in some parameter configurations almost never include them. Not surprisingly, the estimate of $β_{z}$ performs best when $γ_{z} = 0$. Estimates of these parameters may be considered of less intrinsic interest on their own than estimates of the average marginal effects of $Z_{i}$ and $A_{i}$. Even after reweighting using the inverse propensity score, these are affected by bias in estimating both the intercept and slope parameters of the probit model. The focus now shifts to these quantities.

Table 2.

Monte Carlo Simulations of Estimate of $β$ (Sample Size = 8,000, 2,000 Replications). True Values: $β_{a} = 1, β_{z} = 0.5, β_{0} = - 1.28, α_{0} = 0.5, γ_{a} = 0, r_{a z} = 0.3$.

Model 1 $α = - 1$, $E [ε u] = 0$:
	Mean $\hat{β}$	S.D. $\hat{β}$	Mean lower 95% CI	Mean upper 95% CI	Coverage^a
$γ_{z} = - 0.3$
${\hat{β}}_{a}$	0.94	0.04	0.85	1.03	0.69
${\hat{β}}_{z}$	0.37	0.04	0.30	0.44	0.05
${\hat{β}}_{0}$	−1.69	0.04	−1.77	−1.61	0.00
$γ_{z} = 0$
${\hat{β}}_{a}$	0.95	0.04	0.86	1.03	0.73
${\hat{β}}_{z}$	0.47	0.04	0.41	0.54	0.88
${\hat{β}}_{0}$	−1.69	0.04	−1.77	−1.60	0.00
$γ_{z} = 0.3$
${\hat{β}}_{a}$	0.95	0.04	0.88	1.03	0.78
${\hat{β}}_{z}$	0.57	0.03	0.51	0.64	0.42
${\hat{β}}_{0}$	−1.69	0.04	−1.77	−1.61	0.00

Model 2 $α = 0$, $E [ε u] = - 0.5$:
	Mean $\hat{β}$	S.D. $\hat{β}$	Mean lower 95% CI	Mean upper 95% CI	Coverage^a
$γ_{z} = - 0.3$
${\hat{β}}_{a}$	1.09	0.04	1.01	1.17	0.39
${\hat{β}}_{z}$	0.45	0.03	0.39	0.51	0.61
${\hat{β}}_{0}$	−1.66	0.04	−1.74	−1.58	0.00
$γ_{z} = 0$
${\hat{β}}_{a}$	1.08	0.04	1.01	1.16	0.42
${\hat{β}}_{z}$	0.54	0.03	0.48	0.60	0.75
${\hat{β}}_{0}$	−1.66	0.04	−1.73	−1.58	0.00
$γ_{z} = 0.3$
${\hat{β}}_{a}$	1.07	0.04	0.92	1.07	0.94
${\hat{β}}_{z}$	0.62	0.03	0.57	0.69	0.01
${\hat{β}}_{0}$	−1.65	0.04	−1.74	−1.58	0.00

^aCoverage is the proportion of 95% CIs that contain the true value.
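
As a rough cross-check on the coverage column, the sketch below approximates coverage under two simplifying assumptions that are not taken from the paper: the estimator is normally distributed around its Monte Carlo mean with the reported S.D., and every replication's confidence interval has the same half-width (taken here as the reported mean minus the mean lower bound). The function name `approx_coverage` is illustrative. Because the table entries are rounded, the result should only be expected to land in the neighbourhood of the reported coverage.

```python
from statistics import NormalDist


def approx_coverage(true_value, mc_mean, mc_sd, mean_lower):
    """Approximate CI coverage, assuming a normal estimator and a fixed CI half-width."""
    half_width = mc_mean - mean_lower
    estimator = NormalDist(mu=mc_mean, sigma=mc_sd)
    # The interval [estimate - h, estimate + h] contains the true value
    # exactly when the estimate falls within h of the true value.
    return estimator.cdf(true_value + half_width) - estimator.cdf(true_value - half_width)


# Model 1, gamma_z = -0.3 rows of Table 2 (rounded values as printed).
cov_beta_z = approx_coverage(true_value=0.5, mc_mean=0.37, mc_sd=0.04, mean_lower=0.30)
cov_beta_0 = approx_coverage(true_value=-1.28, mc_mean=-1.69, mc_sd=0.04, mean_lower=-1.77)

print(f"approx. coverage for beta_z: {cov_beta_z:.3f}")
print(f"approx. coverage for beta_0: {cov_beta_0:.3f}")
```

With these rounded inputs the approximation gives a small coverage for $β_{z}$ and essentially zero coverage for $β_{0}$, which matches the pattern in the table.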

It is informative to see how the conditional probability of the event and the conditional marginal effect of $Z_{i}$ vary with $γ_{z}$. As in Table 1, the analysis creates large illustrative data sets of 100,000 observations each under different parameter assumptions. Figures 3A and 3B illustrate how $\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$ and $\text{mean}[\hat{P}[M_{i} = 1 \mid R_{i} = 1]]$, respectively, vary with $γ_{z}$ when $γ_{a} = 0$ and $r_{a z} = 0.3$. There is an approximately linear relationship between $γ_{z}$ and these conditional quantities, which is the same in both models but with higher mean values for Model 2. After reweighting using the inverse propensity score, the linear relationship between $γ_{z}$ and the conditional quantities is less steep. In Figure 3A, the reweighted $\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$ approaches the true value as $γ_{z}$ increases because of the increase in the estimate of $β_{z}$, which becomes increasingly upward biased beyond $γ_{z} = 0$. Because in these two models $\hat{q} > 1$, the adjustment procedure entails that $\text{mean}[\widehat{UME}_{i} \mid R_{i} = 1] > \text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$, and so there must be a value of $γ_{z}$ for which adjustment using $\hat{q}$ takes us farther from the true value than using the conditional estimate. For instance, in Model 2, for $γ_{z} > 0.3$, the reweighted $\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$ overstates the true marginal effect, and adjusting it further upward would only make the estimate worse. In Figure 3B, the reweighted $\text{mean}[\hat{P}[M_{i} = 1 \mid R_{i} = 1]]$ remains well below the true value of the unconditional mean.

Figure 3.

(a) Estimated conditional marginal effect of Z and true unconditional effect. (b) Estimated mean conditional and true unconditional probability of the event.
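
The true unconditional values against which these conditional quantities are compared can be checked in closed form under an assumption that is not stated in this section: that $A_{i}$ and $Z_{i}$ are standard normal with correlation $r_{a z}$, and that the event follows a probit model with index $β_{0} + β_{a} A_{i} + β_{z} Z_{i}$ and marginal effect $β_{z} φ(\cdot)$. Under that assumption, writing $σ^{2} = β_{a}^{2} + β_{z}^{2} + 2 β_{a} β_{z} r_{a z}$ for the variance of the covariate part of the index:

$$
\begin{aligned}
σ^{2} &= 1 + 0.25 + 2(1)(0.5)(0.3) = 1.55, \qquad \sqrt{1 + σ^{2}} = \sqrt{2.55} \approx 1.597, \\
P[M_{i} = 1] &= Φ\!\left(\frac{β_{0}}{\sqrt{1 + σ^{2}}}\right) = Φ(-0.802) \approx 0.211, \\
E[UME_{i}(Z)] &= \frac{β_{z}}{\sqrt{1 + σ^{2}}}\, φ\!\left(\frac{β_{0}}{\sqrt{1 + σ^{2}}}\right) \approx \frac{0.5 \times 0.289}{1.597} \approx 0.091 .
\end{aligned}
$$

These agree, up to rounding and simulation error, with the true values of 0.212 and 0.091 given in the heading of Table 3, which suggests that the normality assumption is a reasonable reading of the simulation design.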

In panels A through C of Table 3, we illustrate nine parameter configurations for each model in which $γ_{a}$ and $γ_{z}$ each take on three possible values (−0.3, 0, and 0.3) and the correlation between A and Z is set to 0.3. In panel D, this correlation is set to 0. Initially, we focus on comparing panels A through C. Among these, the adjustment factor $\hat{q}$ is larger in Model 1 than in Model 2; $\hat{q}$ declines with higher values of both $γ_{a}$ and $γ_{z}$, particularly in Model 1, and the decline in $\hat{q}$ is steeper with $γ_{a}$ than with $γ_{z}$.

Table 3.

Simulations of Model in Equations (5a′) and (5b′), $β_{a} = 1, β_{z} = 0.5, β_{0} = -1.28$. True Values: $P [M_{i} = 1] = 0.212$, $E [U M E_{i} (Z)] = 0.091$.

A. $γ_{a} = 0.3$
	$γ_{z} = -0.3$		$γ_{z} = 0$		$γ_{z} = 0.3$
$\text{Corr}(A, Z) = 0.3$	Model 1^a	Model 2^b	Model 1^a	Model 2^b	Model 1^a	Model 2^b
${\hat{β}}_{a}$	1.02	1.15	1.03	1.14	1.03	1.13
${\hat{β}}_{z}$	0.38	0.45	0.47	0.53	0.55	0.60
$\hat{β_{0}}$	−1.68	−1.65	−1.67	−1.64	−1.67	−1.64
$\hat{P}[M_{i} = 1 \mid R_{i} = 1]$	0.133	0.164	0.143	0.171	0.150	0.175
$\text{mean}[\hat{P}_{M i} \mid R_{i} = 1]$	0.191	0.200	0.192	0.201	0.195	0.202
$\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$	0.054	0.066	0.068	0.079	0.080	0.089
$\text{mean}[\widehat{UME}_{i}(Z) \mid R_{i} = 1]$	0.070	0.075	0.084	0.088	0.096	0.098
$\hat{q}$	1.92	1.49	1.75	1.41	1.66	1.35

B. $γ_{a} = 0$
	$γ_{z} = -0.3$		$γ_{z} = 0$		$γ_{z} = 0.3$
$\text{Corr}(A, Z) = 0.3$	Model 1^a	Model 2^b	Model 1^a	Model 2^b	Model 1^a	Model 2^b
${\hat{β}}_{a}$	0.91	1.08	0.93	1.08	0.93	1.07
${\hat{β}}_{z}$	0.36	0.44	0.47	0.54	0.56	0.62
$\hat{β_{0}}$	−1.67	−1.65	−1.68	−1.65	−1.68	−1.65
$\hat{P}[M_{i} = 1 \mid R_{i} = 1]$	0.118	0.156	0.129	0.162	0.137	0.168
$\text{mean}[\hat{P}_{M i} \mid R_{i} = 1]$	0.192	0.200	0.191	0.201	0.192	0.202
$\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$	0.050	0.065	0.067	0.079	0.080	0.091
$\text{mean}[\widehat{UME}_{i}(Z) \mid R_{i} = 1]$	0.072	0.078	0.089	0.091	0.102	0.103
$\hat{q}$	2.25	1.65	2.01	1.54	1.86	1.47

C. $γ_{a} = - 0.3$
	$γ_{z} = -0.3$		$γ_{z} = 0$		$γ_{z} = 0.3$
$\text{Corr}(A, Z) = 0.3$	Model 1^a	Model 2^b	Model 1^a	Model 2^b	Model 1^a	Model 2^b
${\hat{β}}_{a}$	0.78	0.98	0.80	0.98	0.83	0.99
${\hat{β}}_{z}$	0.35	0.44	0.46	0.54	0.56	0.63
$\hat{β_{0}}$	−1.68	−1.65	−1.68	−1.66	−1.68	−1.67
$\hat{P}[M_{i} = 1 \mid R_{i} = 1]$	0.103	0.144	0.112	0.151	0.123	0.158
$\text{mean}[\hat{P}_{M i} \mid R_{i} = 1]$	0.195	0.203	0.193	0.201	0.191	0.202
$\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$	0.047	0.075	0.062	0.079	0.078	0.093
$\text{mean}[\widehat{UME}_{i}(Z) \mid R_{i} = 1]$	0.076	0.082	0.093	0.097	0.109	0.109
$\hat{q}$	2.69	1.89	2.41	1.75	2.14	1.63

D. $γ_{a} = 0.3$
	$γ_{z} = -0.3$		$γ_{z} = 0$		$γ_{z} = 0.3$
$\text{Corr}(A, Z) = 0$	Model 1^a	Model 2^b	Model 1^a	Model 2^b	Model 1^a	Model 2^b
${\hat{β}}_{a}$	1.02	1.16	1.03	1.15	1.02	1.14
${\hat{β}}_{z}$	0.38	0.46	0.47	0.54	0.55	0.61
$\hat{β_{0}}$	−1.69	−1.66	−1.68	−1.65	−1.68	−1.65
$\hat{P}[M_{i} = 1 \mid R_{i} = 1]$	0.122	0.150	0.128	0.155	0.133	0.158
$\text{mean}[\hat{P}_{M i} \mid R_{i} = 1]$	0.175	0.182	0.177	0.184	0.179	0.186
$\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$	0.053	0.067	0.067	0.079	0.079	0.090
$\text{mean}[\widehat{UME}_{i}(Z) \mid R_{i} = 1]$	0.070	0.077	0.085	0.089	0.098	0.101
$\hat{q}$	1.85	1.45	1.76	1.40	1.69	1.37
True values:	$P [M_{i} = 1] = 0.197, E [U M E_{i} (Z)] = 0.093$

All means conditional on $R_{i} = 1$ are reweighted using inverse propensity scores.

^a $α = -1$, $ρ = 0$ in Model 1. ^b $α = 0$, $ρ = -0.5$ in Model 2.
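
To make the two selection mechanisms concrete, the sketch below simulates a data-generating process of the kind the table parameters suggest. Equations (5a′) and (5b′) are not reproduced in this section, so the functional form here is an assumption: a probit event equation in $A_{i}$ and $Z_{i}$, a probit retention equation in which the event enters with coefficient $α$, standard normal covariates with correlation 0.3, and errors with correlation $ρ$. The function `simulate` and its argument names are illustrative, and the output is not meant to reproduce the entries of Table 3.

```python
import math
import random


def simulate(n, alpha, rho, gamma_a=0.0, gamma_z=0.0, r_az=0.3,
             beta_0=-1.28, beta_a=1.0, beta_z=0.5, alpha_0=0.5, seed=12345):
    """Return (mean of M in the full sample, mean of M among retained cases)."""
    rng = random.Random(seed)
    events = retained = retained_events = 0
    for _ in range(n):
        # Assumed: standard normal covariates with correlation r_az.
        a = rng.gauss(0.0, 1.0)
        z = r_az * a + math.sqrt(1.0 - r_az ** 2) * rng.gauss(0.0, 1.0)
        # Assumed: standard normal errors with correlation rho.
        eps = rng.gauss(0.0, 1.0)
        u = rho * eps + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        m = 1 if beta_0 + beta_a * a + beta_z * z + eps > 0 else 0
        r = 1 if alpha_0 + alpha * m + gamma_a * a + gamma_z * z + u > 0 else 0
        events += m
        retained += r
        retained_events += m * r
    return events / n, retained_events / retained


full_1, cond_1 = simulate(50_000, alpha=-1.0, rho=0.0)  # Model 1 style: event lowers retention
full_2, cond_2 = simulate(50_000, alpha=0.0, rho=-0.5)  # Model 2 style: correlated errors
print(f"Model 1: P[M=1] = {full_1:.3f}, P[M=1 | R=1] = {cond_1:.3f}")
print(f"Model 2: P[M=1] = {full_2:.3f}, P[M=1 | R=1] = {cond_2:.3f}")
```

In both variants the event rate among retained cases falls below the rate in the full sample, which is the downward bias that the adjustment factor is intended to offset.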

If the adjusted estimates of the unconditional quantities based on the reweighted distribution of $A_{i}$ and $Z_{i}$ (i.e., $\text{mean}[\hat{P}_{M i} \mid R_{i} = 1]$ and $\text{mean}[\widehat{UME}_{i} \mid R_{i} = 1]$) are closer to their corresponding true unconditional values than the conditional estimates ($\text{mean}[\hat{P}[M_{i} = 1 \mid R_{i} = 1]]$ and $\text{mean}[\widehat{CME}_{i} \mid R_{i} = 1]$, respectively), the adjustment procedure can be considered superior to using the conditional estimates. Among the estimates illustrated in Table 3, there are three cases of Model 2 in which $\text{mean}[\widehat{CME}_{i} \mid R_{i} = 1]$ is closer to the true value than $\text{mean}[\widehat{UME}_{i} \mid R_{i} = 1]$, all of which have $γ_{z} = 0.3$, and, perforce, that will also be the case for $γ_{z} > 0.3$. Panel D eliminates the correlation between A and Z, but the conclusions about which estimator of the marginal effect is closer to the true effect are not affected (cf. panels A and D).

Effectiveness of the Adjustment Procedure for Further Variation in Model Specification

Variants of Model 1

This section reports analysis of the estimates of the conditional event probability and average conditional marginal effect along with the estimates of their unconditional counterparts, which use the adjustment procedure. Thirty-six different parameter configurations for the panel retention equation of Model 1 are considered: $γ_{a}$ and $γ_{z}$ each take on three possible values (−0.3, 0, and 0.3) and $α$ takes on four possible values (−0.1, −0.5, −0.75, and −1). In all cases, the correlation between A and Z is assumed to be 0.3, and the estimates of the means are reweighted using the inverse propensity score. The results are used to explore further the effectiveness of the adjustment procedure.
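
The reweighting step can be illustrated with a small deterministic example. The sketch below is not the paper's code: the two strata, their event rates, and their retention probabilities are made-up numbers, and `ipw_mean` is an illustrative name. It shows that weighting retained cases by the inverse of their retention propensity recovers the population mean when retention depends only on an observed covariate; it does not by itself correct for retention that depends on the event, which is the role of $\hat{q}$.

```python
def ipw_mean(outcomes, propensities):
    """Mean of the outcomes weighted by the inverse retention propensity."""
    weights = [1.0 / p for p in propensities]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Hypothetical population: two strata of 1,000 people each.
# Stratum 0: event rate 0.10, retention probability 0.8 -> 800 retained, 80 events.
# Stratum 1: event rate 0.30, retention probability 0.4 -> 400 retained, 120 events.
strata = [
    {"retained": 800, "events": 80, "p_retain": 0.8},
    {"retained": 400, "events": 120, "p_retain": 0.4},
]

outcomes, propensities = [], []
for s in strata:
    outcomes += [1] * s["events"] + [0] * (s["retained"] - s["events"])
    propensities += [s["p_retain"]] * s["retained"]

unweighted = sum(outcomes) / len(outcomes)
weighted = ipw_mean(outcomes, propensities)
print(f"unweighted retained mean: {unweighted:.4f}")
print(f"inverse-propensity weighted mean: {weighted:.4f}")
```

In this constructed example the population event rate is 0.20 by design, so the weighted mean can be compared directly with it.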

In every case, $\text{mean}[\hat{P}_{M i} \mid R_{i} = 1]$ is closer to the true unconditional mean probability than the conditional mean ($\text{mean}[\hat{P}[M_{i} = 1 \mid R_{i} = 1]]$). This is not surprising because we have estimated $\hat{q}$ to minimize the distance between the conditional and unconditional marginal probabilities with respect to age. The adjustment of $\hat{P}[M_{i} = 1 \mid R_{i} = 1]$ using $\hat{q}$ partially corrects for the severe downward bias of the estimate of $β_{0}$ from the conditional sample.

Although $\text{mean}[\widehat{UME}_{i} \mid R_{i} = 1]$ is closer to the true unconditional marginal effect than $\text{mean}[\widehat{CME}_{i} \mid R_{i} = 1]$ in most of the 36 cases, it is not in the 6 cases for which $γ_{z} = 0.3$, $α < -0.1$, and $γ_{a} \leq 0$. In this range of $α$ and $γ_{a}$, this would also be true for $γ_{z} > 0.3$.

Figure 4 plots the two estimators against different values of $α$ in Model 1 for two sets of gamma parameters: in each, $γ_{z} = 0.3$, while $γ_{a} = -0.3$ or $0.3$. Although $\text{mean}[\widehat{CME}_{i} \mid R_{i} = 1]$ is similar for each set of gamma parameters, the adjustment procedure produces a much larger adjustment factor $\hat{q}$ for the set with $γ_{a} = -0.3$, so that $\text{mean}[\widehat{UME}_{i} \mid R_{i} = 1]$ is farther from the true value than the unadjusted conditional marginal effect in this case and closer to it when $γ_{a} = 0.3$. As $α$ declines, the downward bias of $\text{mean}[\widehat{CME}_{i} \mid R_{i} = 1]$ increases (becomes “more negative”), while the adjustment factor $\hat{q}$ increases. The difference between $\text{mean}[\widehat{UME}_{i} \mid R_{i} = 1]$ and $\text{mean}[\widehat{CME}_{i} \mid R_{i} = 1]$ is the outcome of the “race” between these two factors as $α$ declines.

Figure 4.

Conditional marginal effect of Z and adjusted effect using $\hat{q}$ in Model 1.

Distance is defined as the absolute value of the difference between each estimator and the true value of the average marginal effect of Z. Analysis of the distances in the 36 simulations indicates that the distance for the reweighted $\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$ declines with increases in the values of $α$ and $γ_{z}$. This is also true for the reweighted $\text{mean}[\widehat{UME}_{i}(Z) \mid R_{i} = 1]$, for which the relationship partly operates through the adjustment factor function $\hat{q}(γ_{a}, γ_{z}, α)$, which is decreasing in all three panel retention parameters. Although the distances for both estimators decline with increases in $γ_{a}$, the effect of $γ_{a}$ on distance is about the same for both.

Define D as the distance of $\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$ minus the distance of $\text{mean}[\widehat{UME}_{i}(Z) \mid R_{i} = 1]$, so that the two estimators are equidistant from the true value when D = 0 and D < 0 indicates that $\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$ is closer to the true value than $\text{mean}[\widehat{UME}_{i}(Z) \mid R_{i} = 1]$. There is a relationship between $α$ and $γ_{z}$ for which D = 0, which is illustrated in Figure 5. Combinations of $α$ and $γ_{z}$ above the curve have D < 0, for which $\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$ is the superior estimator. For any given $α$, this is more likely to occur when $γ_{z}$ is larger. The value of $γ_{a}$ has little impact on the position of the $α$-$γ_{z}$ relationship.
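
The comparison can be made concrete with entries from Table 3. The sketch below computes D for Model 2 in panel A ($γ_{a} = 0.3$), using the sign convention that D is the distance of the conditional estimate minus the distance of the adjusted estimate, which is the reading consistent with D < 0 favoring the conditional estimate. The function name `distance_gap` is illustrative.

```python
TRUE_UME = 0.091  # true average marginal effect of Z when Corr(A, Z) = 0.3

# (CME, UME) estimates for Model 2 in panel A of Table 3, keyed by gamma_z.
panel_a_model_2 = {-0.3: (0.066, 0.075), 0.0: (0.079, 0.088), 0.3: (0.089, 0.098)}


def distance_gap(cme, ume, true_value=TRUE_UME):
    """D = |CME - true| - |UME - true|; D < 0 means the conditional estimate is closer."""
    return abs(cme - true_value) - abs(ume - true_value)


gaps = {gz: distance_gap(cme, ume) for gz, (cme, ume) in panel_a_model_2.items()}
for gz, d in gaps.items():
    closer = "CME" if d < 0 else "UME"
    print(f"gamma_z = {gz:+.1f}: D = {d:+.3f} -> {closer} is closer")
```

For these entries D changes sign between $γ_{z} = 0$ and $γ_{z} = 0.3$, so the point at which the two estimators are equidistant lies somewhere in that interval for this configuration.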

Figure 5.

Equidistant estimators’ curve*.

Variants of Model 2

A similar exercise was carried out for variation in $γ_{a}$, $γ_{z}$, and $ρ$. Again the two gamma parameters take on three values each (−0.3, 0, and 0.3) and $ρ$ takes on three values: −0.1, −0.25, and −0.5. Among these 27 different parameter configurations for the retention equation, $\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$ is the superior estimator in the cases in which $γ_{z} = 0.3$ and $γ_{a} = 0$ or $γ_{a} = -0.3$, and this will also be true when $γ_{z} > 0.3$. What emerges is a curve relating $ρ$ and $γ_{z}$ along which D = 0, similar to that in Figure 5 for $α$ and $γ_{z}$.⁹ In this model, a larger value of $γ_{a}$ shifts the $ρ$-$γ_{z}$ relationship upward so that for each value of $ρ$ it requires a larger value of $γ_{z}$ to make D = 0; put differently, it takes a higher $γ_{z}$ to make $\text{mean}[\widehat{CME}_{i} \mid R_{i} = 1]$ the superior estimator. As with Model 1 in relation to $α$, by biasing the conditional estimates of $β_{z}$ upwards by more, a higher $γ_{z}$ can offset the downward bias in $β_{z}$ caused by a smaller $ρ$ (or $α$), thereby producing the same distance from the true marginal effect of Z.

Claims and Conjectures

Of course, there are limits to what we can learn from simulations of variants of a particular model of panel retention like that used above. But there are lessons which aid our understanding of the proposed adjustment procedure. No claim can be made that, even after reweighting, the estimates $m e a n [{\hat{U M E}}_{i} | R_{i} = 1]$ and $m e a n [{\hat{P}}_{M i} | R_{i} = 1]$ using $\hat{q}$ are consistent estimates of $E [U M E_{i}]$ or mean $P [M_{i} = 1]$ , respectively. But the simulations point to the following conjectures:

The estimator $m e a n [{\hat{P}}_{M i} | R_{i} = 1]$ is always closer to the true mean value of $P [M_{i} = 1]$ than $m e a n [\hat{P} [M_{i} = 1 | R_{i} = 1]] .$

When the event is weakly associated with panel retention, the two estimators of the marginal effect are very similar and $m e a n [{\hat{U M E}}_{i} | R_{i} = 1]$ is usually closer to the true value than $m e a n [{\hat{C M E}}_{i} | R_{i} = 1]$ .

When the event is strongly associated with panel retention, $\text{mean}[\widehat{UME}_{i} \mid R_{i} = 1]$ is superior unless the exogenous covariates ($X_{i}$) strongly influence retention in the same direction as they affect $M_{i}$.

Information on the direction and strength of the effects of the exogenous covariates ($X_{i}$) on retention is obtained from estimates of the propensity score equation $\hat{P}[R_{i} = 1 \mid X_{i}]$, and the equation provides weights for calculating means. In the context of Models 1 and 2, an impact of $Z_{i}$ on $R_{i}$ that is weak, or opposite in sign to its impact on $M_{i}$, would strongly favor the superiority of $\text{mean}[\widehat{UME}_{i} \mid R_{i} = 1]$ over the simple conditional estimates. In any case, it would be worth reporting both estimators of the unconditional marginal effect: $\text{mean}[\widehat{UME}_{i} \mid R_{i} = 1]$ as well as $\text{mean}[\widehat{CME}_{i} \mid R_{i} = 1]$.

Examples

In both examples, the survey data is Understanding Society (the UK Household Longitudinal Study). It is a longitudinal survey of the members of approximately 40,000 households in the United Kingdom. Households recruited at the first round of data collection (2009–2011) were visited each year to collect information on changes to their household and individual circumstances. Annual interviews were conducted face-to-face in respondents’ homes by trained interviewers. All members of the households selected at the first wave and their descendants, who become full members of the panel when they reach age 16, constitute the core sample who are followed wherever they move within the UK. All others who join their households in subsequent waves do not become part of the core sample, but they are interviewed as long as they live with at least one core sample member. Thus, the sample is refreshed with younger members annually. Understanding Society is designed to be representative of the UK population at each wave, representing all ages and all educational and social backgrounds (for more details see Understanding Society 2021a, 2021b).

In the residential mobility example, the aim is to estimate the impact on age-specific movement rates of the housing tenure of the household in which a person lives (i.e., the counterpart of Z in the model of the previous section), and in the marriage example, the objective is to estimate the impact of existing children on age-specific marriage rates. In both examples, the external population data has the same target probability as the survey data: the annual age-group-specific probability of the event (residential movement or marriage). In Understanding Society, the estimates are based on residential movement (or marriage) between annual waves.

Residential Mobility

As noted above, the study of the probability of moving house is likely to be particularly prone to downward bias from panel attrition. The example uses population data on internal movements within the United Kingdom during the past year derived from the 2011 Census (the last one for which movement data is currently available). Movers are defined as those enumerated as having lived elsewhere within the United Kingdom one year earlier. The true mobility rate is calculated by dividing the number of movers by the sum of the number of movers and the number who lived at the same address in 2011 as one year earlier. The rates are computed by sex and age group and are shown in Figure 6. The corresponding survey data from Understanding Society measures residential mobility as a change in address between the previous annual wave of the survey and the current wave among those interviewed in 2011. These mobility rates are also shown in Figure 6, from which we see that the survey understates mobility among those aged under 50, probably because of larger panel attrition among movers.

Figure 6.

Comparison of Census and Understanding Society internal migration rates 2011.

To estimate $\hat{q}$, the first step was to estimate the proportion moving house in the previous year among those interviewed in 2011 in Understanding Society for the seven age groups in Figure 6 for each gender; these proportions are conditional on panel retention. The next step was non-linear least squares estimation of equation (3) over these seven observations. For men, $\hat{q}$ is estimated to be 1.47 (SE = 0.08) and for women 1.64 (SE = 0.17).¹⁰

Because our estimate of $P [M_{i} = 1 \mid R_{i} = 1, \text{age group}]$ from the survey data is subject to sampling error, estimating equation (3) may produce errors-in-variables bias in $\hat{q}$. It is therefore preferable to apply non-linear least squares to equation (3′), which arises from re-writing equation (3):

$$P [M_{i} = 1 \mid R_{i} = 1, X_{i}] = \frac{P [M_{i} = 1 \mid X_{i}]}{q_{i} + P [M_{i} = 1 \mid X_{i}] (1 - q_{i})} \tag{3$'$}$$
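
For reference, equation (3′) can be inverted by elementary algebra to express the unconditional probability in terms of the conditional one. Writing $p_{i} = P[M_{i} = 1 \mid X_{i}]$ and $p_{i}^{c} = P[M_{i} = 1 \mid R_{i} = 1, X_{i}]$ (shorthand introduced here for brevity, not the paper's notation):

$$
p_{i}^{c}\,[q_{i} + p_{i}(1 - q_{i})] = p_{i} \;\Longrightarrow\; p_{i}\,[1 - p_{i}^{c}(1 - q_{i})] = q_{i}\, p_{i}^{c} \;\Longrightarrow\; p_{i} = \frac{q_{i}\, p_{i}^{c}}{1 + (q_{i} - 1)\, p_{i}^{c}} .
$$

This should coincide with equation (3) up to rearrangement. It also makes the direction of the adjustment explicit: since $p_{i} - p_{i}^{c} = p_{i}^{c}(1 - p_{i}^{c})(q_{i} - 1) / [1 + (q_{i} - 1) p_{i}^{c}]$, the unconditional probability exceeds the conditional one when $q_{i} > 1$ and falls below it when $q_{i} < 1$.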

Using this approach for men, $\hat{q}$ was estimated to be 1.48 (SE = 0.07) and for women 1.70 (SE = 0.16). If the true annual sex- and age-specific mobility rates can be assumed to be the same over the decade 2011–2021 as in 2011, we can compare the 2011 Census mobility rates with mobility rates calculated by pooling all 11 waves of Understanding Society. The estimates of $\hat{q}$ are then 1.46 (SE = 0.15) for men and 1.54 (SE = 0.21) for women. To be conservative, these are the ones used in the adjustment procedure, but there are clearly relatively wide confidence intervals around these estimates.
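
The estimation of $\hat{q}$ from equation (3′) can be sketched as a one-parameter least squares problem. The code below is an illustration under stated assumptions, not the Stata program used for the paper: the seven age-group rates are hypothetical, the survey rates are generated without sampling error from a known $q$ so that recovery can be checked, and a golden-section search stands in for a general non-linear least squares routine (it relies on the sum of squares having a single minimum over the search interval). Standard errors are not computed. The function names are illustrative.

```python
import math


def conditional_prob(p, q):
    """Equation (3'): event probability conditional on retention."""
    return p / (q + p * (1.0 - q))


def sum_sq_resid(q, pop_rates, survey_rates):
    """Sum of squared differences between survey rates and equation (3')."""
    return sum((s - conditional_prob(p, q)) ** 2 for p, s in zip(pop_rates, survey_rates))


def fit_q(pop_rates, survey_rates, lo=0.1, hi=10.0, tol=1e-9):
    """Golden-section search for the q that minimises the sum of squared residuals."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if sum_sq_resid(c, pop_rates, survey_rates) < sum_sq_resid(d, pop_rates, survey_rates):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2.0


# Hypothetical rates for seven age groups (not the Census or survey figures).
pop_rates = [0.30, 0.25, 0.18, 0.12, 0.08, 0.05, 0.04]
true_q = 1.5
survey_rates = [conditional_prob(p, true_q) for p in pop_rates]

q_hat = fit_q(pop_rates, survey_rates)
print(f"recovered q: {q_hat:.4f}")
```

Placing the survey proportion on the left-hand side, as in equation (3′), keeps the quantity measured with sampling error as the dependent variable rather than as a regressor.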

A probit model for residential mobility was estimated using all eleven waves of data (i.e., up to ten pairs of waves) among persons aged 17–49. In the model, mobility depends on a cubic in age, highest educational level, housing tenure (private rental and homeowner cf. social tenancy), an interaction between gender and the presence of a partner (all measured in the previous year) and interview year. In broad terms, the model is similar to those estimated for the UK in Ermisch and Mulder (2019) and Ermisch and Steele (2016). The discussion focuses on the mean probability of the event and the average marginal effect of a person's housing tenure (i.e., treated as Z in the section “Artificial Data”) on the probability of mobility. Table 4 shows the conditional probabilities and marginal effects estimated from the model along with the corresponding estimates of their unconditional counterparts using the adjustment procedure.

Table 4.

Estimates of Conditional and Unconditional Probabilities of Residential Mobility and Marginal Effects, People Aged 16–49 (Robust Standard Error in Parentheses).

	Men		Women
	Unweighted	Weighted	Unweighted	Weighted
$\hat{P}[M_{i} = 1 \mid R_{i} = 1]$	0.114 (0.001)	0.123	0.103 (0.001)	0.109
$\text{mean}[\hat{P}_{M i} \mid R_{i} = 1]$	0.152	0.163	0.144	0.152
2011 Census $P [M_{i} = 1]$	0.170		0.164
Private tenant^a
$\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$	0.146 (0.004)	0.155	0.141 (0.004)	0.145
$\text{mean}[\widehat{UME}_{i}(Z) \mid R_{i} = 1]$	0.188	0.191	0.185	0.187
Homeowner^a
$\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$	−0.019 (0.002)	−0.020	−0.018 (0.002)	−0.018
$\text{mean}[\widehat{UME}_{i}(Z) \mid R_{i} = 1]$	−0.026	−0.027	−0.025	−0.026
$\hat{q}$	1.46		1.54

^aCompared with social tenant, calculated from discrete probability differences (i.e., not based on equation (4)).

First, reweighting using the inverse of the estimated propensity score increases the conditional probability of the event by a small amount. This means that, on average, observed influences on panel retention tend to reduce the estimate of the conditional probability, and reweighting addresses this to some extent. The association of the event itself with panel retention is captured by the upward adjustment using $\hat{q}$: compare $\text{mean}[\hat{P}_{M i} \mid R_{i} = 1]$ with $\text{mean}[\hat{P}[M_{i} = 1 \mid R_{i} = 1]]$. This adjustment brings us closer to the true value of mean $P [M_{i} = 1]$ than $\text{mean}[\hat{P}[M_{i} = 1 \mid R_{i} = 1]]$, but the weighted adjusted estimate is still below the Census value (by 0.007 for men and 0.012 for women).

Reweighting also increases the size of the average conditional marginal effect of housing tenure, $\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$, by a small amount. Using the adjustment procedure to obtain $\text{mean}[\widehat{UME}_{i}(Z) \mid R_{i} = 1]$ raises the size by a relatively large amount. Because the propensity score for panel retention is correlated with the housing tenure variables in the opposite direction to the sign of $\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$ in each case, we can be fairly confident that $\text{mean}[\widehat{UME}_{i}(Z) \mid R_{i} = 1]$ does not overstate the true average marginal effect and is closer to the true effect than $\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$.

The housing tenure a person lived in during the previous year is not available in the Census data and so there is no true value to which we could compare these estimates. A rough guide is the difference in the mobility rate by housing tenure in the year after any move for whole households moving in the 2011 Census: homeowners had a rate 0.049 lower than social tenants and private tenants had one 0.160 higher than social tenants. The model estimates, which control for other variables, are, therefore, of the right order of magnitude.

Marriage Using Marriage Registration Data

What may distinguish marriage from residential mobility in terms of panel retention is that, while marriage often involves residential mobility, that is not always the case because the couple may be living together already. Marriage may affect panel retention in at least two ways: (1) residential mobility associated with marriage creates a tendency for those who marry to be more likely to drop out of the panel and (2) people who marry may be different in unobserved ways (e.g., possibly in terms of making commitments and responsibility), which makes their continued participation in the panel study more likely, implying fewer panel dropouts among those who marry.

Figure 7 shows the comparison between men's age-specific average marriage rates for 2010–13 based on registration data and estimates from Understanding Society using waves 1–3.¹¹ The first reason for non-ignorable attrition leads to understatement of marriage rates in the survey data, but the second leads to overstatement of marriage rates in the survey data. Figure 7 suggests the second reason dominates. A similar but smaller difference is evident for women (Appendix Figure A2, supplemental material).

Figure 7.

Marriage rates from registration data (average 2010–13), Understanding Society (waves 1–3), and adjusted Understanding Society rates (waves 1–3), men.

Estimation of $\hat{q}$ is analogous to the previous example, using the six age group observations to estimate equation (3′). The estimate of $\hat{q}$ is 0.71 (SE = 0.05) for men and 0.87 (SE = 0.03) for women. The survey marriage probability estimates adjusted using $\hat{q}$ are shown in Figure 7 (and Figure A2) as the lines with triangles.
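
The direction and size of the adjustment implied by these estimates can be illustrated for a single probability. Solving equation (3′) for the unconditional probability gives $q p^{c} / [1 + (q - 1) p^{c}]$, where $p^{c}$ is the probability conditional on retention. In the sketch below the conditional rate of 0.05 is a hypothetical value rather than a figure from the data, and `unconditional_prob` is an illustrative name; the values of $\hat{q}$ are those reported in the text for the two examples.

```python
def unconditional_prob(p_cond, q):
    """Invert equation (3'): map a retention-conditional probability to an unconditional one."""
    return q * p_cond / (1.0 + (q - 1.0) * p_cond)


p_cond = 0.05  # hypothetical conditional annual event probability
q_hats = {
    "marriage, men": 0.71,
    "marriage, women": 0.87,
    "mobility, men": 1.46,
    "mobility, women": 1.54,
}
adjusted = {label: unconditional_prob(p_cond, q) for label, q in q_hats.items()}

for label, p in adjusted.items():
    direction = "down" if p < p_cond else "up"
    print(f"{label}: q = {q_hats[label]:.2f}, adjusted {direction} to {p:.4f}")
```

With $\hat{q} < 1$, as in the marriage example, the survey rate is adjusted downward; with $\hat{q} > 1$, as in the mobility example, it is adjusted upward.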

The marriage example addresses a simple substantive question which cannot be addressed with registration data: are people in a cohabiting couple more likely to marry if they have children than if they do not? A logit model for marrying between waves t and t + 1 was estimated, in which there are two sets of regressors: a quadratic in age and three variables for a two-way interaction between having a partner and having their own child(ren) in the household at wave t. The estimation was performed separately by gender.

Among partnered women in the survey data, after reweighting, $\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$ of having a child is −0.023 (SE = 0.009), and $\text{mean}[\widehat{UME}_{i}(Z) \mid R_{i} = 1]$ is −0.022.¹² Among men, these quantities are −0.024 (SE = 0.010) and −0.018, respectively. Because having a child in the household also reduced panel retention, using $\hat{q}$ to adjust the conditional estimate may “over-adjust”; that is, it is possible that $\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$ is the superior estimator (closer to the true value than $\text{mean}[\widehat{UME}_{i}(Z) \mid R_{i} = 1]$).¹³ This is not a large concern here because the differences between $\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$ and $\text{mean}[\widehat{UME}_{i} \mid R_{i} = 1]$ are well within one standard error of $\text{mean}[\widehat{CME}_{i}(Z) \mid R_{i} = 1]$.

Conclusions

Empirical analysis of variation in demographic events within the population is facilitated by using longitudinal survey data, but there is wave-on-wave dropout. When attrition is related to the event, such as residential mobility, it precludes consistent estimation of the impacts of covariates on the event, on event probabilities and on statistics based on these probabilities in the absence of additional, unverifiable assumptions. The paper introduced an adjustment procedure based on Bayes Theorem that uses population information external to the survey sample to convert estimates of event probabilities and marginal effects of covariates on them that are conditional on retention in the longitudinal data to unconditional estimates of these quantities.¹⁴ It does not produce consistent estimates of the unconditional quantities, but (1) its estimate of the mean unconditional probability of the event is always closer to the true value than the conditional estimate, with the adjustment factor providing a measure of how close the mean unconditional probability of the event is to its conditional estimate; and (2) estimation of the impacts of covariates on panel retention provides information on whether the adjusted estimates of the marginal effects of covariates are closer to the true unconditional values than the corresponding conditional estimates. The process of obtaining the adjusted estimator and the estimation of the propensity score equation reveals valuable information about the survey data relative to corresponding population data and whether attrition is ignorable or not, and reporting both estimates is recommended.

The adjustment method was applied to estimate the variation in residential mobility and marriage rates within the population in relation to covariates. The two sources of external population data are census and marriage registration statistics, respectively. In each case, the survey data was the large UK household panel survey called Understanding Society. In the residential mobility analysis, the conditional estimates understate the unconditional mean event probability substantially and the relationship between panel retention and the covariates is such that the adjusted estimates of the marginal effects are larger than the conditional estimate and very likely to be superior to the conditional ones. In the marriage analysis, the opposite is the case: the conditional mean probability overstates the unconditional one and the conditional estimate of the marginal effects is superior to the adjusted estimate, although similar.

Footnotes

Acknowledgements

I am grateful for the helpful comments from two reviewers and the editor.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work was funded by a Leverhulme Trust Grant for the Leverhulme Centre for Demographic Science, University of Oxford.

ORCID iD

John Ermisch

Data Availability Statement

Stata programs, which create the data and do the analysis, along with a ReadMe file describing the program files and their application, are available as a zip file from Open Science Framework at .

Supplemental Material

Supplemental material and Appendix for this article are available online.

Author Biography

John Ermisch is emeritus professor of family demography at the University of Oxford, a senior research fellow at Nuffield College, a Fellow of the British Academy (since 1995) and an associate of the Leverhulme Centre for Demographic Science. He is the author of An Economic Analysis of the Family (Princeton University Press, 2003), Lone Parenthood: An Economic Analysis (Cambridge University Press, 1991) and The Political Economy of Demographic Change (Heinemann, 1983), as well as numerous articles in economic, sociology and demographic journals. He is co-editor of From Parents to Children: The Intergenerational Transmission of Advantage (New York: Russell Sage Foundation, 2012). He is Editor in Chief of Population Studies.

References

Bareinboim, Elias, Jin Tian, and Judea Pearl. 2014. “Recovering from Selection Bias in Causal and Statistical Inference.” Pp. 2410–6 in Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence.

Chaudhuri, Sanjay, Mark S. Handcock, and Michael S. Rendall. 2008. “Generalized Linear Models Incorporating Population Level Information: An Empirical-Likelihood-Based Approach.” Journal of the Royal Statistical Society B 70(2):311–28.

Ermisch, John. 2023. “The Recent Decline in Period Fertility in England and Wales: Differences Associated with Family Background and Intergenerational Educational Mobility.” Population Studies: 1–15. doi:10.1080/00324728.2023.2215224.

Ermisch, John, and Clara Mulder. 2019. “Migration Versus Immobility and Ties to Parents.” European Journal of Population 35(3):587–608. doi:10.1007/s10680-018-9494-0.

Ermisch, John, and Fiona Steele. 2016. “Fertility Expectations and Residential Mobility in Britain.” Demographic Research 35(article 54):1561–84. doi:10.4054/DemRes.2016.35.54.

Handcock, Mark S., Sami M. Huovilainen, and Michael S. Rendall. 2000. “Combining Registration-System and Survey Data to Estimate Birth Probabilities.” Demography 37(2):187–92.

Handcock, Mark S., Michael S. Rendall, and Jacob E. Cheadle. 2005. “Improved Regression Estimation of a Multivariate Relationship with Population Data on the Bivariate Relationship.” Sociological Methodology 35:291–334.

Heckman, James J. 1979. “Selection Bias as a Specification Error.” Econometrica 47(1):153–61.

Imbens, G. W., and Lancaster. 1994. “Combining Micro and Macro Data in Microeconometric Models.” The Review of Economic Studies 61(4):655–80.

Little, R. J., and D. Rubin. 2014. Statistical Analysis with Missing Data. New York: Wiley.

Mohan and Pearl. 2021. “Graphical Models for Processing Missing Data.” Journal of the American Statistical Association 116(534):1023–37.

Rendall, Michael S., Mark S. Handcock, and Stefan H. Jonsson. 2009. “Bayesian Estimation of Hispanic Fertility Hazards from Survey and Population Data.” Demography 46(1):65–83.

Washbrook, P. S. Clarke, and Steele. 2014. “Investigating Non-Ignorable Dropout in Panel Studies of Residential Mobility.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 63(2):239–66.

Understanding Society. 2021a. About the Study. https://www.understandingsociety.ac.uk/about/about-the-study

Understanding Society. 2021b. Main Survey User Guide. https://www.understandingsociety.ac.uk/documentation/mainstage/user-guides/main-survey-user-guide/