Fast Bayesian Estimation for the Four-Parameter Logistic Model (4PLM)

Abstract

After three decades of neglect, the psychometrics community has shown renewed interest in the four-parameter logistic item response model (4PLM). Recent breakthroughs in item calibration include the Gibbs sampler specially designed for the 4PLM and the Bayes modal estimation (BME) method as implemented in the R package mirt. Unfortunately, MCMC is often time-consuming, while the BME method suffers from instability due to the prior settings. This paper proposes an alternative BME method, the Bayesian Expectation-Maximization-Maximization-Maximization (BE3M) method, which is developed by combining an augmented variable formulation of the 4PLM and a mixture model conceptualization of the 3PLM. The simulation shows that BE3M can produce estimates as accurately as the Gibbs sampling method and as fast as the EM algorithm. A real data example is also provided.

Keywords

4PLM; mixture modeling; Gibbs sampler; BME; BE3M

The four-parameter logistic item response model (4PLM) was first mentioned as far back as McDonald (1967) and formally proposed by Barton and Lord (1981). Despite this, for many years, it received very little attention in the literature due to doubts about its utility, as well as technical difficulties related to parameter estimation. Recently, however, the psychometrics community has rekindled its interest in the 4PLM after three decades of neglect.

Increasingly, applications of the 4PLM have appeared in the literature as supporting evidence of its usefulness. Several studies argued that the slipping parameter of the 4PLM can be estimated to account for careless errors made early in a test and thereby improve ability estimation (Liao et al., 2012; Rulison & Loken, 2009). In psychology, it has been noted that there is value in modeling the lower and upper asymptotes of item response functions (Reise & Waller, 2003; Waller & Reise, 2010) in instances where subjects with higher levels of psychopathology may be reluctant to self-report attitudes, behaviors, and/or experiences, which may indicate a social desirability bias (Rouse et al., 1999). In education, recent studies have shown the prevalence and implications of the slipping effect in low-stakes large-scale educational assessments (Culpepper, 2016, 2017).

Despite this work, one daunting challenge regarding the 4PLM is the difficulty of estimating its item parameters. The lack of a fast and stable implementation of the 4PLM remains a major barrier to its widespread application in practice. Few studies to date have addressed the item calibration issue for the 4PLM, and existing methods fall into two approaches: MCMC methods and marginal maximum likelihood (Feuerstahler & Waller, 2014). Among MCMC methods, Loken and Rulison (2010) presented Bayesian model formulations with Metropolis-Hastings (MH) sampling using general-purpose software (e.g., the “bayesmh” module in Stata). More recently, Culpepper (2016) proposed a flexible Gibbs sampling method. Both MCMC methods can provide accurate item estimates but are inherently time-consuming. Even with Culpepper’s (2016, 2017) Gibbs sampling approach, calibration is expected to take at least 1 hour, which motivates us to find a faster Bayesian estimation method.

Another recent implementation of the 4PLM is that of Waller and Feuerstahler (2017), who extended the Bayesian modal estimation (BME) method to the 4PLM using the R package mirt (Chalmers, 2012). Compared with MCMC methods, BME is a fast calibration method, but several issues need to be explored to evaluate the utility of BME and its implementation in the mirt package. For example, Guo and Zheng (2019) found that the item parameter estimates yielded by BME were unstable for the 3PLM when the priors of the guessing parameters were changed. Moreover, in version 1.30 of the mirt package, the standard errors of the item parameters cannot be calculated because of program errors in computing the Hessian matrix when a Beta prior (“expbeta” in mirt) is imposed on the guessing or slipping parameters.

In this paper, we propose an alternative Bayesian modal estimation approach, which we term the Bayesian Expectation-Maximization-Maximization-Maximization (BE3M) method, developed by combining a latent variable augmentation approach for the 4PLM with the mixture modeling approach for the 3PLM (Zheng et al., 2017). The remainder of the manuscript is organized as follows. First, we present a detailed derivation of the BME and discuss potential difficulties in item calibration. Second, we present a latent variable augmentation approach to formulating the 4PLM (Béguin & Glas, 2001; Culpepper, 2016) and the resulting BE3M estimation method. Third, we present two simulation studies assessing the accuracy of parameter recovery for four calibration methods (BE3M, mirt, fourPNO, and Stata). Fourth, we apply the four methods to bullying item responses of 7,491 adolescents from the 2005 to 2006 Health Behavior in School-Aged Children (HBSC) study. Lastly, we offer concluding remarks and directions for future research.

Bayesian Modal Estimation for the 4PLM

Following Culpepper (2016), the slope-intercept form IRT formulation of the 4PLM is:

P (Y_{i j} = 1 | θ_{i}, ψ_{j}) \equiv P_{j} (θ_{i}) = γ_{j} + (1 - ς_{j} - γ_{j}) P_{j}^{*} (θ_{i}),

(1)

with

P_{j}^{*} (θ_{i}) = \frac{1}{1 + \exp (- D (α_{j} θ_{i} - β_{j}))},

(2)

where $Y_{i j}$ is the observed score of examinee $i$ on item $j$; $α_{j}$, $β_{j}$, $γ_{j}$, and $ς_{j}$ are the slope, intercept, guessing, and slipping parameters of item $j (j = 1, 2, 3, \dots, J)$, respectively; $θ_{i} \in θ$ is the ability parameter of examinee $i (i = 1, 2, 3, \dots, N)$; and $D$ is a scaling constant equal to 1.702. Let $ψ_{j} = {(α_{j}, β_{j}, γ_{j}, ς_{j})}^{T}$ be the $j$th column vector of the matrix $Ψ$ collecting the item parameters for all $J$ items. Similarly, let $y_{i}$ be the row vector of the matrix $Y$ collecting all item responses of examinee $i$. Then, the marginal likelihood function is

\begin{array}{l} L (Y | Ψ) = \prod_{i = 1}^{N} P (y_{i} | Ψ) \\ = \prod_{i = 1}^{N} \int_{θ_{i}} P (y_{i} | θ_{i}, Ψ) g (θ_{i} | τ) d θ_{i} \\ = \prod_{i = 1}^{N} \int_{θ_{i}} \prod_{j = 1}^{J} [P_{j} {(θ_{i})}^{Y_{i j}} {(1 - P_{j} (θ_{i}))}^{1 - Y_{i j}}] g (θ_{i} | τ) d θ_{i} \end{array}

(3)

with

P (y_{i} | Ψ) = \int_{θ_{i}} P (y_{i} | θ_{i}, Ψ) g (θ_{i} | τ) d θ_{i},

(4)

P (y_{i} | θ_{i}, Ψ) = \prod_{j = 1}^{J} P_{j} {(θ_{i})}^{Y_{i j}} {(1 - P_{j} (θ_{i}))}^{1 - Y_{i j}},

(5)

where $g (θ_{i} | τ)$ is the population ability density evaluated for examinee $i$, and $τ$ is the vector containing the parameters of the examinee population ability distribution. Please refer to Supplemental Appendix A for a detailed derivation. The BME for the 4PLM is a natural extension of that for the 3PLM, but the Fisher information matrix is larger (4 × 4 rather than 3 × 3) and has small diagonal elements for the guessing and slipping parameters. As a result, the calculation of the Fisher information matrix is numerically unstable, and the BME requires a much larger sample size for the 4PLM than for the 3PLM to avoid convergence problems. According to Waller and Feuerstahler (2017), the convergence rate for a sample of 1,000, which is generally adequate for the 3PLM, is only 9%. The new method in the current paper is designed to address this issue by reformulating the 4PLM as a latent mixture model.
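The quadrature approximation behind Equations (1)-(5) can be sketched in a few lines of Python. This is a minimal illustration, not the mirt or BE3M implementation: it assumes an equally spaced grid with standard-normal density weights rescaled to sum to 1 (mirroring the 61-point grid from −6 to +6 described in the simulation settings), and the helper names `p_star`, `p_4pl`, `normal_grid`, and `marginal_loglik` are hypothetical.

```python
import math

D = 1.702  # scaling constant in Equation (2)


def p_star(theta, a, b):
    """Logistic kernel P*_j(theta) in slope-intercept form, Equation (2)."""
    return 1.0 / (1.0 + math.exp(-D * (a * theta - b)))


def p_4pl(theta, a, b, g, s):
    """4PLM item response function P_j(theta), Equation (1)."""
    return g + (1.0 - s - g) * p_star(theta, a, b)


def normal_grid(q=61, lo=-6.0, hi=6.0):
    """Equally spaced nodes X_k with N(0, 1) density weights rescaled to sum to 1."""
    step = (hi - lo) / (q - 1)
    nodes = [lo + k * step for k in range(q)]
    dens = [math.exp(-0.5 * x * x) for x in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]


def marginal_loglik(Y, items, nodes, weights):
    """Quadrature approximation to ln L(Y | Psi), Equations (3)-(5).

    Y: list of 0/1 response vectors; items: list of (a, b, g, s) tuples.
    """
    total = 0.0
    for y in Y:
        p_y = 0.0  # P(y_i | Psi), Equation (4)
        for x, w in zip(nodes, weights):
            lik = 1.0  # P(y_i | X_k, Psi), Equation (5)
            for y_ij, (a, b, g, s) in zip(y, items):
                p = p_4pl(x, a, b, g, s)
                lik *= p if y_ij == 1 else 1.0 - p
            p_y += lik * w
        total += math.log(p_y)
    return total
```

For long tests, the product over items would typically be accumulated in log space to avoid numerical underflow.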

The Latent Mixture Modeling Reformulation of the 4PLM

The 4PLM can be regarded as a latent mixture model. First, note that

\begin{array}{l} P_{j} (θ_{i}) = γ_{j} + (1 - ς_{j} - γ_{j}) \times P_{j}^{*} (θ_{i}) \\ = P_{j}^{*} (θ_{i}) \times (1 - ς_{j}) + (1 - P_{j}^{*} (θ_{i})) \times γ_{j} . \end{array}

(6)

Next, we define a latent indicator variable $W_{i j}$ to indicate whether examinee $i$ knows the answer to item $j$, where $W_{i j} = 1$ if the examinee knows it and 0 otherwise (Béguin & Glas, 2001; Culpepper, 2016). Furthermore, we model $W_{i j}$ probabilistically as $W_{i j} \sim \mathrm{Bernoulli} (P_{j}^{*} (θ_{i}))$, meaning that $P (W_{i j} = 1 | θ_{i}, ψ_{j}) = P_{j}^{*} (θ_{i})$.

From the mixture-modeling perspective, the 4PLM is a two-component mixture indexed by $W_{i j}$: the probability of a correct response is $1 - ς_{j}$ when $W_{i j} = 1$ and $γ_{j}$ when $W_{i j} = 0$. The conditional probabilities $P (Y_{i j} | W_{i j}, θ_{i}, ψ_{j})$ are therefore:

\begin{array}{l} P (Y_{i j} = 1 | W_{i j} = 1, θ_{i}, ψ_{j}) = 1 - ς_{j}, \\ P (Y_{i j} = 0 | W_{i j} = 1, θ_{i}, ψ_{j}) = ς_{j}, \\ P (Y_{i j} = 1 | W_{i j} = 0, θ_{i}, ψ_{j}) = γ_{j}, \\ P (Y_{i j} = 0 | W_{i j} = 0, θ_{i}, ψ_{j}) = 1 - γ_{j} . \end{array}

(7)

Multiplying these by $P (W_{i j} | θ_{i}, ψ_{j})$ gives the joint distribution of $Y_{i j}$ and $W_{i j}$ (conditional on item and person parameters):

P (Y_{i j}, W_{i j} | θ_{i}, ψ_{j}) = \begin{cases} (1 - ς_{j}) P_{j}^{*} (θ_{i}), & \text{for } Y_{i j} = 1, W_{i j} = 1 \\ ς_{j} P_{j}^{*} (θ_{i}), & \text{for } Y_{i j} = 0, W_{i j} = 1 \\ γ_{j} (1 - P_{j}^{*} (θ_{i})), & \text{for } Y_{i j} = 1, W_{i j} = 0 \\ (1 - γ_{j}) (1 - P_{j}^{*} (θ_{i})), & \text{for } Y_{i j} = 0, W_{i j} = 0 \end{cases}

(8)
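As a numerical check of Equations (6)-(8), the sketch below builds the 2 × 2 joint table of $(Y_{ij}, W_{ij})$ at a fixed ability and confirms that summing over $W_{ij}$ recovers the 4PLM probability $P_j(θ_i)$, while summing over $Y_{ij}$ recovers $P_j^*(θ_i)$. The helper `joint_yw` and the item parameter values are illustrative assumptions.

```python
import math

D = 1.702  # scaling constant in Equation (2)


def p_star(theta, a, b):
    """Logistic kernel P*_j(theta), Equation (2)."""
    return 1.0 / (1.0 + math.exp(-D * (a * theta - b)))


def joint_yw(theta, a, b, g, s):
    """Joint probabilities P(Y=y, W=w | theta, psi) keyed by (y, w), Equation (8)."""
    ps = p_star(theta, a, b)
    return {
        (1, 1): (1.0 - s) * ps,          # knows the answer and does not slip
        (0, 1): s * ps,                  # knows the answer but slips
        (1, 0): g * (1.0 - ps),          # does not know but guesses correctly
        (0, 0): (1.0 - g) * (1.0 - ps),  # does not know and does not guess correctly
    }


# Illustrative item parameters (alpha, beta, gamma, varsigma) and ability.
a, b, g, s, theta = 1.24, 0.62, 0.25, 0.23, 0.3
table = joint_yw(theta, a, b, g, s)
p_correct = table[(1, 1)] + table[(1, 0)]  # marginalize over W: Equation (6)
p_know = table[(1, 1)] + table[(0, 1)]     # marginalize over Y: P*_j(theta)
```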

Let $w_{i}$ be the latent indicator vector for examinee i. Then the joint distribution for the new augmented complete data $(Y, W, θ)$ is

P (y_{i}, w_{i}, θ_{i} | Ψ, τ) = P (y_{i}, w_{i} | Ψ, θ_{i}) g (θ_{i} | τ)

(9)

where

\begin{array}{l} P (y_{i}, w_{i} | Ψ, θ_{i}) \\ = \prod_{j = 1}^{J} {[(1 - ς_{j}) P_{j}^{*} (θ_{i})]}^{Y_{i j} W_{i j}} {[γ_{j} (1 - P_{j}^{*} (θ_{i}))]}^{Y_{i j} (1 - W_{i j})} \\ \times {[ς_{j} P_{j}^{*} (θ_{i})]}^{(1 - Y_{i j}) W_{i j}} {[(1 - γ_{j}) (1 - P_{j}^{*} (θ_{i}))]}^{(1 - Y_{i j}) (1 - W_{i j})} . \end{array}

(10)

Just like the MMLE/EM algorithm, the BE3M algorithm also marginalizes out the latent ability variable, so the likelihood function of the BE3M without priors is:

\begin{array}{l} L (Y, W | Ψ) = \prod_{i = 1}^{N} P (y_{i}, w_{i} | Ψ) \\ = \prod_{i = 1}^{N} \int_{θ_{i}} P (y_{i}, w_{i}, θ_{i} | Ψ, τ) d θ_{i} \\ = \prod_{i = 1}^{N} \int_{θ_{i}} P (y_{i}, w_{i} | Ψ, θ_{i}) g (θ_{i} | τ) d θ_{i} . \end{array}

(11)

This formulation follows Culpepper (2016) for the four-parameter normal ogive (4PNO) model. In contrast to his work, however, we develop a Bayesian modal method to estimate model parameters for the 4PLM. The approach follows the basic premise of the EM algorithm but splits the maximization step into three parts: first maximizing over $γ_{j}$, then over $ς_{j}$, and finally jointly over $α_{j}$ and $β_{j}$. Prior distributions for the item parameters are incorporated in the maximization step. Hence, we call this approach the Bayesian Expectation-Maximization-Maximization-Maximization (BE3M) method.

Expectation Step and Artificial Data

In the expectation step, we take the expectation of the log-likelihood over the unobserved portion of the data, namely the latent variables $W_{i j}$ and $θ_{i}$. In the maximization step, we maximize the posterior for the item parameters, which is equivalent to finding the root of the sum of the first derivatives of the expected log-likelihood and the first derivatives of the log priors. We therefore need the expected value of the first derivatives of the log-likelihood function. To begin, the first derivatives of the log-likelihood for item $j$ are

\begin{array}{l} \frac{\partial \ln L (Y, W | Ψ)}{\partial ψ_{j}} = \sum_{i = 1}^{N} \frac{\partial \ln P (y_{i}, w_{i} | Ψ)}{\partial ψ_{j}} = \sum_{i = 1}^{N} \frac{1}{P (y_{i}, w_{i} | Ψ)} \frac{\partial P (y_{i}, w_{i} | Ψ)}{\partial ψ_{j}} \\ = \sum_{i = 1}^{N} \frac{1}{P (y_{i}, w_{i} | Ψ)} \int_{θ_{i}} \frac{\partial P (y_{i}, w_{i} | Ψ, θ_{i})}{\partial ψ_{j}} g (θ_{i} | τ) d θ_{i} \\ = \sum_{i = 1}^{N} \frac{1}{P (y_{i}, w_{i} | Ψ)} \int_{θ_{i}} [\frac{\partial \ln P (y_{i}, w_{i} | Ψ, θ_{i})}{\partial ψ_{j}}] P (y_{i}, w_{i} | Ψ, θ_{i}) g (θ_{i} | τ) d θ_{i} \\ = \sum_{i = 1}^{N} \int_{θ_{i}} [\frac{\partial \ln P (y_{i}, w_{i} | Ψ, θ_{i})}{\partial ψ_{j}}] f (θ_{i} | y_{i}, w_{i}, τ, Ψ) d θ_{i} \\ = \sum_{i = 1}^{N} \int_{θ_{i}} \frac{\partial}{\partial ψ_{j}} \sum_{j' = 1}^{J} [\begin{matrix} Y_{i j'} W_{i j'} \ln [(1 - ς_{j'}) P_{j'}^{*} (θ_{i})] + Y_{i j'} (1 - W_{i j'}) \ln [γ_{j'} (1 - P_{j'}^{*} (θ_{i}))] \\ + (1 - Y_{i j'}) W_{i j'} \ln [ς_{j'} P_{j'}^{*} (θ_{i})] + (1 - Y_{i j'}) (1 - W_{i j'}) \ln [(1 - γ_{j'}) (1 - P_{j'}^{*} (θ_{i}))] \end{matrix}] f (θ_{i} | y_{i}, w_{i}, τ, Ψ) d θ_{i} \\ = \sum_{i = 1}^{N} \int_{θ_{i}} [\begin{array}{l} \frac{W_{i j} - P_{j}^{*} (θ_{i})}{P_{j}^{*} (θ_{i}) [1 - P_{j}^{*} (θ_{i})]} \frac{\partial P_{j}^{*} (θ_{i})}{\partial ψ_{j}} \\ + \frac{(Y_{i j} - γ_{j}) (1 - W_{i j})}{γ_{j} (1 - γ_{j})} \frac{\partial γ_{j}}{\partial ψ_{j}} + \frac{W_{i j} (1 - ς_{j} - Y_{i j})}{ς_{j} (1 - ς_{j})} \frac{\partial ς_{j}}{\partial ψ_{j}} \end{array}] \times f (θ_{i} | y_{i}, w_{i}, τ, Ψ) d θ_{i} \end{array}

(12)

with

f (θ_{i} | y_{i}, w_{i}, τ, Ψ) = \frac{P (y_{i} | Ψ, θ_{i}) g (θ_{i} | τ)}{\int_{θ_{i}} P (y_{i} | Ψ, θ_{i}) g (θ_{i} | τ) d θ_{i}},

(13)

P (y_{i} | Ψ, θ_{i}) = \prod_{j = 1}^{J} P_{j} {(θ_{i})}^{Y_{i j}} {(1 - P_{j} (θ_{i}))}^{1 - Y_{i j}} .

(14)

Because this expression includes the latent variable $W_{i j}$, taking the expectation of the log-likelihood requires $E (W_{i j} | Y_{i j}, θ_{i}, ψ_{j})$. To find it, we first derive the conditional distribution of $W_{i j}$ using Bayes’ rule:

\begin{array}{l} P (W_{i j} = 1 | Y_{i j} = 1, θ_{i}, ψ_{j}) \\ = \frac{(1 - ς_{j}) P_{j}^{*} (θ_{i})}{(1 - ς_{j}) P_{j}^{*} (θ_{i}) + γ_{j} (1 - P_{j}^{*} (θ_{i}))} = \frac{(1 - ς_{j}) P_{j}^{*} (θ_{i})}{P_{j} (θ_{i})}, \end{array}

(15)

\begin{array}{l} P (W_{i j} = 1 | Y_{i j} = 0, θ_{i}, ψ_{j}) \\ = \frac{ς_{j} P_{j}^{*} (θ_{i})}{ς_{j} P_{j}^{*} (θ_{i}) + (1 - γ_{j}) (1 - P_{j}^{*} (θ_{i}))} = \frac{ς_{j} P_{j}^{*} (θ_{i})}{1 - P_{j} (θ_{i})} . \end{array}

(16)

Thus, we find the conditional expectation of $W_{i j}$ to be

\begin{array}{l} E (W_{i j} | Y_{i j}, θ_{i}, ψ_{j}) \\ = \frac{(1 - ς_{j}) P_{j}^{*} (θ_{i})}{P_{j} (θ_{i})} Y_{i j} + \frac{ς_{j} P_{j}^{*} (θ_{i})}{1 - P_{j} (θ_{i})} (1 - Y_{i j}) . \end{array}

(17)
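Equation (17) translates directly into code. In the sketch below (function names are hypothetical), a useful sanity check is the law of total expectation: averaging $E(W_{ij} | Y_{ij}, θ_i, ψ_j)$ over $Y_{ij}$ with weights $P_j(θ_i)$ and $1 - P_j(θ_i)$ must return $P_j^*(θ_i)$. Two boundary cases also follow from Equations (15)-(16): with $ς_j = 0$ an incorrect response implies $W_{ij} = 0$, and with $γ_j = 0$ a correct response implies $W_{ij} = 1$.

```python
import math

D = 1.702  # scaling constant in Equation (2)


def p_star(theta, a, b):
    """Logistic kernel P*_j(theta), Equation (2)."""
    return 1.0 / (1.0 + math.exp(-D * (a * theta - b)))


def expected_w(y, theta, a, b, g, s):
    """E(W_ij | Y_ij = y, theta_i, psi_j), Equations (15)-(17)."""
    ps = p_star(theta, a, b)
    p = g + (1.0 - s - g) * ps  # 4PLM probability P_j(theta), Equation (1)
    if y == 1:
        return (1.0 - s) * ps / p   # Equation (15)
    return s * ps / (1.0 - p)       # Equation (16)
```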

Finally, let $X_{k}$, $k = 1, 2, \dots, q$, be nodes on the ability scale with associated weights $A (X_{k})$. Approximating the integral by Gauss-Hermite quadrature, the first derivatives of the expected log-likelihood function for each item are:

E [\frac{\partial \ln L (Y, W | Ψ)}{\partial ψ_{j}}] \approx [\begin{array}{l} \sum_{i = 1}^{N} \sum_{k = 1}^{q} \frac{E (W_{i j} | Y_{i j}, X_{k}, Ψ) - P_{j}^{*} (X_{k})}{P_{j}^{*} (X_{k}) [1 - P_{j}^{*} (X_{k})]} f (X_{k} | y_{i}, w_{i}, τ, Ψ) \frac{\partial P_{j}^{*} (X_{k})}{\partial α_{j}} \\ \sum_{i = 1}^{N} \sum_{k = 1}^{q} \frac{E (W_{i j} | Y_{i j}, X_{k}, Ψ) - P_{j}^{*} (X_{k})}{P_{j}^{*} (X_{k}) [1 - P_{j}^{*} (X_{k})]} f (X_{k} | y_{i}, w_{i}, τ, Ψ) \frac{\partial P_{j}^{*} (X_{k})}{\partial β_{j}} \\ \sum_{i = 1}^{N} \sum_{k = 1}^{q} \frac{(Y_{i j} - γ_{j}) [1 - E (W_{i j} | Y_{i j}, X_{k}, Ψ)]}{γ_{j} (1 - γ_{j})} f (X_{k} | y_{i}, w_{i}, τ, Ψ) \\ \sum_{i = 1}^{N} \sum_{k = 1}^{q} \frac{E (W_{i j} | Y_{i j}, X_{k}, Ψ) (1 - ς_{j} - Y_{i j})}{ς_{j} (1 - ς_{j})} f (X_{k} | y_{i}, w_{i}, τ, Ψ) \end{array}]

(18)

with

\begin{array}{l} f (X_{k} | y_{i}, w_{i}, τ, Ψ) = f (X_{k} | y_{i}, τ, Ψ) \\ = \frac{P (y_{i} | X_{k}, Ψ) A (X_{k})}{\sum_{k' = 1}^{q} P (y_{i} | X_{k'}, Ψ) A (X_{k'})}, \end{array}

(19)

P (y_{i} | X_{k}, Ψ) = \prod_{j = 1}^{J} P_{j} {(X_{k})}^{Y_{i j}} {[1 - P_{j} (X_{k})]}^{1 - Y_{i j}},

(20)

where $f (X_{k} | y_{i}, w_{i}, τ, Ψ)$ is the posterior probability of $θ_{i}$ at quadrature point $X_{k}$. Following Bock and Aitkin (1981), we define several “artificial data” quantities for our algorithm:

\begin{array}{l} {\bar{f}}_{k} = \sum_{i = 1}^{N} f (X_{k} | y_{i}, τ, Ψ), \\ {\bar{r}}_{j k} = \sum_{i = 1}^{N} Y_{i j} f (X_{k} | y_{i}, τ, Ψ), \\ {\bar{f}}_{j k}^{(W)} = \sum_{i = 1}^{N} E (W_{i j} | Y_{i j}, X_{k}, Ψ) f (X_{k} | y_{i}, τ, Ψ), \end{array}

(21)

and

{\bar{r}}_{j k}^{(W)} = \sum_{i = 1}^{N} Y_{i j} E (W_{i j} | Y_{i j}, X_{k}, Ψ) f (X_{k} | y_{i}, τ, Ψ) .

(22)

Taking their expectations shows that these quantities have intuitive interpretations:

$E ({\bar{f}}_{k}) = {\bar{f}}_{k}$ is the expected number of persons (out of N) with ability $X_{k}$;

$E ({\bar{r}}_{j k}) = P_{j} (X_{k}) {\bar{f}}_{k}$ is the expected number of persons with ability $X_{k}$ who answer item j correctly;

$E ({\bar{f}}_{j k}^{(W)}) = P_{j}^{*} (X_{k}) {\bar{f}}_{k}$ is the expected number of persons with ability $X_{k}$ who know the answer to item j; and

$E ({\bar{r}}_{j k}^{(W)}) = (1 - ς_{j}) P_{j}^{*} (X_{k}) {\bar{f}}_{k}$ is the expected number of persons with ability $X_{k}$ who know the answer to item j and also answer it correctly.
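The E-step that produces these artificial data can be sketched as follows. This is a minimal, unoptimized illustration under stated assumptions: it uses plain Python lists, the function name `e_step` is hypothetical, and probabilities are multiplied directly rather than in log space, so it suits only short tests. It applies Equation (19) for the posterior weights and Equations (15)-(16) for $E(W_{ij} | Y_{ij}, X_k, Ψ)$.

```python
import math

D = 1.702  # scaling constant in Equation (2)


def p_star(theta, a, b):
    """Logistic kernel P*_j(theta), Equation (2)."""
    return 1.0 / (1.0 + math.exp(-D * (a * theta - b)))


def e_step(Y, items, nodes, weights):
    """Artificial data of Equations (19)-(22).

    Returns f_bar[k], r_bar[j][k], fW_bar[j][k], rW_bar[j][k].
    Y: list of 0/1 response vectors; items: list of (a, b, g, s) tuples;
    nodes/weights: quadrature nodes X_k and weights A(X_k).
    """
    J, q = len(items), len(nodes)
    ps = [[p_star(x, a, b) for x in nodes] for (a, b, _, _) in items]
    p = [[g + (1 - s - g) * ps[j][k] for k in range(q)]
         for j, (_, _, g, s) in enumerate(items)]
    f_bar = [0.0] * q
    r_bar = [[0.0] * q for _ in range(J)]
    fW_bar = [[0.0] * q for _ in range(J)]
    rW_bar = [[0.0] * q for _ in range(J)]
    for y in Y:
        # Posterior weights f(X_k | y_i, tau, Psi), Equation (19).
        lik = []
        for k in range(q):
            val = weights[k]
            for j in range(J):
                val *= p[j][k] if y[j] == 1 else 1 - p[j][k]
            lik.append(val)
        norm = sum(lik)
        post = [val / norm for val in lik]
        for k in range(q):
            f_bar[k] += post[k]
            for j, (_, _, g, s) in enumerate(items):
                if y[j] == 1:
                    ew = (1 - s) * ps[j][k] / p[j][k]  # Equation (15)
                    r_bar[j][k] += post[k]
                    rW_bar[j][k] += ew * post[k]
                else:
                    ew = s * ps[j][k] / (1 - p[j][k])  # Equation (16)
                fW_bar[j][k] += ew * post[k]
    return f_bar, r_bar, fW_bar, rW_bar
```

By construction, $\sum_k \bar f_k = N$ and $\sum_k \bar r_{jk}$ equals the number of correct responses to item $j$, which gives a quick consistency check.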

Maximization Step 1: $γ_{j}$ -Parameters

Let $λ_{γ_{j}}$ be the sum of the expected first derivative of the log-likelihood with respect to $γ_{j}$ and the first derivative of the log prior of $γ_{j}$. Following usual practice (Baker & Kim, 2004), we use a beta prior with hyperparameters $υ_{j}^{(γ)}$ and $ρ_{j}^{(γ)}$. Then,

\begin{array}{l} λ_{γ_{j}} = \frac{\partial E [\ln L (Y, W | Ψ)]}{\partial γ_{j}} + \frac{\partial \ln g (γ_{j} | υ_{j}^{(γ)}, ρ_{j}^{(γ)})}{\partial γ_{j}} \\ \approx \frac{υ_{j}^{(γ)} - 1}{γ_{j}} - \frac{ρ_{j}^{(γ)} - 1}{1 - γ_{j}} + \sum_{i = 1}^{N} \sum_{k = 1}^{q} [\frac{(Y_{i j} - γ_{j}) (1 - E (W_{i j} | Y_{i j}, X_{k}, Ψ))}{γ_{j} (1 - γ_{j})}] f (X_{k} | y_{i}, w_{i}, τ, Ψ) \\ = \frac{(υ_{j}^{(γ)} - 1) (1 - γ_{j}) - (ρ_{j}^{(γ)} - 1) γ_{j}}{γ_{j} (1 - γ_{j})} + \sum_{k = 1}^{q} [\frac{{\bar{r}}_{j k} - γ_{j} {\bar{f}}_{k} - {\bar{r}}_{j k}^{(W)} + γ_{j} {\bar{f}}_{j k}^{(W)}}{γ_{j} (1 - γ_{j})}] . \end{array}

(23)

By setting $λ_{γ_{j}}$ equal to 0 and solving for $γ_{j}$ , we get a closed-form solution for the guessing parameter:

γ_{j} = \frac{υ_{j}^{(γ)} - 1 + \sum_{k = 1}^{q} ({\bar{r}}_{j k} - {\bar{r}}_{j k}^{(W)})}{υ_{j}^{(γ)} + ρ_{j}^{(γ)} - 2 + \sum_{k = 1}^{q} ({\bar{f}}_{k} - {\bar{f}}_{j k}^{(W)})} .

(24)

Standard errors also require the expected second derivative, which is

\begin{array}{l} λ_{γ_{j} γ_{j}} = \frac{\partial}{\partial γ_{j}} λ_{γ_{j}} \\ = - \frac{υ_{j}^{(γ)} - 1 + \sum_{k = 1}^{q} ({\bar{r}}_{j k} - {\bar{r}}_{j k}^{(W)})}{γ_{j}^{2}} \\ - \frac{ρ_{j}^{(γ)} - 1 + \sum_{k = 1}^{q} ({\bar{f}}_{k} - {\bar{f}}_{j k}^{(W)} - {\bar{r}}_{j k} + {\bar{r}}_{j k}^{(W)})}{{(1 - γ_{j})}^{2}} . \end{array}

(25)
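The guessing-parameter M-step needs only the artificial data, so Equations (23)-(25) can be sketched as three short functions. The names and the default Beta(2, 8) hyperparameters are assumptions for illustration; the usage lines apply them to hypothetical artificial data for one item on a three-node grid and confirm that the closed form zeroes the gradient.

```python
def gamma_update(f_bar, r_bar_j, fW_bar_j, rW_bar_j, v=2.0, rho=8.0):
    """Closed-form M-step for gamma_j, Equation (24); (v, rho) are Beta hyperparameters."""
    num = v - 1 + sum(r - rw for r, rw in zip(r_bar_j, rW_bar_j))
    den = v + rho - 2 + sum(f - fw for f, fw in zip(f_bar, fW_bar_j))
    return num / den


def gamma_gradient(g, f_bar, r_bar_j, fW_bar_j, rW_bar_j, v=2.0, rho=8.0):
    """lambda_{gamma_j}: expected score plus Beta log-prior derivative, Equation (23)."""
    prior = ((v - 1) * (1 - g) - (rho - 1) * g) / (g * (1 - g))
    data = sum(r - g * f - rw + g * fw
               for f, r, fw, rw in zip(f_bar, r_bar_j, fW_bar_j, rW_bar_j))
    return prior + data / (g * (1 - g))


def gamma_hessian(g, f_bar, r_bar_j, fW_bar_j, rW_bar_j, v=2.0, rho=8.0):
    """lambda_{gamma_j gamma_j}, Equation (25)."""
    a = v - 1 + sum(r - rw for r, rw in zip(r_bar_j, rW_bar_j))
    b = rho - 1 + sum(f - fw - r + rw
                      for f, r, fw, rw in zip(f_bar, r_bar_j, fW_bar_j, rW_bar_j))
    return -a / g ** 2 - b / (1 - g) ** 2


# Hypothetical artificial data for one item on a three-node grid.
f_bar = [100.0, 200.0, 100.0]
r_bar_j = [40.0, 120.0, 90.0]
fW_bar_j = [20.0, 120.0, 90.0]
rW_bar_j = [15.0, 100.0, 80.0]
g_hat = gamma_update(f_bar, r_bar_j, fW_bar_j, rW_bar_j)
se_g = (-1.0 / gamma_hessian(g_hat, f_bar, r_bar_j, fW_bar_j, rW_bar_j)) ** 0.5
```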

Maximization Step 2: $ς_{j}$ -Parameters

Let $λ_{ς_{j}}$ be the sum of the expected first derivative of the log-likelihood with respect to $ς_{j}$ and the first derivative of the log prior of $ς_{j}$. As with the $γ_{j}$-parameters, we use a beta prior with hyperparameters $υ_{j}^{(ς)}$ and $ρ_{j}^{(ς)}$. Then,

\begin{array}{l} λ_{ς_{j}} = \frac{\partial E [\ln L (Y, W | Ψ)]}{\partial ς_{j}} + \frac{\partial \ln g (ς_{j} | υ_{j}^{(ς)}, ρ_{j}^{(ς)})}{\partial ς_{j}} \\ \approx \frac{υ_{j}^{(ς)} - 1}{ς_{j}} - \frac{ρ_{j}^{(ς)} - 1}{1 - ς_{j}} + \sum_{i = 1}^{N} \sum_{k = 1}^{q} [\frac{E (W_{i j} | Y_{i j}, X_{k}, Ψ) (1 - ς_{j} - Y_{i j})}{ς_{j} (1 - ς_{j})}] f (X_{k} | y_{i}, w_{i}, τ, Ψ) \\ = \frac{(υ_{j}^{(ς)} - 1) (1 - ς_{j}) - (ρ_{j}^{(ς)} - 1) ς_{j}}{ς_{j} (1 - ς_{j})} + \sum_{k = 1}^{q} [\frac{(1 - ς_{j}) {\bar{f}}_{j k}^{(W)} - {\bar{r}}_{j k}^{(W)}}{ς_{j} (1 - ς_{j})}] . \end{array}

(26)

By setting $λ_{ς_{j}}$ equal to 0 and solving for $ς_{j}$, we get a closed-form solution for the slipping parameter:

ς_{j} = \frac{υ_{j}^{(ς)} - 1 + \sum_{k = 1}^{q} ({\bar{f}}_{j k}^{(W)} - {\bar{r}}_{j k}^{(W)})}{υ_{j}^{(ς)} + ρ_{j}^{(ς)} - 2 + \sum_{k = 1}^{q} {\bar{f}}_{j k}^{(W)}} .

(27)

Standard errors also require the expected second derivative, which is

\begin{array}{l} λ_{ς_{j} ς_{j}} = \frac{\partial}{\partial ς_{j}} λ_{ς_{j}} = - \frac{υ_{j}^{(ς)} - 1 + \sum_{k = 1}^{q} ({\bar{f}}_{j k}^{(W)} - {\bar{r}}_{j k}^{(W)})}{ς_{j}^{2}} \\ - \frac{ρ_{j}^{(ς)} - 1 + \sum_{k = 1}^{q} {\bar{r}}_{j k}^{(W)}}{{(1 - ς_{j})}^{2}} . \end{array}

(28)
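The slipping-parameter step mirrors the guessing step but depends only on $\bar f^{(W)}_{jk}$ and $\bar r^{(W)}_{jk}$. The sketch below (hypothetical names; default Beta(2, 8) hyperparameters assumed) implements Equations (26)-(28); the usage lines reuse hypothetical artificial data and also show how switching to a Beta(4, 16) prior changes the estimate.

```python
def slip_update(fW_bar_j, rW_bar_j, v=2.0, rho=8.0):
    """Closed-form M-step for varsigma_j, Equation (27)."""
    num = v - 1 + sum(fw - rw for fw, rw in zip(fW_bar_j, rW_bar_j))
    den = v + rho - 2 + sum(fW_bar_j)
    return num / den


def slip_gradient(s, fW_bar_j, rW_bar_j, v=2.0, rho=8.0):
    """lambda_{varsigma_j}: expected score plus Beta log-prior derivative, Equation (26)."""
    prior = ((v - 1) * (1 - s) - (rho - 1) * s) / (s * (1 - s))
    data = sum((1 - s) * fw - rw for fw, rw in zip(fW_bar_j, rW_bar_j))
    return prior + data / (s * (1 - s))


def slip_hessian(s, fW_bar_j, rW_bar_j, v=2.0, rho=8.0):
    """lambda_{varsigma_j varsigma_j}, Equation (28)."""
    a = v - 1 + sum(fw - rw for fw, rw in zip(fW_bar_j, rW_bar_j))
    b = rho - 1 + sum(rW_bar_j)
    return -a / s ** 2 - b / (1 - s) ** 2


# Hypothetical artificial data for one item on a three-node grid.
fW_bar_j = [20.0, 120.0, 90.0]
rW_bar_j = [15.0, 100.0, 80.0]
s_hat = slip_update(fW_bar_j, rW_bar_j)                 # Beta(2, 8) prior
s_hat_416 = slip_update(fW_bar_j, rW_bar_j, 4.0, 16.0)  # Beta(4, 16) prior
se_s = (-1.0 / slip_hessian(s_hat, fW_bar_j, rW_bar_j)) ** 0.5
```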

Maximization Step 3: $α_{j}$ - and $β_{j}$ -Parameters

Let $α_{j}^{*} = \ln (α_{j})$, and let $λ_{α_{j}^{*}}$ and $λ_{β_{j}}$ be the sums of the expected first derivatives of the log-likelihood and the first derivatives of the log prior with respect to $α_{j}^{*}$ and $β_{j}$, respectively. In this maximization step, we first solve for $α_{j}^{*}$ and then obtain $α_{j} = \exp (α_{j}^{*})$. For the $α_{j}^{*}$- and $β_{j}$-parameters, we use normal priors with means $μ_{α_{j}^{*}}$ and $μ_{β_{j}}$ and variances $σ_{α_{j}^{*}}^{2}$ and $σ_{β_{j}}^{2}$, respectively. Then,

\begin{array}{l} λ_{α_{j}^{*}} = \frac{\partial E [\ln L (Y, W | Ψ)]}{\partial α_{j}^{*}} + \frac{\partial \ln g (α_{j}^{*} | μ_{α_{j}^{*}}, σ_{α_{j}^{*}}^{2})}{\partial α_{j}^{*}} \\ = D \exp (α_{j}^{*}) \sum_{k = 1}^{q} [X_{k} ({\bar{f}}_{j k}^{(W)} - E ({\bar{f}}_{j k}^{(W)}))] - \frac{α_{j}^{*} - μ_{α_{j}^{*}}}{σ_{α_{j}^{*}}^{2}} \\ = D \exp (α_{j}^{*}) \sum_{k = 1}^{q} [X_{k} ({\bar{f}}_{j k}^{(W)} - P_{j}^{*} (X_{k}) {\bar{f}}_{k})] - \frac{α_{j}^{*} - μ_{α_{j}^{*}}}{σ_{α_{j}^{*}}^{2}} \end{array}

(29)

and

\begin{array}{l} λ_{β_{j}} = \frac{\partial E [\ln L (Y, W | Ψ)]}{\partial β_{j}} + \frac{\partial \ln g (β_{j} | μ_{β_{j}}, σ_{β_{j}}^{2})}{\partial β_{j}} \\ = - D \sum_{k = 1}^{q} [{\bar{f}}_{j k}^{(W)} - E ({\bar{f}}_{j k}^{(W)})] - \frac{β_{j} - μ_{β_{j}}}{σ_{β_{j}}^{2}} \\ = - D \sum_{k = 1}^{q} [{\bar{f}}_{j k}^{(W)} - P_{j}^{*} (X_{k}) {\bar{f}}_{k}] - \frac{β_{j} - μ_{β_{j}}}{σ_{β_{j}}^{2}} . \end{array}

(30)

Unlike the guessing and slipping parameters, $α_{j}^{*}$ and $β_{j}$ do not have closed-form solutions for their estimates, so it is necessary to find them numerically. The method used here is the Fisher scoring algorithm, which uses the expected values of the second derivatives, along with the first derivatives. These will also be used later for standard error calculations. We find the second derivatives as follows:

\begin{array}{l} λ_{α_{j}^{*} α_{j}^{*}} = - D^{2} \exp (2 α_{j}^{*}) \\ \sum_{k = 1}^{q} {X_{k}^{2} P_{j}^{*} (X_{k}) [1 - P_{j}^{*} (X_{k})] {\bar{f}}_{k}} - \frac{1}{σ_{α_{j}^{*}}^{2}}, \end{array}

(31)

λ_{β_{j} β_{j}} = - D^{2} \sum_{k = 1}^{q} {P_{j}^{*} (X_{k}) [1 - P_{j}^{*} (X_{k})] {\bar{f}}_{k}} - \frac{1}{σ_{β_{j}}^{2}},

(32)

and

\begin{array}{l} λ_{α_{j}^{*} β_{j}} = D^{2} \exp (α_{j}^{*}) \\ \sum_{k = 1}^{q} {X_{k} P_{j}^{*} (X_{k}) [1 - P_{j}^{*} (X_{k})] {\bar{f}}_{k}} . \end{array}

(33)
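Because Equations (29)-(33) are all expressed through the artificial data, a Fisher scoring update for $(α_j^*, β_j)$ needs only $\bar f_k$, $\bar f^{(W)}_{jk}$, and the nodes. The sketch below repeats the update $(α_j^*, β_j) \leftarrow (α_j^*, β_j) - Λ^{-1} λ$ using the 2 × 2 expected second-derivative block. The function name, starting values, stopping rule, and the choice to iterate to convergence within one M-step are assumptions for illustration. The default prior settings mirror $\ln α \sim N(0, 0.25)$ and $β \sim N(0, 2)$ from the simulation settings.

```python
import math

D = 1.702  # scaling constant in Equation (2)


def p_star(theta, a, b):
    """Logistic kernel P*_j(theta), Equation (2)."""
    return 1.0 / (1.0 + math.exp(-D * (a * theta - b)))


def fisher_scoring_ab(nodes, f_bar, fW_bar_j, a_star=0.0, b=0.0,
                      mu_a=0.0, var_a=0.25, mu_b=0.0, var_b=2.0,
                      tol=1e-8, max_iter=100):
    """Fisher scoring for (alpha*_j, beta_j) given the artificial data of one item."""
    for _ in range(max_iter):
        a = math.exp(a_star)
        ps = [p_star(x, a, b) for x in nodes]
        resid = [fw - p * f for fw, p, f in zip(fW_bar_j, ps, f_bar)]
        # First derivatives with normal log-prior terms, Equations (29)-(30).
        g_a = D * a * sum(x * r for x, r in zip(nodes, resid)) - (a_star - mu_a) / var_a
        g_b = -D * sum(resid) - (b - mu_b) / var_b
        # Expected second derivatives, Equations (31)-(33).
        info = [p * (1.0 - p) * f for p, f in zip(ps, f_bar)]
        h_aa = -D ** 2 * a ** 2 * sum(x * x * w for x, w in zip(nodes, info)) - 1.0 / var_a
        h_bb = -D ** 2 * sum(info) - 1.0 / var_b
        h_ab = D ** 2 * a * sum(x * w for x, w in zip(nodes, info))
        # Update: new = old - H^{-1} grad, using the closed-form 2 x 2 inverse.
        det = h_aa * h_bb - h_ab ** 2
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a_star -= step_a
        b -= step_b
        if max(abs(step_a), abs(step_b)) < tol:
            break
    return a_star, b
```

Because the expected second derivatives are always negative definite here, each step moves uphill on the log posterior without requiring the observed Hessian.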

Standard Errors of Parameter Estimation in BE3M

An important index of estimation quality is the standard error (SE). However, one major criticism of the EM algorithm is that parameter estimate SEs are not a natural product of the algorithm, and so other methods have to be devised (McLachlan & Krishnan, 2007). BE3M falls prey to this criticism, just as all members of the EM algorithm family do.

It can be shown that the inverse of the negative expected value of the matrix of second derivatives of the log-likelihood (i.e., the inverse of the information matrix) is the asymptotic covariance matrix of the estimates (Stuart & Kendall, 1968). As such, the square roots of the diagonal elements of the resulting matrix are the asymptotic standard errors of the parameters. Generically, the expected second derivative matrix can be written as

Λ_{j} = (\begin{matrix} λ_{α_{j}^{*} α_{j}^{*}} & λ_{α_{j}^{*} β_{j}} & λ_{α_{j}^{*} γ_{j}} & λ_{α_{j}^{*} ς_{j}} \\ λ_{α_{j}^{*} β_{j}} & λ_{β_{j} β_{j}} & λ_{β_{j} γ_{j}} & λ_{β_{j} ς_{j}} \\ λ_{α_{j}^{*} γ_{j}} & λ_{β_{j} γ_{j}} & λ_{γ_{j} γ_{j}} & λ_{γ_{j} ς_{j}} \\ λ_{α_{j}^{*} ς_{j}} & λ_{β_{j} ς_{j}} & λ_{γ_{j} ς_{j}} & λ_{ς_{j} ς_{j}} \end{matrix})

(34)

in which each entry is the expected second partial derivative of the log-likelihood with respect to the subscripted parameters for a given item j. Convergence issues resulting from ill-conditioning of this matrix plague MMLE/EM for the 4PLM just as they do for the 3PLM (Baker & Kim, 2004).

In contrast, because of the divide-and-conquer strategy implemented in BE3M, we contend that the resulting expected second derivative matrix does not have these issues. In BE3M, item parameter estimation is separated into three smaller, distinct problems: (1) estimation of the guessing parameter, (2) estimation of the slipping parameter, and (3) joint estimation of the slope and intercept parameters. This separation has the benefit that instabilities in one of the three steps do not propagate to the other two. Statistically, this separation implies that the covariances between ${\hat{γ}}_{j}$ and ${\hat{α}}_{j}^{*}$, ${\hat{β}}_{j}$, and ${\hat{ς}}_{j}$ are zero; similarly, the covariances between ${\hat{ς}}_{j}$ and ${\hat{α}}_{j}^{*}$, ${\hat{β}}_{j}$, and ${\hat{γ}}_{j}$ are zero. Collected together, the expected second derivative matrix for item j is

Λ_{j} = (\begin{matrix} λ_{α_{j}^{*} α_{j}^{*}} & λ_{α_{j}^{*} β_{j}} & 0 & 0 \\ λ_{α_{j}^{*} β_{j}} & λ_{β_{j} β_{j}} & 0 & 0 \\ 0 & 0 & λ_{γ_{j} γ_{j}} & 0 \\ 0 & 0 & 0 & λ_{ς_{j} ς_{j}} \end{matrix}) .

(35)

The standard errors of the estimates for item j are then the square roots of the diagonal elements of the matrix $Σ_{j} = - Λ_{j}^{- 1}$. Because $α_{j} = \exp (α_{j}^{*})$, the standard error of ${\hat{α}}_{j}$ follows from that of ${\hat{α}}_{j}^{*}$ via the delta method (Oehlert, 1992): $SE({\hat{α}}_{j}) \approx \exp ({\hat{α}}_{j}^{*}) SE({\hat{α}}_{j}^{*})$.
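With the block-diagonal structure of Equation (35), $Σ_j = -Λ_j^{-1}$ only requires inverting the 2 × 2 slope-intercept block and two scalars. The sketch below (hypothetical function and key names) computes the standard errors and applies the delta method for $α_j = \exp(α_j^*)$.

```python
import math


def item_standard_errors(a_star, l_aa, l_bb, l_ab, l_gg, l_ss):
    """SEs from the block-diagonal matrix of Equation (35), Sigma_j = -Lambda_j^{-1}.

    l_* are the expected second derivatives from Equations (25), (28), and (31)-(33).
    """
    det = l_aa * l_bb - l_ab ** 2          # determinant of the (alpha*, beta) block
    var_a_star = -l_bb / det               # diagonal of -(2 x 2 block)^{-1}
    var_b = -l_aa / det
    se_a_star = math.sqrt(var_a_star)
    # Delta method: d alpha / d alpha* = exp(alpha*).
    se_a = math.exp(a_star) * se_a_star
    return {
        "alpha*": se_a_star,
        "alpha": se_a,
        "beta": math.sqrt(var_b),
        "gamma": math.sqrt(-1.0 / l_gg),
        "varsigma": math.sqrt(-1.0 / l_ss),
    }
```

A nonzero cross term $λ_{α_j^* β_j}$ inflates both the slope and intercept standard errors relative to the uncorrelated case.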

Simulation Studies

This section presents Monte Carlo results on the accuracy of the BE3M procedure for estimating 4PLM item parameters, compared with the BME implemented in the mirt package (version 1.30; Chalmers, 2012), the MCMC sampler in the R package fourPNO (Culpepper, 2016), and the “bayesmh” module of Stata/SE 16. Results from two simulation studies are reported. The goal of the simulation studies was to assess the influence of various design features, including the location of item thresholds, sample size, and priors, in educational (Simulation 1) and psychological (Simulation 2) scenarios.

Item and Person Parameter Generation

For Simulation 1, the item parameters were generated in the same way as in Culpepper (2016). Specifically, item parameters for 20 items were generated from $α_{j} \sim N (2, 0.5) I (α > 0)$, $β_{j} \sim N (0, 0.5)$, $γ_{j} \sim \mathrm{Beta} (2, 8)$, and $ς_{j} | γ_{j} \sim \mathrm{Beta} (0, 1) I (ς_{j} < 1 - γ_{j})$. For Simulation 2, we used the item parameters for the 23 items calibrated in Waller and Reise’s (2010) application to an adolescent self-esteem scale; these were also used in Culpepper (2016). For both testing scenarios, person parameters for three sample sizes (2,500, 5,000, and 10,000) were sampled from the standard normal distribution.
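Response data for designs like these can be generated from the latent mixture representation. The sketch below is an illustration under stated assumptions, not the authors’ simulation code: it draws $θ_i \sim N(0, 1)$, then $W_{ij}$ and $Y_{ij}$ via Equations (7)-(8); the three items use illustrative parameter values rather than draws from the generating distributions above, and the function name `simulate_4pl` is hypothetical.

```python
import math
import random

D = 1.702  # scaling constant in Equation (2)


def p_star(theta, a, b):
    """Logistic kernel P*_j(theta), Equation (2)."""
    return 1.0 / (1.0 + math.exp(-D * (a * theta - b)))


def simulate_4pl(items, n, seed=1):
    """Simulate an n x J matrix of 4PLM responses with theta_i ~ N(0, 1).

    Draws W_ij ~ Bernoulli(P*_j(theta_i)), then Y_ij = 1 with probability
    1 - varsigma_j if W_ij = 1, or gamma_j if W_ij = 0 (Equation (7)).
    """
    rng = random.Random(seed)
    Y = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        row = []
        for a, b, g, s in items:
            knows = rng.random() < p_star(theta, a, b)
            p_correct = 1.0 - s if knows else g
            row.append(1 if rng.random() < p_correct else 0)
        Y.append(row)
    return Y


# Illustrative (alpha, beta, gamma, varsigma) values for three items.
items = [(1.24, 0.62, 0.25, 0.23), (0.84, 0.75, 0.13, 0.17), (1.69, 0.19, 0.25, 0.14)]
Y = simulate_4pl(items, n=20000)
```

Generating through $W_{ij}$ is equivalent to drawing $Y_{ij}$ directly from Equation (1), but it also yields the latent indicators if they are needed for diagnostics.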

The Setting of MCMC in the R Package fourPNO

Because Culpepper (2016) introduced an extra latent variable and imposed various truncation restrictions on the item parameters, the priors in the fourPNO package differ noticeably from those of traditional MCMC procedures (see Culpepper, 2016, for details). Thus, in each condition the current studies used the same fourPNO settings as the simulation studies of Culpepper (2016): (1) the $α_{j}$ parameters were sampled from the truncated normal distribution $N (0, 2) I (α > 0)$, which implies $E (α_{j}) = \frac{2}{\sqrt{π}}$ under the prior; (2) the $β_{j}$ parameters were sampled from $N (0, 2)$; (3) uniform truncated $\mathrm{Beta} (υ, ρ)$ priors were used for both the $γ$ and $ς$ parameters $(υ_{j}^{γ} = ρ_{j}^{γ} = υ_{j}^{ς} = ρ_{j}^{ς} = 1)$; and (4) the $θ_{i}$ parameters were sampled from $N (0, 1)$. Furthermore, because Culpepper (2016) concluded that “no more than 50,000 iterations were needed to achieve $\hat{R}$ < 1.1 for all item parameters,” only one chain was run in the current studies, with 100,000 MCMC iterations of which the first 50,000 were discarded as burn-in.
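The stated prior mean of the slope follows from the half-normal distribution. Assuming $N(0, 2)$ denotes a normal distribution with variance $σ^2 = 2$ (the reading under which the stated value holds), truncation to $α > 0$ doubles the density on the positive half-line, giving:

```latex
E(α_j) = \int_{0}^{\infty} α \, \frac{2}{\sqrt{2 π σ^{2}}} \exp\left(-\frac{α^{2}}{2 σ^{2}}\right) dα
       = \frac{2 σ^{2}}{\sqrt{2 π σ^{2}}}
       = σ \sqrt{\frac{2}{π}},
\qquad σ^{2} = 2 \;\Rightarrow\; E(α_j) = \sqrt{2} \sqrt{\frac{2}{π}} = \frac{2}{\sqrt{π}} \approx 1.128 .
```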

The Setting of BE3M, BME in the R Package mirt, and MCMC in Stata

Both BE3M and mirt used 61 quadrature points to approximate the Gaussian distribution from −6 to +6 in all simulation studies and real-data examples. The priors for the α and β parameters were $\ln α \sim N (0, 0.25)$ and $β \sim N (0, 2)$. Furthermore, two different priors for γ and ς were used in BE3M, mirt, and Stata:

$γ, ς \sim \mathrm{Beta} (2, 8)$, which is the default setting for the guessing parameter in the advanced IRT software flexMIRT (Houts & Cai, 2015), and

$γ, ς \sim \mathrm{Beta} (4, 16)$, which is the default setting for the guessing parameter in the classical IRT software BILOG-MG3 (Zimowski et al., 2003).
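The two prior settings share the same mean but differ in strength. The sketch below (hypothetical helper names) computes each prior’s mean, mode, and variance, along with the terms it adds to the closed-form updates in Equations (24) and (27): $υ - 1$ in the numerator and $υ + ρ - 2$ in the denominator. It also builds the 61-node grid described above. Both priors have mean 0.2, but Beta(4, 16) is more concentrated and contributes larger pseudo-counts (3 and 18 rather than 1 and 8).

```python
def beta_summary(v, rho):
    """Mean, mode, and variance of a Beta(v, rho) prior (mode formula assumes v, rho > 1)."""
    mean = v / (v + rho)
    mode = (v - 1) / (v + rho - 2)
    var = v * rho / ((v + rho) ** 2 * (v + rho + 1))
    return mean, mode, var


def pseudo_counts(v, rho):
    """Terms a Beta(v, rho) prior adds to the numerator/denominator of Eqs. (24) and (27)."""
    return v - 1, v + rho - 2


# 61 equally spaced quadrature nodes from -6 to +6 (step 0.2).
nodes = [-6.0 + 0.2 * k for k in range(61)]

for v, rho in [(2, 8), (4, 16)]:
    print((v, rho), beta_summary(v, rho), pseudo_counts(v, rho))
```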

In sum, for both the educational and psychological testing scenarios, 21 conditions were simulated: seven estimation method conditions (fourPNO, BE3M with prior conditions 1 and 2, mirt with prior conditions 1 and 2, and Stata with prior conditions 1 and 2) crossed with three examinee sample sizes (2,500, 5,000, and 10,000). For each condition, 100 replications were run to reduce sampling error. Each condition was assessed by its accuracy in recovering individual item parameters, as measured by the bias and root mean squared error (RMSE) across the 100 replications:

\mathrm{bias}_{j} = \frac{\sum_{s = 1}^{S = 100} ({\hat{ψ}}_{j}^{(s)} - ψ_{j})}{S},

(36)

\mathrm{RMSE}_{j} = \sqrt{\frac{\sum_{s = 1}^{S = 100} {({\hat{ψ}}_{j}^{(s)} - ψ_{j})}^{2}}{S}},

(37)

Furthermore, because Waller and Feuerstahler (2017) pointed out that the recovery of individual parameters does not fully reflect how accurately the 4PLM is fitted, the current studies also used the root integrated mean squared error (RIMSE; Ramsay, 1991) to assess overall accuracy in terms of the 4PLM item response function:

\mathrm{RIMSE}_{j} = \frac{\sum_{s = 1}^{S = 100} \sqrt{\int {[{\hat{P}}_{j}^{(s)} (θ_{i}) - P_{j} (θ_{i})]}^{2} g (θ_{i}) d θ_{i}}}{S} .

(38)
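The three criteria can be computed as in the following sketch. It assumes that the integral in Equation (38) is approximated on a normalized quadrature grid like the one used in estimation (an assumption; the text does not state how the integral was evaluated), and the function names are hypothetical. A convenient check: raising $γ_j$ and lowering $ς_j$ by the same amount $δ$ shifts the whole item response function by $δ$, so the RIMSE equals $δ$.

```python
import math

D = 1.702  # scaling constant in Equation (2)


def p_4pl(theta, a, b, g, s):
    """4PLM item response function, Equations (1)-(2)."""
    return g + (1 - s - g) / (1 + math.exp(-D * (a * theta - b)))


def bias(estimates, true):
    """Equation (36): mean signed error over replications."""
    return sum(e - true for e in estimates) / len(estimates)


def rmse(estimates, true):
    """Equation (37): root mean squared error over replications."""
    return math.sqrt(sum((e - true) ** 2 for e in estimates) / len(estimates))


def rimse(est_items, true_item, nodes, weights):
    """Equation (38): replication average of the root integrated squared IRF error.

    est_items: list of (a, b, g, s) estimates, one per replication; the integral
    over g(theta) is approximated by quadrature weights that sum to 1.
    """
    total = 0.0
    for est in est_items:
        ise = sum(w * (p_4pl(x, *est) - p_4pl(x, *true_item)) ** 2
                  for x, w in zip(nodes, weights))
        total += math.sqrt(ise)
    return total / len(est_items)
```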

Results

The results of the two simulation studies are reported at two levels: the individual level for each type of parameter and the overall level of the 4PLM item response function.

At the individual level, the detailed results for the two simulation studies are presented in tables in Supplemental Appendices B and C, respectively. Only the RMSEs for the 5,000-examinee condition are summarized here (Tables 1 and 2), since those for the other conditions are very similar. The results can be briefly summarized as follows: (1) in general, the MCMC results from the fourPNO package replicate the previous study in both scenarios; (2) item parameter recovery by BE3M is very similar to that by MCMC, although the RMSEs of the α parameters yielded by the fourPNO package and Stata are slightly larger than those from BE3M in the first simulation study; (3) the mirt package and BE3M have similar bias and RMSEs for the α and β parameters, but BE3M performs better in estimating the γ and ς parameters; and (4) compared with the mirt package, BE3M and Stata provide more stable estimates when the priors for the γ and ς parameters are changed.

Table 1.

The RMSEs of Item Parameter Recovery for 5,000 Examinees in Simulation Study 1.

Item	True values		RMSEs for α							RMSEs for β
Item	α	β	fourPNO	BE3M (2, 8)	BE3M (4, 16)	mirt (2, 8)	mirt (4, 16)	Stata (2, 8)	Stata (4, 16)	fourPNO	BE3M (2, 8)	BE3M (4, 16)	mirt (2, 8)	mirt (4, 16)	Stata (2, 8)	Stata (4, 16)
1	1.24	.62	.36	.29	.26	.35	.31	.32	.25	.21	.16	.16	.17	.17	.20	.18
2	.84	.75	.33	.14	.12	.13	.10	.17	.18	.20	.14	.13	.14	.13	.11	.10
3	1.69	.19	.39	.32	.29	.35	.31	.27	.19	.15	.12	.12	.12	.12	.10	.09
4	.59	.04	.54	.18	.11	.14	.10	.14	.15	.28	.24	.24	.24	.23	.21	.19
5	.88	.29	.25	.17	.19	.15	.16	.19	.23	.16	.15	.15	.15	.15	.14	.13
6	.80	−.29	.40	.20	.17	.20	.15	.14	.17	.17	.15	.14	.14	.13	.16	.15
7	.51	.10	.42	.24	.19	.15	.10	.25	.22	.22	.23	.23	.23	.23	.21	.24
8	2.82	1.37	.39	.45	.37	.45	.37	.48	.41	.20	.26	.23	.25	.23	.31	.30
9	.50	.03	.59	.20	.08	.15	.13	.21	.13	.35	.31	.26	.26	.18	.30	.27
10	.94	1.27	.41	.15	.14	.17	.15	.25	.24	.29	.21	.21	.23	.23	.27	.26
11	1.81	.90	.27	.24	.25	.25	.26	.28	.32	.15	.11	.11	.11	.11	.14	.16
12	1.18	.97	.37	.20	.26	.19	.25	.24	.28	.24	.16	.16	.16	.16	.21	.20
13	.53	.72	.42	.19	.15	.13	.09	.30	.22	.20	.18	.17	.15	.14	.16	.13
14	1.76	−.77	.33	.23	.30	.26	.32	.44	.52	.11	.11	.11	.11	.11	.10	.10
15	1.13	.24	.28	.23	.23	.24	.22	.32	.35	.13	.10	.10	.10	.10	.09	.12
16	.59	.53	.34	.23	.26	.17	.22	.32	.33	.17	.12	.11	.12	.11	.09	.09
17	.67	−.17	.31	.17	.17	.12	.12	.24	.25	.15	.15	.14	.15	.14	.10	.11
18	.75	1.23	.32	.11	.13	.10	.10	.16	.17	.20	.14	.13	.13	.12	.15	.14
19	.76	.79	.33	.15	.15	.13	.12	.20	.19	.25	.17	.16	.17	.16	.13	.14
20	1.30	1.10	.35	.25	.21	.28	.24	.24	.21	.25	.16	.15	.17	.16	.17	.15
Item	True γ	True ς	RMSEs for γ							RMSEs for ς
1	.25	.23	.03	.03	.03	.04	.04	.04	.03	.06	.07	.06	.08	.06	.06	.05
2	.13	.17	.03	.02	.02	.03	.02	.02	.01	.06	.06	.04	.06	.04	.05	.05
3	.25	.14	.04	.03	.03	.04	.03	.02	.02	.03	.03	.02	.03	.02	.02	.02
4	.29	.16	.07	.07	.06	.12	.10	.05	.05	.06	.05	.04	.05	.03	.04	.03
5	.21	.07	.05	.04	.04	.05	.04	.04	.03	.04	.04	.05	.04	.04	.04	.05
6	.19	.20	.06	.06	.04	.07	.04	.05	.03	.04	.03	.03	.04	.03	.03	.03
7	.26	.09	.06	.07	.06	.09	.07	.05	.04	.08	.08	.08	.05	.06	.08	.09
8	.23	.04	.01	.02	.01	.02	.01	.02	.02	.02	.01	.01	.01	.01	.01	.02
9	.25	.30	.06	.06	.04	.10	.08	.04	.04	.06	.06	.06	.12	.10	.06	.05
10	.19	.14	.02	.02	.02	.03	.02	.02	.02	.09	.05	.04	.04	.02	.07	.08
11	.07	.13	.01	.01	.01	.01	.01	.01	.01	.04	.03	.03	.03	.03	.02	.02
12	.22	.03	.02	.02	.02	.02	.02	.02	.02	.06	.05	.07	.04	.06	.05	.07
13	.07	.18	.05	.03	.03	.02	.03	.03	.03	.15	.08	.06	.08	.04	.13	.09
14	.07	.09	.04	.03	.04	.03	.04	.04	.05	.01	.01	.01	.01	.01	.01	.01
15	.08	.23	.03	.02	.02	.03	.02	.03	.03	.05	.04	.04	.05	.04	.04	.03
16	.08	.04	.06	.05	.05	.04	.05	.05	.06	.14	.13	.14	.12	.14	.16	.16
17	.20	.08	.06	.07	.05	.07	.04	.06	.06	.04	.04	.05	.03	.04	.03	.04
18	.06	.12	.02	.01	.01	.01	.01	.01	.02	.15	.08	.08	.06	.06	.12	.11
19	.18	.15	.03	.03	.02	.03	.03	.02	.02	.08	.06	.05	.05	.03	.06	.05
20	.16	.23	.02	.02	.02	.02	.02	.02	.02	.07	.07	.05	.08	.06	.06	.05

Note. Bold indicates the relatively larger RMSEs in the same conditions.

Table 2.

The RMSEs of Item Parameter Recovery for 5,000 Examinees in Simulation Study 2.

Item	True values		RMSEs for α							RMSEs for β
Item	α	β	Four PNO	BE3M (2, 8)	BE3M (4, 16)	mirt (2, 8)	mirt (4, 16)	Stata (2, 8)	Stata (4, 16)	Four PNO	BE3M (2, 8)	BE3M (4, 16)	mirt (2, 8)	mirt (4, 16)	Stata (2, 8)	Stata (4, 16)
1	1.91	−.53	.34	.29	.26	.30	.28	.30	.30	.15	.14	.16	.15	.16	.12	.11
2	1.95	−.31	.34	.28	.29	.30	.33	.30	.33	.13	.13	.15	.13	.15	.12	.13
3	1.50	.07	.22	.23	.27	.23	.27	.25	.32	.10	.10	.11	.10	.11	.10	.11
4	1.12	.07	.16	.20	.24	.20	.23	.14	.18	.11	.09	.11	.09	.11	.11	.14
5	.89	.40	.14	.16	.20	.15	.18	.22	.26	.10	.09	.09	.09	.09	.09	.09
6	1.08	−.54	.23	.20	.23	.20	.22	.34	.36	.11	.11	.11	.11	.11	.11	.12
7	1.16	−.55	.24	.19	.20	.19	.20	.15	.20	.12	.11	.12	.12	.12	.12	.12
8	1.10	.01	.18	.21	.24	.20	.23	.27	.26	.11	.09	.10	.09	.10	.11	.11
9	.78	.35	.24	.16	.12	.17	.14	.14	.12	.20	.16	.20	.18	.24	.17	.20
10	1.23	.23	.14	.18	.26	.18	.26	.21	.29	.06	.05	.06	.05	.06	.06	.07
11	1.34	.55	.12	.17	.21	.16	.21	.24	.29	.07	.07	.07	.07	.07	.06	.07
12	1.54	−.74	.30	.24	.24	.26	.26	.36	.36	.17	.17	.18	.18	.19	.20	.20
13	1.16	.21	.31	.26	.25	.28	.29	.21	.18	.18	.18	.24	.19	.27	.17	.25
14	.84	.60	.16	.15	.16	.14	.14	.15	.19	.12	.09	.10	.09	.10	.08	.10
15	1.13	.17	.24	.18	.19	.18	.19	.23	.27	.11	.11	.13	.11	.14	.13	.14
16	.79	.94	.26	.13	.11	.12	.08	.11	.14	.12	.12	.13	.12	.13	.14	.14
17	1.27	.61	.16	.16	.23	.16	.22	.24	.31	.07	.06	.07	.06	.07	.07	.09
18	.94	1.29	.26	.14	.17	.13	.15	.17	.21	.11	.13	.13	.13	.13	.16	.16
19	.84	1.21	.23	.10	.13	.08	.10	.14	.16	.11	.11	.13	.11	.12	.14	.16
20	1.14	1.73	.25	.12	.17	.12	.15	.13	.19	.09	.11	.15	.10	.14	.13	.16
21	1.10	.28	.12	.17	.25	.16	.24	.20	.26	.06	.07	.07	.06	.07	.07	.08
22	.72	.38	.22	.17	.19	.15	.16	.14	.19	.15	.14	.14	.14	.15	.17	.16
23	.88	1.37	.30	.13	.15	.10	.12	.12	.17	.14	.13	.13	.13	.12	.11	.11
Item	True γ	True ς	RMSEs for γ							RMSEs for ς
1	.04	.48	.01	.01	.01	.01	.01	.01	.01	.02	.02	.02	.02	.02	.01	.01
2	.02	.52	.01	.01	.01	.01	.01	.01	.01	.02	.02	.02	.02	.02	.02	.02
3	.02	.40	.01	.01	.01	.01	.01	.01	.02	.03	.03	.03	.03	.03	.02	.02
4	.02	.37	.01	.02	.03	.02	.03	.02	.03	.04	.03	.03	.04	.03	.03	.03
5	.04	.18	.02	.02	.03	.02	.02	.02	.02	.05	.06	.06	.05	.05	.06	.06
6	.06	.17	.05	.04	.05	.04	.05	.05	.05	.02	.02	.02	.02	.02	.03	.03
7	.07	.29	.03	.03	.03	.03	.03	.01	.03	.02	.02	.02	.02	.02	.03	.02
8	.04	.27	.02	.02	.03	.02	.03	.02	.03	.04	.04	.03	.04	.03	.04	.03
9	.05	.43	.02	.02	.02	.02	.01	.02	.02	.09	.08	.08	.12	.13	.08	.08
10	.01	.10	.01	.02	.03	.02	.03	.02	.03	.03	.03	.04	.03	.04	.03	.04
11	.02	.15	.01	.01	.01	.01	.01	.01	.02	.03	.04	.04	.04	.04	.03	.04
12	.06	.41	.02	.02	.02	.02	.02	.02	.03	.02	.02	.02	.03	.02	.02	.02
13	.02	.60	.01	.01	.01	.01	.01	.01	.02	.05	.06	.07	.07	.10	.04	.07
14	.04	.25	.02	.02	.02	.01	.02	.02	.02	.07	.06	.05	.07	.05	.04	.04
15	.03	.39	.02	.02	.02	.02	.02	.02	.02	.04	.03	.03	.04	.03	.03	.03
16	.04	.27	.01	.01	.01	.01	.01	.01	.01	.09	.09	.06	.10	.07	.07	.05
17	.01	.16	.01	.01	.02	.01	.02	.02	.02	.04	.04	.04	.04	.04	.04	.04
18	.09	.06	.01	.01	.01	.01	.01	.01	.01	.13	.08	.11	.07	.10	.10	.12
19	.02	.18	.01	.01	.01	.01	.01	.01	.01	.11	.08	.07	.07	.04	.06	.05
20	.00	.18	.00	.00	.01	.00	.01	.01	.01	.12	.08	.06	.08	.06	.05	.06
21	.02	.07	.02	.02	.03	.02	.03	.03	.04	.03	.04	.04	.03	.04	.04	.05
22	.24	.05	.04	.04	.04	.04	.04	.04	.04	.06	.05	.07	.04	.06	.06	.08
23	.06	.09	.01	.01	.01	.01	.01	.01	.01	.15	.10	.11	.10	.11	.10	.12

Note. Bold indicates the relatively larger RMSEs in the same conditions.

At the whole level, the RIMSEs for the two simulation studies are presented in Figure 1. In general, the RIMSEs of BE3M, fourPNO, Stata, and the mirt package decrease gradually as the sample size increases, and all RIMSEs are relatively small across conditions. Furthermore, the RIMSEs show no obvious difference between BE3M and BME in the R package mirt, even though BE3M performs better in estimating the asymptotes; this is consistent with Waller and Feuerstahler (2017).

Figure 1.

The RMSEs of item parameter recovery for all simulation conditions.

Real Data Example

The 4PLM is applied to bullying items collected as part of the 2005–2006 Health Behavior in School-Aged Children (HBSC) study (Iannotti, 2005), funded by the National Institute of Child Health and Human Development. This real data example was chosen to replicate previous results (Culpepper, 2016), to demonstrate that BE3M and the MCMC approaches (the fourPNO package and the “bayesmh” module in Stata) are comparable in estimation accuracy, and to show BE3M’s advantage in estimation time. The prior settings for the “bayesmh” module in Stata are the same as those used for BE3M and BME in the simulation studies. To further compare BE3M with BME, the items were also calibrated with BME in the mirt package. As discussed in the introduction, SEs are not available for any of the item parameters from the mirt package because Beta priors (“expbeta” in mirt) were imposed on the guessing and slipping parameters in this example.
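The two prior settings compared throughout, written (2, 8) and (4, 16), are read here as Beta(2, 8) and Beta(4, 16) priors on $γ_{j}$ and $ς_{j}$; that reading is an assumption based on the Beta priors mentioned above. The short sketch below summarizes the two distributions. Both have mean .2, but Beta(4, 16) has a higher mode and a smaller variance. Because BME reports posterior modes, one plausible reason modal estimates can shift between the two settings is that they are pulled toward different prior modes with different strengths. The helper `beta_summary` is hypothetical.

```python
def beta_summary(a: float, b: float) -> dict:
    """Mean, mode, and variance of a Beta(a, b) distribution (mode requires a, b > 1)."""
    s = a + b
    return {
        "mean": a / s,
        "mode": (a - 1) / (s - 2),
        "var": a * b / (s ** 2 * (s + 1)),
    }


for a, b in [(2, 8), (4, 16)]:
    summary = beta_summary(a, b)
    print(f"Beta({a}, {b}):", {k: round(v, 4) for k, v in summary.items()})
# Beta(2, 8): {'mean': 0.2, 'mode': 0.125, 'var': 0.0145}
# Beta(4, 16): {'mean': 0.2, 'mode': 0.1667, 'var': 0.0076}
```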

The data include responses from 7,491 adolescents to ten items on bullying behavior. The ten bullying items asked students, “How often have you bullied another student(s) at school in the past couple of months in the ways listed below?” Responses were dichotomized (1 → 0; >1 → 1) to indicate that the student had not bullied or had bullied, respectively, in the given way over the past couple of months. For this application, the latent variable $W_{i j}$ can be interpreted as

$$W_{i j} = \begin{cases} 1, & \text{if student } i \text{ bullied another student in the way asked in item } j, \\ 0, & \text{if student } i \text{ did not bully another student in the way asked in item } j, \end{cases}$$

so that $ς_{j}$ is the probability that a student who bullied over the past couple of months neglected to report it. In contrast, $γ_{j}$ can be expected to be zero in this application, because students who did not bully are unlikely to report bullying behaviors (Culpepper, 2016).
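The interpretation of $W_{ij}$ above can be checked by simulation. Under this latent-mixture reading, a student who bullied ($W_{ij} = 1$) reports it with probability $1 - ς_{j}$, and a student who did not ($W_{ij} = 0$) reports it with probability $γ_{j}$. Summing over $W_{ij}$ therefore gives $γ_{j} + (1 - γ_{j} - ς_{j}) P(W_{ij} = 1 \mid θ_{i})$, which is the 4PLM form. The sketch below assumes a slope-intercept logistic kernel for $P(W_{ij} = 1 \mid θ_{i})$, which this section does not restate. All parameter values and function names are hypothetical.

```python
import numpy as np


def kernel(theta, alpha, beta):
    # ASSUMED slope-intercept logistic kernel: P(W = 1 | theta)
    return 1.0 / (1.0 + np.exp(-(alpha * theta - beta)))


def simulate_item(theta, alpha, beta, gamma, varsigma, rng):
    """Draw latent W and observed Y for one item via the latent-mixture reading."""
    w = rng.random(theta.shape) < kernel(theta, alpha, beta)  # True = student bullied
    p_report = np.where(w, 1.0 - varsigma, gamma)              # bullied: 1 - varsigma; did not: gamma
    y = rng.random(theta.shape) < p_report                      # True = bullying reported
    return w, y


def irf_4pl(theta, alpha, beta, gamma, varsigma):
    # Marginal P(Y = 1 | theta) after summing over W
    return gamma + (1.0 - gamma - varsigma) * kernel(theta, alpha, beta)


rng = np.random.default_rng(2024)
theta = np.full(200_000, 0.5)  # many hypothetical students with the same ability
params = dict(alpha=2.0, beta=1.5, gamma=0.01, varsigma=0.17)  # hypothetical values
w, y = simulate_item(theta, rng=rng, **params)
print("simulated P(Y=1):", y.mean())
print("4PLM formula:    ", irf_4pl(0.5, **params))
print("share of bullies who did not report:", 1 - y[w].mean())
```

With this many simulated students, the simulated endorsement rate and the closed-form 4PLM probability should agree to roughly two decimal places, and the share of bullies who did not report should be close to $ς_{j}$.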

Results

Item calibration results from the three methods are summarized in Table 3. The estimates and SEs obtained with fourPNO are the same as those in Culpepper’s (2016) study because we used the same package (fourPNO) and estimation conditions. BE3M produces point estimates comparable to those of both MCMC approaches, and the estimates from BE3M and Stata remain stable when the priors are changed. Interestingly, unlike in the simulation studies, changing the priors in the mirt package has no obvious impact on the guessing and slipping parameters but does affect the slope parameters of Items 1 and 2, perhaps because of an unstable Hessian matrix in the BME method. The implication is that priors should be chosen carefully when the mirt package (which implements the BME method) is used to obtain precise estimates of the 4PLM.

Table 3.

The Estimated Item Parameters and the SEs for HBSC Data.

Item	α (SE)							β (SE)
	Four PNO	Stata		BE3M		mirt		Four PNO	Stata		BE3M		mirt
		Beta(2, 8)	Beta(4, 16)	Beta(2, 8)	Beta(4, 16)	Beta(2, 8)	Beta(4, 16)		Beta(2, 8)	Beta(4, 16)	Beta(2, 8)	Beta(4, 16)	Beta(2, 8)	Beta(4, 16)
1	4.44 (.44)	6.16 (.72)	6.31 (.79)	5.20 (.61)	5.40 (.51)	5.75	6.32	.67 (.10)	.92 (.16)	.90 (.18)	.82 (.03)	.92 (.03)	.84	1.01
2	3.54 (.33)	3.59 (.52)	3.67 (.63)	3.54 (.10)	3.70 (.11)	3.62	4.14	.74 (.09)	.83 (.12)	.97 (.15)	.81 (.03)	.94 (.03)	.79	.94
3	1.21 (.07)	1.31 (.09)	1.39 (.09)	1.25 (.03)	1.36 (.04)	1.25	1.35	1.10 (.06)	1.15 (.08)	1.18 (.07)	1.14 (.02)	1.20 (.02)	1.12	1.18
4	1.41 (.06)	1.49 (.06)	1.55 (.09)	1.50 (.04)	1.56 (.05)	1.49	1.55	1.83 (.06)	1.91 (.05)	1.94 (.08)	1.94 (.03)	1.98 (.03)	1.91	1.96
5	1.63 (.09)	1.66 (.11)	1.74 (.10)	1.71 (.06)	1.79 (.06)	1.68	1.75	2.32 (.11)	2.35 (.13)	2.42 (.11)	2.43 (.03)	2.51 (.03)	2.38	2.46
6	1.93 (.10)	2.02 (.12)	2.06 (.11)	2.09 (.07)	2.16 (.08)	2.04	2.12	2.93 (.13)	3.04 (.15)	3.08 (.14)	3.16 (.04)	3.24 (.04)	3.09	3.17
7	2.44 (.15)	2.50 (.15)	2.58 (.14)	2.78 (.12)	2.91 (.13)	2.66	2.76	4.05 (.22)	4.15 (.21)	4.25 (.20)	4.59 (.05)	4.74 (.05)	4.39	4.52
8	1.58 (.08)	1.63 (.10)	1.68 (.09)	1.65 (.05)	1.72 (.05)	1.63	1.70	2.13 (.09)	2.16 (.11)	2.19 (.09)	2.21 (.03)	2.27 (.03)	2.18	2.24
9	2.23 (.15)	2.26 (.11)	2.36 (.16)	2.49 (.10)	2.59 (.11)	2.36	2.45	3.75 (.22)	3.79 (.17)	3.92 (.23)	4.16 (.05)	4.27 (.05)	3.95	4.07
10	2.29 (.16)	2.41 (.15)	2.44 (.19)	2.58 (.11)	2.70 (.12)	2.44	2.54	3.89 (.24)	4.07 (.22)	4.11 (.27)	4.36 (.05)	4.51 (.05)	4.13	4.27
Item	γ (SE)							ς (SE)
1	.00 (.00)	.01 (.00)	.01 (.01)	.01 (.00)	.01 (.00)	.00	.01	.17 (.01)	.18 (.01)	.18 (.01)	.18 (.01)	.18 (.01)	.18	.18
2	.00 (.00)	.01 (.01)	.02 (.01)	.01 (.00)	.02 (.00)	.01	.02	.18 (.01)	.17 (.01)	.18 (.01)	.17 (.01)	.17 (.01)	.17	.17
3	.01 (.01)	.01 (.01)	.02 (.01)	.01 (.00)	.02 (.00)	.01	.02	.03 (.02)	.05 (.02)	.07 (.02)	.03 (.00)	.06 (.01)	.03	.06
4	.00 (.00)	.00 (.00)	.00 (.00)	.00 (.00)	.00 (.00)	.00	.00	.01 (.01)	.02 (.02)	.04 (.02)	.01 (.00)	.03 (.01)	.01	.03
5	.01 (.00)	.01 (.00)	.01 (.00)	.01 (.00)	.01 (.00)	.01	.01	.01 (.01)	.02 (.01)	.04 (.02)	.01 (.00)	.03 (.01)	.01	.03
6	.00 (.00)	.00 (.00)	.00 (.00)	.00 (.00)	.01 (.00)	.00	.00	.01 (.01)	.02 (.01)	.04 (.02)	.01 (.00)	.03 (.01)	.01	.03
7	.00 (.00)	.00 (.00)	.00 (.00)	.00 (.00)	.00 (.00)	.00	.00	.01 (.01)	.02 (.01)	.04 (.02)	.01 (.01)	.03 (.01)	.01	.03
8	.01 (.00)	.01 (.00)	.01 (.00)	.01 (.00)	.01 (.00)	.01	.01	.01 (.01)	.02 (.02)	.04 (.02)	.01 (.00)	.03 (.01)	.01	.03
9	.01 (.00)	.00 (.00)	.01 (.00)	.01 (.00)	.01 (.00)	.01	.01	.01 (.01)	.02 (.01)	.04 (.02)	.01 (.01)	.03 (.01)	.01	.03
10	.00 (.00)	.00 (.00)	.00 (.00)	.00 (.00)	.01 (.00)	.00	.00	.01 (.01)	.02 (.01)	.04 (.02)	.01 (.01)	.03 (.01)	.01	.03

Note. Bold indicates an obvious difference in estimates when changing priors.

In terms of SEs, the two priors do not seem to influence the quality of estimation; the SEs are generally similar under both priors. The SEs from the MCMC approaches are also similar to those obtained with BE3M, with the exception of the slope parameters.

The main finding is that BE3M is much more computationally efficient than MCMC. With N = 7,491 on a 2.4 GHz processor with 6 GB of RAM, the Gibbs sampler in the fourPNO package required approximately 47 minutes to complete 100,000 iterations, and the MH sampler in Stata required approximately 3 hours to complete 10,000 iterations, whereas BE3M needed only about 1 minute. We hope that this finding in particular will help promote further applications of the 4PLM in practice.
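Taking the approximate run times reported above at face value, the implied speedups of BE3M and the per-iteration costs of the two samplers work out as follows. This is a rough calculation only: the two samplers ran different numbers of iterations, so their totals are not a like-for-like comparison of the samplers themselves.

$$\frac{47\ \text{min}}{1\ \text{min}} \approx 47, \qquad \frac{3\ \text{h}}{1\ \text{min}} = \frac{180\ \text{min}}{1\ \text{min}} \approx 180,$$

$$\frac{47 \times 60\ \text{s}}{100{,}000\ \text{iterations}} \approx 0.028\ \text{s per iteration (fourPNO)}, \qquad \frac{3 \times 3{,}600\ \text{s}}{10{,}000\ \text{iterations}} = 1.08\ \text{s per iteration (Stata)}.$$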

Discussion

In recent years, interest in the four-parameter IRT model has been rising in measurement research. The current study presented a new Bayesian formulation of the 4PLM. Through two simulation studies and a real data analysis, we demonstrated that BE3M combines the strengths of Gibbs sampling and Bayes modal estimation: it produces estimates as accurate as those of the Gibbs sampling method, as quickly as the Bayes modal estimation approach in mirt. To facilitate calibration, the authors have made available an R package, IRTBEMM (to be published on CRAN soon). The authors believe this package offers a better alternative for 4PLM calibration and will help promote the use of the 4PLM in educational and psychological research.

Several opportunities exist for future research on the 4PLM. First, future research can help identify other substantive research areas and large survey datasets that could benefit from modeling the slipping effect (Beghetto, 2019). Second, for successful and efficient integration into a computerized adaptive testing (CAT) system, the ability to calibrate item parameters online is important (Zheng, 2016). Because the proposed method belongs to the EM family, to which most online calibration algorithms also belong, there should be few barriers to developing online calibration methods based on BE3M.

Supplemental Material

sj-docx-1-sgo-10.1177_21582440211052556 – Supplemental material for Fast Bayesian Estimation for the Four-Parameter Logistic Model (4PLM)

Supplemental material, sj-docx-1-sgo-10.1177_21582440211052556 for Fast Bayesian Estimation for the Four-Parameter Logistic Model (4PLM) by Chanjin Zheng, Shaoyang Guo and Justin L Kern in SAGE Open

Footnotes

Author’s Note

Chanjin Zheng and Shaoyang Guo contributed to the work equally and should be regarded as co-first authors. Chanjin Zheng is affiliated with the Department of Educational Psychology, Faculty of Education, East China Normal University, China. Shaoyang Guo is affiliated with the Institute of Curriculum & Instruction, Faculty of Education, East China Normal University, China.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent

This article does not contain any studies with human participants performed by any of the authors.

Data Availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was partially supported by the Flower of Happiness Project in social science of East China Normal University (2020ECNU-XFZH007, 2019ECNU-XFZH015) and the Peak Discipline Construction Project of Education at East China Normal University.

ORCID iDs

Chanjin Zheng

Shaoyang Guo

Supplemental Material

Supplemental material for this article is available online.

References

Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). CRC Press.

Barton, M. A., & Lord, F. M. (1981). An upper asymptote for the three-parameter logistic item response model. ETS Research Report Series, 1981(1), 1–8. https://doi.org/10.1002/j.2333-8504.1981.tb01255.x

Beghetto, R. A. (2019). Large-scale assessments, personalized learning, and creativity: Paradoxes and possibilities. ECNU Review of Education, 2(3), 311–327. https://doi.org/10.1177/2096531119878963

Béguin, A. A., & Glas, C. A. W. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541–561. https://doi.org/10.1007/bf02296195

Bock, R. D., & Aitkin (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459. https://doi.org/10.1007/BF02293801

Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06

Culpepper, S. A. (2016). Revisiting the 4-parameter item response model: Bayesian estimation and application. Psychometrika, 81(4), 1142–1163. https://doi.org/10.1007/s11336-015-9477-6

Culpepper, S. A. (2017). The prevalence and implications of slipping on low-stakes, large-scale assessments. Journal of Educational and Behavioral Statistics, 42(6), 706–725. https://doi.org/10.3102/1076998617705653

Feuerstahler, L. M., & Waller, N. G. (2014). Estimation of the 4-parameter model with marginal maximum likelihood. Multivariate Behavioral Research, 49(3), 285–285. https://doi.org/10.1080/00273171.2014.912889

Guo & Zheng (2019). The Bayesian expectation-maximization-maximization for the 3PLM. Frontiers in Psychology, 10, 1175. https://doi.org/10.3389/fpsyg.2019.01175

Houts, C. R., & Cai (2015). flexmirt: Flexible multilevel multidimensional item analysis and test scoring user’s manual version 3.5 RC. North Carolina, CA. https://vpgcentral.com/software/flexmirt/

Iannotti (2005). Health behavior in school-aged children HBSC, 2005–2006. MI Inter-University Consortium for Political and Social Research.

Liao, W.-W., R.-G., Yen, Y.-C., & Cheng, H.-C. (2012). The four-parameter logistic item response theory model as a robust method of estimating ability despite aberrant responses. Social Behavior and Personality: An International Journal, 40(10), 1679–1694. https://doi.org/10.2224/sbp.2012.40.10.1679

Loken & Rulison, K. L. (2010). Estimation of a four-parameter item response theory model. British Journal of Mathematical and Statistical Psychology, 63(3), 509–525. https://doi.org/10.1348/000711009X474502

McDonald, R. P. (1967). Nonlinear factor analysis (Psychometric Monograph No. 15). Richmond, VA: Psychometric Corporation. Retrieved from http://www.psychometrika.org/journal/online/MN15.pdf

McLachlan & Krishnan (2007). The EM algorithm and extensions (Vol. 382). John Wiley & Sons.

Oehlert, G. W. (1992). A note on the delta method. The American Statistician, 46(1), 27–29.

Ramsay, J. O. (1991). Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56(4), 611–630. https://doi.org/10.1007/BF02294494

Reise, S. P., & Waller, N. G. (2003). How many IRT parameters does it take to model psychopathology items? Psychological Methods, 8(2), 164–184. https://doi.org/10.1037/1082-989X.8.2.164

Rouse, S. V., Finger, M. S., & Butcher, J. N. (1999). Advances in clinical personality measurement: An item response theory analysis of the MMPI-2 PSY-5 scales. Journal of Personality Assessment, 72, 282–307. https://doi.org/10.1207/S15327752JP720212

Rulison, K. L., & Loken (2009). I’ve fallen and I can’t get up: Can high-ability students recover from early mistakes in CAT? Applied Psychological Measurement, 33(2), 83–101. https://doi.org/10.1177/0146621608324023

Stuart & Kendall, M. G. (1968). The advanced theory of statistics. Macmillan.

Waller, N. G., & Feuerstahler (2017). Bayesian modal estimation of the four-parameter item response model in real, realistic, and idealized data sets. Multivariate Behavioral Research, 52(3), 350–370. https://doi.org/10.1080/00273171.2017.1292893

Waller, N. G., & Reise, S. P. (2010). Measuring psychopathology with nonstandard item response theory models: Fitting the four-parameter model to the Minnesota Multiphasic Personality Inventory. In Embretson (Ed.), Measuring psychological constructs: Advances in model based approaches (pp. 147–173). American Psychological Association.

Zheng, Meng, Guo, & Liu (2017). Expectation-maximization-maximization: A feasible MLE algorithm for the three-parameter logistic model based on a mixture modeling reformulation. Frontiers in Psychology, 8, 2302. https://doi.org/10.3389/fpsyg.2017.02302

Zheng (2016). Online calibration of polytomous items under the generalized partial credit model. Applied Psychological Measurement, 40(6), 434–450. https://doi.org/10.1177/0146621616650406

Zimowski, M. F., Muraki, Mislevy, R. J., & Bock, R. D. (2003). BILOG-MG3 user’s guide. Scientific Software International.
