
Abstract

In standardized testing, equating is used to ensure comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, where an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. Estimating the bandwidth introduces additional variability, which is currently not accounted for when calculating the standard errors of equating. This poses a threat to the accuracy of the standard errors of equating. In this study, the asymptotic variance of the bandwidth parameter estimator is derived and a modified method for calculating the standard error of equating that accounts for the bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and confirm the accuracy of the modified method across several sample sizes and test lengths as compared to the existing method and the Monte Carlo standard error of equating estimates. The results show that the modified standard errors of equating are accurate under the considered conditions. Furthermore, the modified and the existing methods produce similar results, which suggests that the impact of bandwidth variability on the standard error of equating is minimal.

Keywords

achievement testing, classical test theory, equating, item response theory, standard errors

Standardized testing is commonly used for assessing individual achievement and its results greatly influence high-stakes decisions ranging from university admissions to various industry certifications. Standardized testing generally requires alternate test forms to be administered on multiple occasions. As a consequence, the tests often differ in difficulty from one administration to another, which poses a challenge with respect to comparability and fairness of the resulting test scores. In order to address this challenge, a statistical procedure known as equating is employed with the paramount goal of adjusting the scores on the test forms so that they yield interchangeable results (Kolen & Brennan, 2014).

Observed-score equating is one of the fundamental methods used in test equating. Rooted in classical test theory, it is concerned with establishing the equivalence of the observed scores on two test forms and includes both linear and equipercentile equating functions (von Davier, 2011). In this study, we focus on an equipercentile observed-score equating method called kernel equating, which was initially introduced by Holland and Thayer (1989) and further developed by von Davier et al. (2004).

The conceptual framework of kernel equating follows that of equipercentile observed-score equating and posits a series of steps to obtain the equated scores: (1) pre-smoothing of the data to reduce sampling variability, (2) obtaining discrete score probability distributions, (3) obtaining continuous approximations to the discrete score distributions, (4) calculating the equating function, and (5) calculating the standard errors of equating (SEEs) (von Davier, 2011; von Davier et al., 2004). A feature that distinguishes kernel equating from other equipercentile methods is that the continuous approximations of the score probability distributions are achieved through kernels that utilize bandwidth parameters. The bandwidths allow the density functions to be as smooth as possible while retaining the properties of the original distributions. Estimating such parameters, however, introduces additional sampling variability. This variability is typically not accounted for when calculating the standard errors of kernel equating, and therefore constitutes a threat to their accuracy (Holland et al., 1989; von Davier et al., 2004).

Accurate estimation of the SEE is integral to making correct inferences and comparisons. When estimated incorrectly, it can lead to unjustified certainty. One previous study derived modified standard errors of kernel equating when using a variant of Silverman’s rule of thumb for bandwidth estimation (Andersson et al., 2014). The current study derives modified SEEs for the more commonly applied approach of estimating the bandwidth by minimizing a penalty function. Such an approach is more generally appropriate and does not rely on a particular distributional assumption for the test scores. Thus, the objective of this article is to introduce a modified method of calculating the SEE which accounts for the additional variability stemming from the bandwidth estimation. The new approach is compared via simulations to the current method of calculating the SEE (Holland et al., 1989) across several sample sizes and test lengths.

We structure this article as follows. In the subsequent section, we give a brief background to the kernel method of equating and expand on the issue of bandwidth estimation and how it influences sampling variability. We also discuss how the standard errors of kernel equating are currently estimated. Next, the asymptotic variance of the bandwidth parameter estimator is derived and is incorporated in a modified method for calculating the SEE. This modified method is further verified and compared to the existing method in a simulation study. Lastly, the results are reported and discussed.

The Kernel Method of Test Equating

Data Collection Designs

An observed-score equating procedure consists of two fundamental components, namely, the data collection design and the equating method (von Davier et al., 2004). Hence, before we focus on the equating itself, it is essential to review, if only briefly, the common approaches to collecting the data. There are several data collection designs widely used in practice and they can roughly be divided into two categories: designs which use examinees from a common population taking both test forms and designs which use common items on the test forms (von Davier et al., 2004). The first category of data collection designs includes the equivalent groups, the single group, and the counterbalanced designs, and the second category includes the common-item non-equivalent groups design. In this study, we focus on the equivalent groups design, where two independent random samples are drawn from a common population, P, and one group takes test form X, while the other takes test form Y. In the following, we will use X and Y to denote both the test forms and the random variable corresponding to the test scores from each of the test forms.

The choice of an appropriate data collection design is subject to considerations like the available sample size, time, and costs. The designs subsequently affect the equating procedure implying that some designs, such as the equivalent groups design, allow for a relatively straightforward comparison between the test forms. Other designs are much more complex, such as the common-item non-equivalent groups design. A more detailed account of the considerations and procedures involved in various data collection designs can be found in von Davier et al. (2004).

Kernel Equating

In the following, we adopt the notation of von Davier et al. (2004). Let the target population be T, the possible score values on the test form X be x_j for j = 1, …, J; and let the possible score values on the test form Y be y_k for k = 1, …, K. Thus, we define the score probabilities as

$$r_j = \mathrm{Prob}\{X = x_j \mid T\}, \tag{1}$$

and

$$s_k = \mathrm{Prob}\{Y = y_k \mid T\}. \tag{2}$$

Further, an equipercentile equating function is defined in terms of the cumulative distribution functions (CDFs) which are given by

$$F(x) = \mathrm{Prob}(X \leq x) = \sum_{j,\, x_j \leq x} r_j, \tag{3}$$

and

$$G(y) = \mathrm{Prob}(Y \leq y) = \sum_{k,\, y_k \leq y} s_k. \tag{4}$$

When the CDFs are continuous, we obtain the equipercentile equating function of X to Y from

$$y = \mathrm{Equi}_Y(x) = G^{-1}(F(x)). \tag{5}$$

Strictly speaking, however, most score distributions are discrete, and their continuous approximations are required. Kernel equating addresses this problem by introducing a series of steps which can be applied to various data collection designs and which provide continuous CDFs. The steps of kernel equating are pre-smoothing, estimation of the score probabilities, continuous approximation to the discrete score distributions, equating, and calculating the SEE (von Davier et al., 2004). We now briefly review the first two steps and dedicate subsequent subsections to presenting the remaining steps in more detail as they pertain to the subject at hand.

Pre-Smoothing

In the pre-smoothing step, a parametric statistical model is fitted to the observed data. This can be done by fitting log-linear or item response theory (IRT) models to the data. The methods are described in detail in Andersson and Wiberg (2017), and Holland and Thayer (1987), and are not repeated here.

Estimation of the Score Probabilities

Having estimated the score distributions with a pre-smoothing model, the score probabilities can be obtained using a linear or non-linear transformation which, following von Davier et al. (2004), we call the design function. The design function depends on the data collection design. For instance, consider the equivalent groups design and let $r = {(r_{1}, \dots, r_{J})}^{t}$ denote the column vector of the score probabilities of X and $s = {(s_{1}, \dots, s_{K})}^{t}$ denote the column vector of the score probabilities of Y. The design function (DF) is then a simple identity function, that is

$$\mathrm{DF}(r, s) = \begin{pmatrix} I_J & 0 \\ 0 & I_K \end{pmatrix} \begin{pmatrix} r \\ s \end{pmatrix} = \begin{pmatrix} r \\ s \end{pmatrix}, \tag{6}$$

where I_J and I_K are J × J and K × K identity matrices. Design functions for other data collection designs are given explicitly in von Davier et al. (2004).

Continuous Approximation and Equating

The third step in kernel equating, which distinguishes it from other equipercentile methods, is obtaining the continuous approximations $F_{h_{X}} (x)$ and $G_{h_{Y}} (y)$ to the discrete CDFs F(x) and G(y). In kernel equating, this is achieved by applying a kernel with a smoothing bandwidth parameter (von Davier et al., 2004). There are different kernels available in the literature (Lee et al., 2008), but in this study we focus on the Gaussian kernel, which is most commonly used. Following the notation of von Davier et al. (2004), let Φ(⋅) denote the CDF of the standard Gaussian distribution, and let h_X denote the bandwidth parameter. Then, the Gaussian kernel smoothing of the distribution of X has the CDF defined by

$$F_{h_X}(x) = \sum_j r_j \Phi(R_{jX}(x)), \tag{7}$$

where R_jX(x) is given by

$$R_{jX}(x) = \frac{x - a_X x_j - (1 - a_X) \mu_X}{a_X h_X}, \tag{8}$$

and a_X, μ_X, and $\sigma_X^2$ are functions of r, that is

$$\mu_X = \sum_j x_j r_j, \tag{9}$$

$$\sigma_X^2 = \sum_j (x_j - \mu_X)^2 r_j, \tag{10}$$

$$a_X = \sqrt{\frac{\sigma_X^2}{\sigma_X^2 + h_X^2}}. \tag{11}$$

The continuous approximation $G_{h_Y}(y)$ is defined analogously.
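As a rough illustration of Equations 7 to 11, the continuized CDF can be evaluated directly from the score probabilities. The Python sketch below is not taken from the study's code (which was written in R); the function names, the five-point score distribution, and the bandwidth value 0.6 are hypothetical choices made only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: cdf() plays the role of Phi


def moments(scores, probs):
    """Mean and variance of a discrete score distribution (Equations 9 and 10)."""
    mu = sum(x * p for x, p in zip(scores, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(scores, probs))
    return mu, var


def kernel_cdf(x, scores, probs, h):
    """Gaussian-kernel continuized CDF F_h(x) (Equations 7, 8, and 11)."""
    mu, var = moments(scores, probs)
    a = sqrt(var / (var + h ** 2))  # a_X in Equation 11
    total = 0.0
    for x_j, r_j in zip(scores, probs):
        R_j = (x - a * x_j - (1 - a) * mu) / (a * h)  # Equation 8
        total += r_j * STD_NORMAL.cdf(R_j)
    return total


# Hypothetical symmetric five-point score distribution, for illustration only.
scores = [0, 1, 2, 3, 4]
probs = [0.1, 0.2, 0.4, 0.2, 0.1]
values = [kernel_cdf(x, scores, probs, h=0.6) for x in (-5, 0, 2, 4, 9)]
```

Because the illustrative distribution is symmetric around 2, the continuized CDF evaluated at x = 2 should be one half, which gives a quick sanity check of the implementation.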

It is evident from equations (7)–(11) that for the continuous approximation to be carried out, the bandwidth parameters h_X and h_Y have to be estimated. The primary goal of introducing such parameters is to make the density functions as smooth as possible while retaining the properties of the original distributions. Various methods of estimating the bandwidth parameter have been suggested in previous research (Andersson et al., 2014; von Davier et al., 2004). Of particular interest to this study is a method described in Holland and Thayer (1989) and von Davier et al. (2004), which estimates the bandwidth parameter by minimizing a penalty function, named PEN₁ (h_X) in von Davier et al. (2004), with respect to the bandwidth. The penalty function itself is based on the squared distances between proportions and the density function and is given by

$$\mathrm{PEN}_1(h_X) = \sum_j \left(r_j - f_{h_X}(x_j)\right)^2, \tag{12}$$

where $f_{h_X}(x_j)$ is the density function, found by differentiating Equation 7 with respect to x, that is

$$f_{h_X}(x) = \sum_j r_j \phi(R_{jX}(x)) \frac{1}{a_X h_X}, \tag{13}$$

where ϕ(⋅) denotes the standard Gaussian density and R_jX(x) is given in Equation 8. The density obtained as a result of estimating the bandwidth by minimizing PEN₁ is typically a good approximation of the discrete score distribution. Note that sometimes it can be beneficial to smooth the density function further, in which case an additional component named PEN₂ in von Davier et al. (2004) can be applied in combination with PEN₁. However, because of the complexity in accounting for estimation variability with PEN₂, we focus exclusively on PEN₁ in the present study.
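To make the bandwidth selection concrete, the following Python sketch evaluates PEN₁ from Equations 12 and 13 and minimizes it over h. It is only an illustration under stated assumptions: the binomial score distribution and the search interval [0.2, 1.5] are hypothetical, and a plain golden-section search is used here as a simplified stand-in for the Brent-type optimizer used in the study.

```python
from math import comb, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def kernel_density(x, scores, probs, h):
    """Continuized density f_h(x) from Equation 13."""
    mu = sum(s * p for s, p in zip(scores, probs))
    var = sum((s - mu) ** 2 * p for s, p in zip(scores, probs))
    a = sqrt(var / (var + h ** 2))
    return sum(
        p * STD_NORMAL.pdf((x - a * s - (1 - a) * mu) / (a * h))
        for s, p in zip(scores, probs)
    ) / (a * h)


def pen1(h, scores, probs):
    """Sum of squared gaps between r_j and f_h(x_j) (Equation 12)."""
    return sum(
        (p - kernel_density(s, scores, probs, h)) ** 2
        for s, p in zip(scores, probs)
    )


def golden_section_min(f, lo, hi, tol=1e-8):
    """Plain golden-section search for a (local) minimizer of f on [lo, hi]."""
    inv_phi = (sqrt(5) - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c  # keep [lo, d]; the old c becomes the new upper probe
            c = hi - inv_phi * (hi - lo)
        else:
            lo, c = c, d  # keep [c, hi]; the old d becomes the new lower probe
            d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2


# Hypothetical score distribution: binomial(10, 0.3), for illustration only.
scores = list(range(11))
probs = [comb(10, k) * 0.3 ** k * 0.7 ** (10 - k) for k in scores]
h_opt = golden_section_min(lambda h: pen1(h, scores, probs), 0.2, 1.5)
```

Since PEN₁ is not guaranteed to be unimodal, a search of this kind may return a local rather than a global minimizer; checking the penalty on a coarse grid of bandwidths first is a simple safeguard.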

Once the continuous approximations are obtained, the equating function estimator for equating X to Y is given by

$$\hat{e}_Y(x; \hat{r}, \hat{s}) = \hat{G}_{h_Y}^{-1}(\hat{F}_{h_X}(x; \hat{r}); \hat{s}). \tag{14}$$

The equating function for equating Y to X is analogous and found by substitution.
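The composition in Equation 14 can be sketched numerically by inverting the continuized CDF of Y with bisection. This is a minimal Python illustration, not the implementation used in the study; the function names, the two score distributions, and the shared bandwidth of 0.6 are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def kernel_cdf(x, scores, probs, h):
    """Gaussian-kernel continuized CDF (Equations 7, 8, and 11)."""
    mu = sum(s * p for s, p in zip(scores, probs))
    var = sum((s - mu) ** 2 * p for s, p in zip(scores, probs))
    a = sqrt(var / (var + h ** 2))
    return sum(
        p * STD_NORMAL.cdf((x - a * s - (1 - a) * mu) / (a * h))
        for s, p in zip(scores, probs)
    )


def kernel_quantile(p, scores, probs, h, tol=1e-10):
    """Invert the continuized CDF by bisection (the CDF is strictly increasing)."""
    lo, hi = min(scores) - 10.0, max(scores) + 10.0  # generous bracket (assumption)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kernel_cdf(mid, scores, probs, h) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def equate_x_to_y(x, x_scores, r, h_x, y_scores, s, h_y):
    """e_Y(x) = G_hY^{-1}(F_hX(x)), following Equation 14."""
    p = kernel_cdf(x, x_scores, r, h_x)
    return kernel_quantile(p, y_scores, s, h_y)


# Hypothetical forms: Y has the same shape as X but every score is one point higher.
x_scores = [0, 1, 2, 3, 4]
y_scores = [1, 2, 3, 4, 5]
r = s = [0.1, 0.2, 0.4, 0.2, 0.1]
equated = equate_x_to_y(2.5, x_scores, r, 0.6, y_scores, s, 0.6)
```

In this constructed case the two continuized distributions differ only by a shift of one score point, so the equated value of 2.5 should be 3.5 up to the bisection tolerance.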

Standard Error of Kernel Equating

The SEE is the measure of random equating error or uncertainty which stems from the equating function being subject to estimation and thereby sampling variability. We largely base this subsection on the work of Holland et al. (1989), who derived the asymptotic standard error for the kernel method of equating using the standard delta method for computing large sample approximations to the sampling variances of functions of statistics. Before proceeding, we see it appropriate to briefly introduce the multivariate delta method (Rao, 1973).

Adopting the notation of Rao (1973), let the (k × 1)-dimensional random vector $\sqrt{n} (T_{k n} - θ_{k})$ converge to a multivariate normal distribution with zero mean and covariance Σ, where T_kn is an estimator, θ_k is the true parameter vector, and n denotes the sample size. Let g denote a vector-valued function with components g₁,…, g_q, such that all the entries of g are differentiable. Then, $\sqrt{n} (g (T_{k n}) - g (θ_{k}))$ converges to a multivariate normal distribution with zero mean and covariance of GΣG^′, that is

$$\sqrt{n}\,(g(T_{kn}) - g(\theta_k)) \overset{d}{\to} N(0, G \Sigma G'), \tag{15}$$

where G is the (q × k) Jacobian matrix of partial derivatives of g with respect to θ_k. In this study, the equivalents of T_kn, θ_k, and g are the estimator of the score probabilities, the true score probabilities, and the equating function, respectively. The delta method was employed by Holland et al. (1989), who defined the SEE for equating X to Y by

$$\mathrm{SEE}_Y(x) = \sqrt{\mathrm{Var}(\hat{e}_Y(x; \hat{r}, \hat{s}))}. \tag{16}$$

The SEE for equating Y to X is defined analogously.
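The delta method in Equation 15 can be illustrated with a small numerical sketch. The Python code below approximates the Jacobian by central differences and forms the product GΣG′; the ratio function, the parameter values, and the covariance matrix are hypothetical, and `cov_theta` is taken to be the covariance of the estimator itself (that is, Σ already divided by n).

```python
def numerical_jacobian(g, theta, eps=1e-6):
    """Central-difference Jacobian (q x k) of a vector-valued function g at theta."""
    base = g(theta)
    jac = [[0.0] * len(theta) for _ in base]
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        g_up, g_down = g(up), g(down)
        for row in range(len(base)):
            jac[row][i] = (g_up[row] - g_down[row]) / (2 * eps)
    return jac


def delta_method_cov(g, theta, cov_theta):
    """Approximate covariance of g(T) as G * cov_theta * G' (Equation 15)."""
    G = numerical_jacobian(g, theta)
    q, k = len(G), len(theta)
    G_cov = [
        [sum(G[a][m] * cov_theta[m][b] for m in range(k)) for b in range(k)]
        for a in range(q)
    ]
    return [
        [sum(G_cov[a][m] * G[b][m] for m in range(k)) for b in range(q)]
        for a in range(q)
    ]


def ratio(theta):
    """Hypothetical smooth function of two parameters: theta_0 / theta_1."""
    return [theta[0] / theta[1]]


theta_hat = [2.0, 4.0]
cov_theta = [[0.04, 0.0], [0.0, 0.09]]  # assumed covariance of the estimator
cov_g = delta_method_cov(ratio, theta_hat, cov_theta)
se_g = cov_g[0][0] ** 0.5
```

For this toy function the Jacobian is (1/4, −1/8), so the approximate variance is 0.0625 × 0.04 + 0.015625 × 0.09 = 0.00390625. In the equating setting, g would be the equating function and `theta_hat` the stacked score probabilities.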

Treating the bandwidth parameters h_X and h_Y as constants, Holland et al. (1989) assert that all the uncertainty in the equating function comes from the estimation of the score probabilities r and s . Hence, the variance of the equating function, and in turn the SEE, reflects the data collection design, the choice of the pre-smoothing technique used in the estimation of the population score probabilities, and the equating function itself.

Reiterating the notation used previously, let r and s denote the vectors of the pre-smoothed score distributions. The calculation of the SEE per Holland et al. (1989) then requires two components: the vector $\partial_{e_Y}$ with derivatives of the equating function e_Y with respect to r and s, and the asymptotic covariance matrix $\Sigma_{(\hat{r}, \hat{s})}$. Using the delta method (Rao, 1973), the variance of the equating function $\hat{e}_Y$ can then be expressed as

$$\mathrm{Var}(\hat{e}_Y(x; \hat{r}, \hat{s})) = \partial_{e_Y} \Sigma_{(\hat{r}, \hat{s})} [\partial_{e_Y}]', \tag{17}$$

where $\Sigma_{(\hat{r}, \hat{s})}$ is the covariance matrix of the independently estimated score probabilities, given by

$$\Sigma_{(\hat{r}, \hat{s})} = \begin{bmatrix} \Sigma_{\hat{r}} & 0 \\ 0 & \Sigma_{\hat{s}} \end{bmatrix}. \tag{18}$$

The matrix $\Sigma_{(\hat{r}, \hat{s})}$ has dimensions (J + K) × (J + K), where J is the dimension of r and K is the dimension of s (von Davier et al., 2004). The calculation of $\Sigma_{(\hat{r}, \hat{s})}$ for different equating designs can be found in Andersson and Wiberg (2017), Holland et al. (1989), and von Davier et al. (2004).

The second component, $\partial_{e_Y}$, can be defined as follows:

$$\partial_{e_Y} = \left[\frac{\partial e_Y}{\partial r}, \frac{\partial e_Y}{\partial s}\right]. \tag{19}$$

Recalling Equation 14, the derivatives needed to compute $\partial_{e_Y}$ are defined in Holland et al. (1989) as

$$\frac{\partial e_Y}{\partial r_j} = \frac{1}{G'} \frac{\partial F(x; r)}{\partial r_j}, \tag{20}$$

$$\frac{\partial e_Y}{\partial s_k} = -\frac{1}{G'} \frac{\partial G(e_Y(x); s)}{\partial s_k}, \tag{21}$$

where $\frac{\partial e_Y}{\partial r}$ is a row vector with dimensions 1 × J, $\frac{\partial e_Y}{\partial s}$ is a row vector with dimensions 1 × K, and G′ is the density evaluated at e_Y(x), that is

$$G' = \frac{\partial G(e_Y(x); s)}{\partial y}, \tag{22}$$

and

$$\frac{\partial F(x; r)}{\partial r_j} = \Phi(R_{jX}(x; r)) - M_{jX}(x; r) \frac{\partial F(x; r)}{\partial x}, \tag{23}$$

where $\frac{\partial F(x; r)}{\partial x}$ is given in Equation 13, R_jX(x; r) in Equation 8, and

$$M_{jX}(x; r) = \frac{1}{2} (x - \mu_X)(1 - a_X^2) z_{jX}^2 + (1 - a_X) x_j, \tag{24}$$

where z_jX is defined as

$$z_{jX} = \frac{x_j - \mu_X}{\sigma_X}. \tag{25}$$

The derivatives of e_X are analogous to those above and can be computed by substitution.
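Equations 23 and 24 can be checked numerically by comparing the analytic derivative of the continuized CDF with a central-difference approximation. The Python sketch below does this for a hypothetical five-point score distribution; the function names, the evaluation point x = 1.5, and the bandwidth 0.6 are illustrative assumptions, and each score probability is perturbed separately without renormalization.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pieces(x, scores, probs, h):
    """Return mu, sigma^2, a, and the list of R_j(x) (Equations 8 to 11)."""
    mu = sum(s * p for s, p in zip(scores, probs))
    var = sum((s - mu) ** 2 * p for s, p in zip(scores, probs))
    a = sqrt(var / (var + h ** 2))
    R = [(x - a * s - (1 - a) * mu) / (a * h) for s in scores]
    return mu, var, a, R


def kernel_cdf(x, scores, probs, h):
    """Continuized CDF of Equation 7."""
    _, _, _, R = pieces(x, scores, probs, h)
    return sum(p * STD_NORMAL.cdf(R_j) for p, R_j in zip(probs, R))


def dF_dr_analytic(x, scores, probs, h):
    """Equation 23: Phi(R_j) - M_j * f(x), with M_j from Equation 24."""
    mu, var, a, R = pieces(x, scores, probs, h)
    density = sum(p * STD_NORMAL.pdf(R_j) for p, R_j in zip(probs, R)) / (a * h)
    grads = []
    for x_j, R_j in zip(scores, R):
        z_j = (x_j - mu) / sqrt(var)  # Equation 25
        M_j = 0.5 * (x - mu) * (1 - a ** 2) * z_j ** 2 + (1 - a) * x_j
        grads.append(STD_NORMAL.cdf(R_j) - M_j * density)
    return grads


def dF_dr_numeric(x, scores, probs, h, eps=1e-6):
    """Central differences in each r_j, holding the other probabilities fixed."""
    grads = []
    for j in range(len(probs)):
        up, down = list(probs), list(probs)
        up[j] += eps
        down[j] -= eps
        diff = kernel_cdf(x, scores, up, h) - kernel_cdf(x, scores, down, h)
        grads.append(diff / (2 * eps))
    return grads


scores = [0, 1, 2, 3, 4]
probs = [0.1, 0.2, 0.4, 0.2, 0.1]  # hypothetical score probabilities
analytic = dF_dr_analytic(1.5, scores, probs, 0.6)
numeric = dF_dr_numeric(1.5, scores, probs, 0.6)
max_gap = max(abs(u - v) for u, v in zip(analytic, numeric))
```

Agreement between the two gradients to within finite-difference error is the same kind of check the study reports performing with the R package numDeriv.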

At this point, it is important to emphasize that the method of SEE calculation described above treats the bandwidth parameters h_X and h_Y as fixed and not as functions of r and s . Hence, the additional variability introduced by the bandwidth estimation is currently not accounted for in the calculation of the SEEs, and consequently poses a challenge with respect to their accuracy (Holland et al., 1989; von Davier et al., 2004).

Accounting for Bandwidth Estimation Variability in Kernel Equating

In this section, we first derive the bandwidth parameter estimator variance and then introduce a modified method for the calculation of the analytical SEE that accounts for bandwidth estimation variability.

Asymptotic Variance and Standard Error of the Bandwidth Parameter Estimator

Recalling the standard delta method restated in the previous section (Rao, 1973), it is important to note that the bandwidth parameter estimator is not defined explicitly but rather as an implicit function of other asymptotically normal variables. Therefore, we use a generalization of the delta method presented by Benichou and Gail (1989) which facilitates computing the asymptotic variance of the implicitly defined bandwidth parameter estimator. Following the notation of von Davier et al. (2004), h_X denotes the bandwidth parameter selected to minimize PEN₁ defined by Equation (12) and r denotes the vector of estimated score probabilities. Consider further that PEN₁ is a continuously differentiable function of the estimated score probabilities r in h_X, and the function is minimized so that $\frac{\partial {PEN}_{1}}{\partial h_{X}} = 0$ . Applying the implicit function theorem (Rao, 1973), we can then define h_X as a function of r such that $h_{X} = g_{h_{X}} (r)$ , and compute the partial derivatives of $g_{h_{X}} (r)$ with respect to r as

$$\frac{\partial g_{h_X}(r)}{\partial r} = -\left(\frac{\partial^2 \mathrm{PEN}_1}{\partial h_X^2}\right)^{-1} \frac{\partial^2 \mathrm{PEN}_1}{\partial h_X \partial r'}, \tag{26}$$

where $\frac{\partial^2 \mathrm{PEN}_1}{\partial h_X^2}$ is the scalar second-order partial derivative of PEN₁ with respect to h_X and $\frac{\partial^2 \mathrm{PEN}_1}{\partial h_X \partial r'}$ is a 1 × J vector of second-order mixed partial derivatives of PEN₁ with respect to h_X and r. Both derivatives can be calculated directly using the chain rule and implicit differentiation. The equations, however, are lengthy, and we summarize them in the appendix.

Let $\Sigma_{\hat{r}}$ denote the asymptotic covariance matrix of the estimated score probabilities $\hat{r}$ with dimensions J × J, where J is the dimension of r. By applying the delta method for implicit functions (Benichou & Gail, 1989), we can define the asymptotic variance of the bandwidth parameter estimator $\hat{h}_X$ as

$$\mathrm{Var}(\hat{h}_X) = \frac{\partial g_{h_X}(r)}{\partial r} \Sigma_{\hat{r}} \left[\frac{\partial g_{h_X}(r)}{\partial r}\right]', \tag{27}$$

and its standard error as

$$\mathrm{SE}(\hat{h}_X) = \sqrt{\mathrm{Var}(\hat{h}_X)}. \tag{28}$$

The variance and the standard error of $\hat{h}_Y$ are analogous to those given for $\hat{h}_X$ and can be computed by substituting Y for X and s for r.
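The mechanics of Equations 26 to 28 can be sketched on a toy problem where the answer is known. In the Python code below, the objective is not PEN₁ but a hypothetical stand-in, Q(h, r) = (h − Σ x_j r_j)², whose minimizer is the score mean; the derivatives are approximated by central differences rather than the analytic expressions in the appendix, and the covariance of the score probabilities is assumed to be multinomial (no pre-smoothing).

```python
def implicit_gradient(objective, h_opt, r, eps=1e-4):
    """dh/dr = -(d2Q/dh2)^{-1} * d2Q/(dh dr'), via central differences (Equation 26)."""
    d2_hh = (
        objective(h_opt + eps, r) - 2 * objective(h_opt, r) + objective(h_opt - eps, r)
    ) / eps ** 2
    grad = []
    for j in range(len(r)):
        up, down = list(r), list(r)
        up[j] += eps
        down[j] -= eps
        d2_hr = (
            objective(h_opt + eps, up) - objective(h_opt + eps, down)
            - objective(h_opt - eps, up) + objective(h_opt - eps, down)
        ) / (4 * eps ** 2)
        grad.append(-d2_hr / d2_hh)
    return grad


def quadratic_form(vec, matrix):
    """vec * matrix * vec' for a row vector vec (Equation 27)."""
    size = len(vec)
    return sum(vec[i] * matrix[i][j] * vec[j] for i in range(size) for j in range(size))


scores = [0.0, 1.0, 2.0]


def toy_objective(h, r):
    """Hypothetical stand-in for PEN1: minimized when h equals the score mean."""
    return (h - sum(x * p for x, p in zip(scores, r))) ** 2


r_hat = [0.2, 0.3, 0.5]
n = 1000
# Assumed multinomial covariance of the estimated score probabilities.
cov_r = [
    [((r_hat[i] if i == j else 0.0) - r_hat[i] * r_hat[j]) / n for j in range(3)]
    for i in range(3)
]
h_hat = 1.3  # minimizer of the toy objective at r_hat (the score mean)
grad_h = implicit_gradient(toy_objective, h_hat, r_hat)
var_h = quadratic_form(grad_h, cov_r)  # Equation 27
se_h = var_h ** 0.5                    # Equation 28
```

Here the implicit gradient is (0, 1, 2) and the resulting variance is the familiar σ²/n = 0.61/1000 for a sample mean, which confirms the recipe. Replacing `toy_objective` with a PEN₁ implementation and `h_hat` with its minimizer would give a numerical counterpart to the analytic bandwidth standard error.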

Standard Error of Equating Accounting for Bandwidth Variability

To account for the bandwidth estimation variability when computing the SEEs, we apply the chain rule together with the delta method and obtain a modified expression for the SEEs (Cox, 1984). Treating $\hat{h}_X$ and $\hat{h}_Y$ as functions of the score probability estimators $\hat{r}$ and $\hat{s}$, we redefine the variance in Equation 17 by adding a term which accounts for the bandwidth estimation variability to obtain

$$\begin{aligned} \mathrm{Var}(\hat{e}_Y(x; \hat{r}, \hat{s}, \hat{h}_X(\hat{r}), \hat{h}_Y(\hat{s}))) ={}& \frac{\partial e_Y(x)}{\partial (r, s)} \Sigma_{(\hat{r}, \hat{s})} \left[\frac{\partial e_Y(x)}{\partial (r, s)}\right]' \\ &+ \frac{\partial (h_X(r), h_Y(s))}{\partial (r, s)} \frac{\partial e_Y(x)}{\partial (h_X, h_Y)} \, \Sigma_{(\hat{r}, \hat{s})} \left[\frac{\partial (h_X(r), h_Y(s))}{\partial (r, s)} \frac{\partial e_Y(x)}{\partial (h_X, h_Y)}\right]', \end{aligned} \tag{29}$$

where $\frac{\partial e_Y(x)}{\partial (r, s)}$ is presented in equations (19)–(25) and $\Sigma_{(\hat{r}, \hat{s})}$ in Equation 18. When evaluating these expressions in practice, the true parameters are replaced by the parameter estimates. The additional components are then a (2 × (J + K))-matrix of partial derivatives of the bandwidth parameters as functions of the estimated score probabilities with respect to the estimated score probabilities, $\frac{\partial (h_X(r), h_Y(s))}{\partial (r, s)}$, calculated following Equation (26), that is

$$\frac{\partial (h_X(r), h_Y(s))}{\partial (r, s)} = \begin{bmatrix} -\left(\frac{\partial^2 \mathrm{PEN}_1}{\partial h_X^2}\right)^{-1} \frac{\partial^2 \mathrm{PEN}_1}{\partial h_X \partial r'} & 0 \\ 0 & -\left(\frac{\partial^2 \mathrm{PEN}_1}{\partial h_Y^2}\right)^{-1} \frac{\partial^2 \mathrm{PEN}_1}{\partial h_Y \partial s'} \end{bmatrix}, \tag{30}$$

and $\frac{\partial e_Y(x)}{\partial (h_X, h_Y)}$, a (2 × J)-matrix of first-order derivatives of the equating function with respect to the bandwidth parameters, h_X and h_Y, defined by

$$\frac{\partial e_Y}{\partial h_X} = \frac{1}{G'} \sum_j r_j \frac{\partial \Phi(R_{jX}(x))}{\partial h_X}, \tag{31}$$

where

$$\frac{\partial \Phi(R_{jX}(x))}{\partial h_X} = \phi(R_{jX}(x)) \frac{\partial R_{jX}(x)}{\partial h_X}, \tag{32}$$

and

$$\frac{\partial e_Y}{\partial h_Y} = -\frac{1}{G'} \sum_k s_k \frac{\partial \Phi(R_{kY}(y))}{\partial h_Y}, \tag{33}$$

where

$$\frac{\partial \Phi(R_{kY}(y))}{\partial h_Y} = \phi(R_{kY}(y)) \frac{\partial R_{kY}(y)}{\partial h_Y}, \tag{34}$$

and G′ is defined in Equation 22, with R_jX(x) given in Equation 8 and R_kY(y) defined analogously. Lastly, we define the SEE which accounts for bandwidth variability by

$$\mathrm{SEE}_Y(x) = \sqrt{\mathrm{Var}(\hat{e}_Y(x; \hat{r}, \hat{s}, \hat{h}_X(\hat{r}), \hat{h}_Y(\hat{s})))}. \tag{35}$$

Simulation Study

To confirm the accuracy of the presented derivations, we conducted a simulation study to evaluate the estimators of the standard error of the bandwidth parameter estimator and the modified SEEs. We evaluated the estimated standard errors with respect to the Monte Carlo standard errors and compared the modified standard errors to the original standard errors that do not account for the bandwidth estimation.

Simulation Design

Data for two test forms X and Y were simulated using the two-parameter logistic (2-PL) model within the framework of IRT (de Ayala, 2009), where test lengths of 20, 40, and 80 items were considered. The discrimination parameters for both test forms were selected from the U (0.5, 2)-distribution and the difficulty parameters for one test form were selected from the N (0.25, 1)-distribution and the other from the N (−0.25, 1)-distribution. These distributions were considered to mimic realistic item parameters found in standardized testing (National Center for Education Statistics, 2004).

The equivalent groups design was used in which two independent random samples of individuals are drawn from a single common population and where each random sample takes either of the test forms X and Y (von Davier et al., 2004). Dictated by the design, no differences in the latent distributions were present between the groups. The latent distributions were set to the standard normal distribution. The equivalent groups design was considered because of its simplicity. Relative to other data collection designs, it provided an opportunity for direct comparison of the results on the test forms X and Y without additional considerations or assumptions. The score distributions for the tests X and Y with 20, 40, and 80 items are provided in Figure 1. The means (SD) of the test score distributions with 20, 40, and 80 items were 10.35 (4.52), 21.40 (8.43), and 44.03 (16.06) for test X and 7.58 (4.24), 18.17 (8.22), and 35.95 (16.89) for test Y.
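The data-generating setup described above can be sketched as follows. This Python code is only an approximation of the design and not the study's R implementation: it assumes the common 2-PL parameterization P(correct) = 1 / (1 + exp(−a(θ − b))), treats the second argument of N(·, 1) as a standard deviation (which coincides with the variance here), and uses hypothetical function names, form labels, and seeds.

```python
import random
from math import exp


def simulate_2pl_scores(n_persons, n_items, difficulty_mean, seed):
    """Simulate sum scores under a 2-PL model with a N(0, 1) latent variable."""
    rng = random.Random(seed)
    # Item parameters: discriminations from U(0.5, 2), difficulties from N(mean, 1).
    a_params = [rng.uniform(0.5, 2.0) for _ in range(n_items)]
    b_params = [rng.gauss(difficulty_mean, 1.0) for _ in range(n_items)]
    scores = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)  # ability drawn from the standard normal
        score = 0
        for a, b in zip(a_params, b_params):
            p_correct = 1.0 / (1.0 + exp(-a * (theta - b)))
            score += rng.random() < p_correct  # Bernoulli item response
        scores.append(score)
    return scores


# Two independent groups from the same population, one per form (equivalent groups).
scores_form_1 = simulate_2pl_scores(1000, 20, 0.25, seed=1)
scores_form_2 = simulate_2pl_scores(1000, 20, -0.25, seed=2)
```

Each call draws its own item parameters and its own examinees, mirroring the idea that the two forms differ in difficulty while the groups are randomly equivalent.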

Figure 1.

Score distributions with 20, 40, and 80 items, for each test X and Y.

In order to systematically verify the accuracy of the modified method of calculating the SEE as well as to explore how well it performs in a variety of sample sizes, sample sizes 1000, 4000, and 16,000 were considered. The study was conducted using version 3.6.2 of the statistical software environment R (R Core Team, 2019), primarily employing the packages kequate (Andersson et al., 2013), mirt (Chalmers, 2012), and numDeriv (Gilbert & Varadhan, 2019), while also utilizing newly written code implementing the modified SEEs (available in the Supplementary material). In each simulation setting, we used 10,000 replications which enabled us to all but eliminate the simulation random error. The convergence rate for all simulation settings was 100%.

The study followed the recommended kernel equating procedure (von Davier et al., 2004), albeit with a few adjustments to verify the derivations presented in this article. For each generated data set, the following steps were carried out:

(1) Pre-smoothing. The package mirt (Chalmers, 2012) was used to pre-smooth the irregularities of the raw data by estimating two separate 2-PL models to obtain item parameter estimates pertaining to each of the two groups and tests. The expectation-maximization (EM) algorithm was used for estimation (Bock & Aitkin, 1981), with the tolerance level 0.0001 and maximum number of iterations equal to 500. The asymptotic covariance matrix was estimated based on the method described in Oakes (1999).

(2) Estimation of score probabilities. Under the equivalent groups design, the score probabilities ${\hat{r}}_{j}$ and ${\hat{s}}_{k}$ were estimated based on the item parameter estimates and the assumed distribution of the latent variable (Andersson & Wiberg, 2017; von Davier et al., 2004).

(3) Continuous approximation. By adapting code from the package kequate (Andersson et al., 2013), continuous approximations to the discrete distributions were obtained by applying a Gaussian kernel with an optimal bandwidth parameter. Optimal bandwidth parameters ${\hat{h}}_{X}$ and ${\hat{h}}_{Y}$ were obtained by minimizing the first part of the penalty function, PEN₁ (von Davier et al., 2004). When optimizing the penalty function, a golden section search with successive parabolic interpolation (Brent, 1973) using the default tolerance of 1.50 × 10⁻⁸ was used.

The analytical derivations for the bandwidth parameter estimator variance were paramount to the study. Hence, upon obtaining the optimal bandwidths, the average standard errors of the bandwidth parameters were computed following the equations introduced in the previous section, and their accuracy was assessed using the Monte Carlo standard error (MCSE) as the criterion. When calculating the asymptotic variance of the bandwidth parameter estimator, the bandwidth parameters h_X and h_Y were replaced with the estimated parameters $\hat{h}_X$ and $\hat{h}_Y$, and the asymptotic covariance matrices of the estimated score probabilities $\Sigma_{\hat{r}}$ and $\Sigma_{\hat{s}}$ were calculated based on the implementation in kequate (Andersson et al., 2013).

(4) Equating. Upon obtaining continuous CDFs, an equipercentile equating function was applied to equate the test forms X and Y.

(5) Calculating the SEE. The average analytical SEEs were computed using the original method for calculating the SEE without accounting for the bandwidth variability (Holland et al., 1989), and the modified method of calculating the SEE accounting for the bandwidth variability. The Monte Carlo SEEs (MCSEE) were used as a criterion for comparing the accuracy of the modified and the original methods of the SEE calculation.
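Step (2) above turns item parameter estimates into score probabilities. One standard way to do this, assumed here because the text does not name the algorithm, is a Lord-Wingersky-type recursion conditional on the latent variable, followed by averaging over the assumed N(0, 1) latent distribution. The Python sketch below uses a simple equally spaced grid as the quadrature rule; the item parameters, grid, and function names are hypothetical.

```python
from math import exp
from statistics import NormalDist


def conditional_score_dist(theta, a_params, b_params):
    """Sum-score distribution given theta, built up one item at a time."""
    dist = [1.0]  # with zero items, the score is 0 with probability 1
    for a, b in zip(a_params, b_params):
        p = 1.0 / (1.0 + exp(-a * (theta - b)))  # 2-PL probability of a correct answer
        new = [0.0] * (len(dist) + 1)
        for score, prob in enumerate(dist):
            new[score] += prob * (1.0 - p)  # item answered incorrectly
            new[score + 1] += prob * p      # item answered correctly
        dist = new
    return dist


def marginal_score_probs(a_params, b_params, n_nodes=61):
    """Average the conditional distributions over a N(0, 1) latent variable."""
    nodes = [-6.0 + 12.0 * i / (n_nodes - 1) for i in range(n_nodes)]
    weights = [NormalDist().pdf(t) for t in nodes]
    total = sum(weights)
    probs = [0.0] * (len(a_params) + 1)
    for theta, w in zip(nodes, weights):
        cond = conditional_score_dist(theta, a_params, b_params)
        for score, p in enumerate(cond):
            probs[score] += (w / total) * p
    return probs


# Hypothetical item parameter estimates for a three-item form.
a_hat = [0.8, 1.2, 1.5]
b_hat = [-0.5, 0.0, 0.7]
r_hat = marginal_score_probs(a_hat, b_hat)
one_item = marginal_score_probs([1.0], [0.0])
```

The single-item case with difficulty 0 gives a marginal probability of one half for a correct response by symmetry, which is a convenient check on the quadrature.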

Furthermore, two measures were used to assess the performance and the accuracy of the modified method as compared to the original method. We computed the absolute differences of the means of the SEEs calculated with the original and the modified methods. Additionally, the average coverage probabilities were considered which explored the average proportion of time that the 95% confidence intervals calculated employing the original and the modified methods contained the true values of the equated results. The confidence intervals were estimated with ${\hat{e}}_{Y} \pm z_{0.975} \times \hat{SE} ({\hat{e}}_{Y})$ , with z_0.975 indicating the 0.975 quantile of the standard normal distribution.
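The evaluation criteria described above (Monte Carlo standard error, average analytical standard error, and coverage of 95% confidence intervals) can be sketched generically. The Python example below applies them to a deliberately simple stand-in problem, estimating the mean of a normal sample, rather than to equated scores; the function name, sample size, number of replications, and seed are illustrative assumptions.

```python
import random
from statistics import NormalDist, mean, stdev


def summarize_replications(estimates, std_errors, true_value, level=0.95):
    """Monte Carlo SE, mean analytical SE, and coverage of normal-theory intervals."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # z_0.975 when level = 0.95
    covered = [
        abs(est - true_value) <= z * se for est, se in zip(estimates, std_errors)
    ]
    return {
        "mc_se": stdev(estimates),       # spread of the estimates across replications
        "mean_ase": mean(std_errors),    # average of the analytical standard errors
        "coverage": sum(covered) / len(covered),
    }


# Stand-in simulation: estimate the mean of N(0, 1) from n = 50 observations.
rng = random.Random(2024)
n, reps = 50, 2000
estimates, std_errors = [], []
for _ in range(reps):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimates.append(mean(sample))
    std_errors.append(stdev(sample) / n ** 0.5)
summary = summarize_replications(estimates, std_errors, true_value=0.0)
```

In the study's setting, `estimates` would hold the equated scores at a given score point across replications and `std_errors` the corresponding analytical SEEs from either the original or the modified method.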

The analytical derivations used in computing the bandwidth estimator variance and standard errors, as well as the SEE, were verified numerically using the R package numDeriv (Gilbert & Varadhan, 2019). The R code is available for review in the Supplementary material.

Simulation Results

The study largely depended on the accuracy of the asymptotic variance and standard error of the bandwidth parameter estimator derivations. The results of the simulation for the standard errors of the bandwidth parameter estimators $\hat{h}_X$ and $\hat{h}_Y$ given in Table 1 confirmed that the derivations were correct, and the asymptotic standard errors of the bandwidth parameter estimator (ASE) were accurate as witnessed by the comparison to the MCSE. As can be expected for asymptotic variance approximation (Ferguson, 1996), the differences between the ASEs and the MCSEs were larger in smaller sample sizes.

Table 1.

Asymptotic Standard Errors (ASE) and Monte Carlo Standard Errors (MCSE) for the Bandwidth Parameters h_X and h_Y with Sample Sizes N and Test Lengths of 20, 40, and 80 Items.

	h_X Parameter		h_Y Parameter
N	ASE	MCSE	ASE	MCSE
20 items
1000	0.0035	0.0035	0.0035	0.0035
4000	0.0018	0.0017	0.0018	0.0017
16000	0.0009	0.0009	0.0009	0.0009
40 items
1000	0.0034	0.0034	0.0037	0.0037
4000	0.0017	0.0017	0.0019	0.0019
16000	0.0009	0.0009	0.0009	0.0009
80 items
1000	0.0033	0.0033	0.0036	0.0036
4000	0.0016	0.0016	0.0018	0.0018
16000	0.0008	0.0008	0.0009	0.0009

Subsequently incorporating the bandwidth estimation variability into computing the modified SEEs, Table 2 presents two performance measures used to compare the accuracy of the original standard errors of equating (ASEE) and the modified asymptotic standard errors of equating (ASEE_mod). These measures are the absolute aggregate differences between the SEEs for two pairs, ASEE - MCSEE and ASEE_mod - MCSEE, and the average coverage for both the original and the modified methods. From the average differences, it was evident that when compared to the MCSEE estimates, the modified asymptotic SEEs which take bandwidth variability into account were accurate for all sample sizes and test lengths. Furthermore, the modified asymptotic SEEs in most cases appeared to be nearly identical to those not accounting for bandwidth variability, suggesting that the bandwidth estimation influence on the SEEs was minimal. This finding was further supported by the average coverage for both the original and the modified methods. Although the modified method performed better in most settings, the differences in coverage were small.

Table 2.

Absolute Average Differences for the Original Asymptotic Standard Errors of Equating (ASEE) and the Modified Asymptotic Standard Errors of Equating (ASEE_mod) to the Monte Carlo Standard Errors of Equating (MCSEE) and Average Coverage of 95% Confidence Intervals Based on the ASEE and the ASEE_mod.

	Average Differences		Average Coverage
N	ASEE-MCSEE	ASEE_mod-MCSEE	ASEE	ASEE_mod
20 items
1000	0.0012	0.0016	94.97	95.03
4000	0.0002	0.0004	95.07	95.13
16000	0.0004	0.0004	94.87	94.93
40 items
1000	0.0021	0.0019	94.97	94.99
4000	0.0012	0.0011	94.82	94.84
16000	0.0011	0.0011	94.66	94.69
80 items
1000	0.0038	0.0041	94.87	94.90
4000	0.0050	0.0054	95.39	95.42
16000	0.0004	0.0006	94.90	94.93
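
To make the coverage measure in Table 2 concrete, the following is a minimal Monte Carlo sketch of how the coverage of a 95% interval of the form estimate ± 1.96 × SE can be computed. It uses a toy estimator (a sample mean of normal data) rather than an equated score, and the function name, sample size, and number of replications are assumptions chosen for illustration; the study's coverage values are additionally averaged across score points.

```python
import random
import statistics

def coverage(true_value, n, replications, seed=1):
    """Percentage of replications whose 95% Wald interval contains true_value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replications):
        sample = [rng.gauss(true_value, 1.0) for _ in range(n)]
        est = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5   # estimated standard error
        if est - 1.96 * se <= true_value <= est + 1.96 * se:
            hits += 1
    return 100.0 * hits / replications

# Toy setting: an accurate standard error should give coverage near 95.
cov = coverage(true_value=0.0, n=200, replications=2000)
print(round(cov, 2))
```

Coverage noticeably below 95 would indicate standard errors that are too small, which is the concern raised for SEEs that ignore a source of variability.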

Discussion

The kernel method of equating is an equipercentile equating method in which number-correct scores are transformed into percentile rank scores from test form X to the scale of test form Y, and the scores from the two test forms with the same percentile rank are considered to be equivalent (von Davier et al., 2004). However, in order to obtain those equivalent scores, continuous approximations to the discrete score distributions are needed. To satisfy this requirement, kernel equating uses a Gaussian kernel with a smoothing bandwidth parameter that determines the characteristics of the continuous approximations to the raw discrete distributions (von Davier et al., 2004). The most commonly used method for bandwidth estimation is minimizing a penalty function with respect to the bandwidth parameter (von Davier et al., 2004). The bandwidth, in turn, is influenced by the estimated score probabilities and therefore is subject to variability. This variability, however, is not currently accounted for when calculating the SEE (Holland et al., 1989), challenging its accuracy and, ultimately, the fairness of the equated results.
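
The continuization and bandwidth selection steps described above can be sketched as follows. The code uses the Gaussian-kernel quantities that appear in the appendix (a_X, R_jX(x), the density f_hX, and PEN₁ as the sum of squared differences between r_j and f_hX(x_j)). Several choices are assumptions made only for this illustration: the score probabilities are a toy binomial distribution instead of pre-smoothed estimates, the minimizer is a plain golden-section search standing in for whatever optimizer is used in practice, and the search bracket [0.2, 2.0] is hypothetical.

```python
import math

def moments(scores, probs):
    """Mean and variance of the discrete score distribution."""
    mu = sum(r * x for x, r in zip(scores, probs))
    var = sum(r * (x - mu) ** 2 for x, r in zip(scores, probs))
    return mu, var

def density(x, h, scores, probs):
    """Gaussian-kernel continuized density f_h(x)."""
    mu, var = moments(scores, probs)
    a = math.sqrt(var / (var + h * h))
    total = 0.0
    for xj, rj in zip(scores, probs):
        z = (x - a * xj - (1 - a) * mu) / (a * h)   # R_jX(x)
        total += rj * math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return total / (a * h)

def pen1(h, scores, probs):
    """Sum of squared differences between r_j and f_h(x_j)."""
    return sum((rj - density(xj, h, scores, probs)) ** 2
               for xj, rj in zip(scores, probs))

def golden_section(func, lo, hi, tol=1e-6):
    """Derivative-free search for a local minimum of func on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if func(c) < func(d):
            hi, d = d, c
            c = hi - inv_phi * (hi - lo)
        else:
            lo, c = c, d
            d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2

# Toy score probabilities: binomial(20, 0.5) on scores 0..20 (an assumption).
n_items = 20
scores = list(range(n_items + 1))
probs = [math.comb(n_items, k) * 0.5 ** n_items for k in scores]

h_hat = golden_section(lambda h: pen1(h, scores, probs), 0.2, 2.0)
print(round(h_hat, 3))
```

Because h_hat is a function of the score probabilities, sampling variability in the estimated probabilities carries over to the selected bandwidth, which is the source of variability examined in this study.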

The present study explored the issue of the additional variability stemming from the bandwidth estimation and its impact on the SEE. Building on the existing methodology of Holland et al. (1989) and von Davier et al. (2004), we derived the asymptotic variance of the bandwidth parameter estimator using the delta method for implicit functions (Benichou & Gail, 1989) and incorporated those derivations to expand the existing formulas for calculating the SEE (Holland et al., 1989). Thus, we have introduced SEEs that account for bandwidth estimation variability. A simulation study with 18 data sets generated for a wide range of sample sizes and test lengths was used to illustrate the results of the modified method as compared to the current method of the SEE calculation (Holland et al., 1989) and the MCSEEs.
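
For orientation, the delta method for implicit functions can be summarized in generic form. The display below is a sketch and not a restatement of the article's own equations; the symbols $g$, $\mathbf{J}$, and $\boldsymbol{\Sigma}_{\hat{r}}$ (the asymptotic covariance matrix of the estimated score probabilities) are notation assumed here for illustration. The bandwidth is defined implicitly as a root of the first-order condition of the penalty function, so its sensitivity to the score probabilities follows from the implicit function theorem, and its asymptotic variance follows from the usual delta-method quadratic form.

$$
g(h_X, \mathbf{r}) = \frac{\partial \mathrm{PEN}_1(h_X)}{\partial h_X} = 0
\;\Longrightarrow\;
\mathbf{J} = \frac{\partial h_X}{\partial \mathbf{r}} = -\left(\frac{\partial^2 \mathrm{PEN}_1}{\partial h_X^2}\right)^{-1}\frac{\partial^2 \mathrm{PEN}_1}{\partial h_X\,\partial \mathbf{r}},
\qquad
\mathrm{Var}(\hat{h}_X) \approx \mathbf{J}\,\boldsymbol{\Sigma}_{\hat{r}}\,\mathbf{J}^{\top}.
$$

This is why the two second-order derivatives of PEN₁, with respect to h_X and with respect to h_X and r, are the quantities that need to be computed.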

The results offered several observations that are valuable to the testing industry. First, the newly introduced SEEs were accurate and close to the MCSEE estimates for all sample sizes and test lengths, suggesting that the method is suitable for practical use. Second, using the MCSEE as a criterion, the original (Holland et al., 1989) and the modified SEEs produced similar results, suggesting that the impact of bandwidth estimation on the SEE is minimal.

The presented results apply directly to any pre-smoothing method, provided that the asymptotic covariance matrix of the score probabilities has been defined for that method. In this study, however, we used only IRT as the pre-smoothing method, and the results may differ when using, for example, log-linear models. Nevertheless, previous research has indicated that the SEEs are fairly accurate even when they do not account for bandwidth estimation with the penalty function, so we do not anticipate that the results will differ substantially when pre-smoothing with log-linear models instead of IRT models.

The method for accounting for bandwidth estimation that we used in the present study can be generalized to additional kernels and equating designs by modifying the presented results to account for the different expressions of the equating function and score probabilities under such approaches. It is furthermore possible to use the delta method for implicit functions with other bandwidth estimation methods, provided that they minimize a function that satisfies the conditions required by the implicit function theorem and the delta method. One approach that does not fulfill these requirements is the method based on PEN₂, since the function PEN₂ is not differentiable and can have multiple local minima.

It is important to note that in this study, we derived the modified asymptotic SEE for two test forms under the equivalent groups data collection design. The influence of bandwidth estimation on the SEE may be greater for other data collection and equating designs. It would therefore be beneficial for future theoretical and empirical studies to focus on determining the impact of bandwidth estimation on the SEE in these additional scenarios.

As a final note, we believe that it is theoretically more sound to use a method which successfully accounts for all sources of variability, however negligible those may be. Introducing the modifications to the formulas for the SEE calculation akin to those explored in this study can improve the accuracy of the standard errors of equating, and consequently, facilitate fairness and comparability of the equated results.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Kseniia Marcq

Björn Andersson

Computation of the Penalty Function

In order to compute Equation 26, two partial derivatives of the PEN₁(h_X) function are needed: the second-order partial derivative of PEN₁(h_X) with respect to h_X, and the mixed second-order partial derivative of PEN₁(h_X) with respect to h_X and r. Hence, we first need to compute the first partial derivative of PEN₁(h_X) with respect to h_X.

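Recalling equations (8)–(13), we define the first partial derivative of PEN₁(h_X) with respect to h_X as

$$\frac{\partial \mathrm{PEN}_1}{\partial h_X} = \frac{\partial}{\partial h_X}\left[\sum_j \left(r_j - f_{h_X}(x_j)\right)^2\right] = -2\sum_j \left(\left(r_j - f_{h_X}(x_j)\right)\frac{\partial f_{h_X}(x_j)}{\partial h_X}\right). \tag{36}$$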

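We then need to calculate $\partial f_{h_X}(x_j)/\partial h_X$ as

$$\frac{\partial f_{h_X}(x_j)}{\partial h_X} = \frac{\partial}{\partial h_X}\left[\sum_j r_j\,\phi\left(R_{jX}(x)\right)\frac{1}{a_X h_X}\right] = \sum_j r_j\left(\frac{\partial\left[\phi\left(R_{jX}(x)\right)\right]}{\partial h_X}\frac{1}{a_X h_X} + \phi\left(R_{jX}(x)\right)\frac{\partial}{\partial h_X}\left[\frac{1}{a_X h_X}\right]\right), \tag{37}$$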

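where $\partial\left[\phi\left(R_{jX}(x)\right)\right]/\partial h_X$ is

$$\frac{\partial\left[\phi\left(R_{jX}(x)\right)\right]}{\partial h_X} = -\phi\left(R_{jX}(x)\right)\frac{\partial R_{jX}(x)}{\partial h_X}R_{jX}(x), \tag{38}$$

with

$$\frac{\partial R_{jX}(x)}{\partial h_X} = \frac{\partial}{\partial h_X}\left[\frac{x - a_X x_j - (1 - a_X)\mu_X}{a_X h_X}\right] = \frac{\partial a_X}{\partial h_X}\left(\mu_X - x_j\right)\frac{1}{a_X h_X} + \left(x - a_X x_j - (1 - a_X)\mu_X\right)\frac{\partial}{\partial h_X}\left[\frac{1}{a_X h_X}\right]. \tag{39}$$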

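The remaining components needed for computing the first partial derivative of PEN₁(h_X) with respect to h_X are then $\frac{\partial}{\partial h_X}\left[\frac{1}{a_X h_X}\right]$ and $\frac{\partial a_X}{\partial h_X}$. Thus we calculate

$$\frac{\partial}{\partial h_X}\left[\frac{1}{a_X h_X}\right] = -a_X^{-2}\frac{\partial a_X}{\partial h_X}\frac{1}{h_X} - \frac{1}{a_X h_X^2}, \tag{40}$$

$$\frac{\partial a_X}{\partial h_X} = \frac{\partial}{\partial h_X}\left[\frac{\sigma_X}{\sqrt{\sigma_X^2 + h_X^2}}\right] = -\frac{\sigma_X}{2\left(h_X^2 + \sigma_X^2\right)^{3/2}}\left(\frac{\partial h_X^2}{\partial h_X} + \frac{\partial \sigma_X^2}{\partial h_X}\right) = -\frac{\sigma_X h_X}{\left(h_X^2 + \sigma_X^2\right)^{3/2}}. \tag{41}$$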
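
As a sanity check on the chain of derivatives in Equations 36 to 41, the analytic first derivative of PEN₁ can be compared with a central finite difference. The sketch below does this under stated assumptions: the score probabilities are a toy binomial distribution, the evaluation point h = 0.3 and the step size are arbitrary, and all function names are hypothetical. The mean and variance of the score distribution are treated as constants with respect to h_X, as in the equations.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def density_and_slope(x, h, scores, probs):
    """Return f_h(x) and its derivative with respect to h."""
    mu = sum(r * s for s, r in zip(scores, probs))
    var = sum(r * (s - mu) ** 2 for s, r in zip(scores, probs))
    sigma = math.sqrt(var)
    a = sigma / math.sqrt(var + h * h)
    da = -sigma * h / (h * h + var) ** 1.5            # Equation 41
    d_inv = -da / (a * a * h) - 1.0 / (a * h * h)     # Equation 40
    f = df = 0.0
    for xj, rj in zip(scores, probs):
        num = x - a * xj - (1.0 - a) * mu
        R = num / (a * h)
        dR = da * (mu - xj) / (a * h) + num * d_inv   # Equation 39
        dphi = -phi(R) * dR * R                       # Equation 38
        f += rj * phi(R) / (a * h)
        df += rj * (dphi / (a * h) + phi(R) * d_inv)  # Equation 37
    return f, df

def pen1(h, scores, probs):
    return sum((rj - density_and_slope(xj, h, scores, probs)[0]) ** 2
               for xj, rj in zip(scores, probs))

def dpen1(h, scores, probs):
    total = 0.0
    for xj, rj in zip(scores, probs):
        f, df = density_and_slope(xj, h, scores, probs)
        total += (rj - f) * df
    return -2.0 * total                               # Equation 36

# Toy score probabilities: binomial(20, 0.5) on scores 0..20 (an assumption).
scores = list(range(21))
probs = [math.comb(20, k) * 0.5 ** 20 for k in scores]

h, eps = 0.3, 1e-6
analytic = dpen1(h, scores, probs)
numeric = (pen1(h + eps, scores, probs) - pen1(h - eps, scores, probs)) / (2 * eps)
print(analytic, numeric)
```

Agreement between the two values supports the analytic expressions; a similar comparison could be used for the second-order derivatives that follow.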

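Using equations (36)–(41), we can compute the second partial derivative of PEN₁(h_X) with respect to h_X as

$$\frac{\partial^2 \mathrm{PEN}_1}{\partial h_X^2} = \frac{\partial}{\partial h_X}\left[-2\sum_j\left(\left(r_j - f_{h_X}(x_j)\right)\frac{\partial f_{h_X}(x_j)}{\partial h_X}\right)\right] = -2\sum_j\left(\frac{\partial^2 f_{h_X}(x_j)}{\partial h_X^2}\left(r_j - f_{h_X}(x_j)\right) - \left[\frac{\partial f_{h_X}(x_j)}{\partial h_X}\right]^2\right), \tag{42}$$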

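where $\partial f_{h_X}(x_j)/\partial h_X$ is defined in Equation 37 and $\partial^2 f_{h_X}(x_j)/\partial h_X^2$ is given by

$$\begin{aligned}
\frac{\partial^2 f_{h_X}(x_j)}{\partial h_X^2} &= \frac{\partial}{\partial h_X}\left[\sum_j r_j\left(\frac{\partial\left[\phi\left(R_{jX}(x)\right)\right]}{\partial h_X}\frac{1}{a_X h_X} + \phi\left(R_{jX}(x)\right)\frac{\partial}{\partial h_X}\left[\frac{1}{a_X h_X}\right]\right)\right] \\
&= \sum_j r_j\left(\frac{\partial^2\left[\phi\left(R_{jX}(x)\right)\right]}{\partial h_X^2}\frac{1}{a_X h_X} + \frac{\partial\left[\phi\left(R_{jX}(x)\right)\right]}{\partial h_X}\frac{\partial}{\partial h_X}\left[\frac{1}{a_X h_X}\right]\right) \\
&\quad + \sum_j r_j\left(\frac{\partial\left[\phi\left(R_{jX}(x)\right)\right]}{\partial h_X}\frac{\partial}{\partial h_X}\left[\frac{1}{a_X h_X}\right] + \phi\left(R_{jX}(x)\right)\frac{\partial^2}{\partial h_X^2}\left[\frac{1}{a_X h_X}\right]\right).
\end{aligned} \tag{43}$$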

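Recall that $\partial\left[\phi\left(R_{jX}(x)\right)\right]/\partial h_X$ is given in Equation 38 and $\partial R_{jX}(x)/\partial h_X$ in Equation 39. Hence, we define $\partial^2\left[\phi\left(R_{jX}(x)\right)\right]/\partial h_X^2$ and $\partial^2 R_{jX}(x)/\partial h_X^2$ as

$$\begin{aligned}
\frac{\partial^2\left[\phi\left(R_{jX}(x)\right)\right]}{\partial h_X^2} &= \frac{\partial}{\partial h_X}\left[-\phi\left(R_{jX}(x)\right)\frac{\partial R_{jX}(x)}{\partial h_X}R_{jX}(x)\right] \\
&= -\frac{\partial\left[\phi\left(R_{jX}(x)\right)\right]}{\partial h_X}\frac{\partial R_{jX}(x)}{\partial h_X}R_{jX}(x) - \phi\left(R_{jX}(x)\right)\frac{\partial^2 R_{jX}(x)}{\partial h_X^2}R_{jX}(x) - \phi\left(R_{jX}(x)\right)\left[\frac{\partial R_{jX}(x)}{\partial h_X}\right]^2,
\end{aligned} \tag{44}$$

$$\begin{aligned}
\frac{\partial^2 R_{jX}(x)}{\partial h_X^2} &= \frac{\partial}{\partial h_X}\left[\frac{\partial a_X}{\partial h_X}\left(\mu_X - x_j\right)\frac{1}{a_X h_X} + \left(x - a_X x_j - (1 - a_X)\mu_X\right)\frac{\partial}{\partial h_X}\left[\frac{1}{a_X h_X}\right]\right] \\
&= \left(\mu_X - x_j\right)\left(\frac{\partial^2 a_X}{\partial h_X^2}\frac{1}{a_X h_X} + \frac{\partial a_X}{\partial h_X}\frac{\partial}{\partial h_X}\left[\frac{1}{a_X h_X}\right]\right) + \left(\mu_X - x_j\right)\frac{\partial a_X}{\partial h_X}\frac{\partial}{\partial h_X}\left[\frac{1}{a_X h_X}\right] \\
&\quad + \left(x - a_X x_j - (1 - a_X)\mu_X\right)\frac{\partial^2}{\partial h_X^2}\left[\frac{1}{a_X h_X}\right].
\end{aligned} \tag{45}$$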

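Consider further that $\frac{\partial}{\partial h_X}\left[\frac{1}{a_X h_X}\right]$ is defined in Equation 40 and $\frac{\partial a_X}{\partial h_X}$ in Equation 41. We can then observe that $\frac{\partial^2}{\partial h_X^2}\left[\frac{1}{a_X h_X}\right]$ can be computed as

$$\frac{\partial^2}{\partial h_X^2}\left[\frac{1}{a_X h_X}\right] = \frac{1}{h_X a_X^2}\left(\frac{2}{a_X}\left[\frac{\partial a_X}{\partial h_X}\right]^2 - \frac{\partial^2 a_X}{\partial h_X^2} + \frac{\partial a_X}{\partial h_X}\frac{1}{h_X}\right) + \left(\frac{\partial a_X}{\partial h_X}\frac{1}{a_X^2 h_X^2} + \frac{2}{a_X h_X^3}\right), \tag{46}$$

and

$$\frac{\partial^2 a_X}{\partial h_X^2} = \frac{\partial}{\partial h_X}\left[-\frac{\sigma_X h_X}{\left(h_X^2 + \sigma_X^2\right)^{3/2}}\right] = -\frac{\sigma_X}{\left(h_X^2 + \sigma_X^2\right)^{3/2}} + \frac{3\sigma_X h_X^2}{\left(h_X^2 + \sigma_X^2\right)^{5/2}}. \tag{47}$$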
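
The second-order expressions for a_X and 1/(a_X h_X) can be checked numerically by finite-differencing the corresponding first derivatives. The following sketch assumes arbitrary illustrative values for σ_X and h_X, and its function names are hypothetical. The second derivative of a_X is coded as obtained by differentiating Equation 41 directly with the quotient rule.

```python
import math

def a_x(h, s):
    return s / math.sqrt(s * s + h * h)

def da(h, s):
    """First derivative of a_X (Equation 41)."""
    return -s * h / (h * h + s * s) ** 1.5

def d2a(h, s):
    """Second derivative of a_X, from differentiating Equation 41."""
    return -s / (h * h + s * s) ** 1.5 + 3.0 * s * h * h / (h * h + s * s) ** 2.5

def d_inv(h, s):
    """First derivative of 1 / (a_X h_X) (Equation 40)."""
    a = a_x(h, s)
    return -da(h, s) / (a * a * h) - 1.0 / (a * h * h)

def d2_inv(h, s):
    """Second derivative of 1 / (a_X h_X) (Equation 46)."""
    a, a1, a2 = a_x(h, s), da(h, s), d2a(h, s)
    first = (1.0 / (h * a * a)) * ((2.0 / a) * a1 * a1 - a2 + a1 / h)
    second = a1 / (a * a * h * h) + 2.0 / (a * h ** 3)
    return first + second

def central_diff(func, h, eps=1e-6):
    return (func(h + eps) - func(h - eps)) / (2.0 * eps)

sigma, h = 2.5, 0.6   # illustrative values only
err_d2a = abs(d2a(h, sigma) - central_diff(lambda t: da(t, sigma), h))
err_d2_inv = abs(d2_inv(h, sigma) - central_diff(lambda t: d_inv(t, sigma), h))
print(err_d2a, err_d2_inv)
```

Both discrepancies should be at the level of finite-difference error, many orders of magnitude below the derivatives themselves.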

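Lastly, we can compute the second partial derivative of PEN₁(h_X) with respect to h_X and r_i as follows

$$\begin{aligned}
\frac{\partial^2 \mathrm{PEN}_1}{\partial h_X\,\partial r_i} &= \frac{\partial}{\partial r_i}\left[-2\sum_j\left(\left(r_j - f_{h_X}(x_j)\right)\frac{\partial f_{h_X}(x_j)}{\partial h_X}\right)\right] \\
&= -2\sum_j\left[\left(\frac{\partial r_j}{\partial r_i} - \frac{\partial f_{h_X}(x_j)}{\partial r_i}\right)\frac{\partial f_{h_X}(x_j)}{\partial h_X} + \left(r_j - f_{h_X}(x_j)\right)\frac{\partial^2 f_{h_X}(x_j)}{\partial h_X\,\partial r_i}\right],
\end{aligned} \tag{48}$$

where $\partial r_j/\partial r_i = 1$ if $i = j$ and $\partial r_j/\partial r_i = 0$ if $i \neq j$. Note that $\partial f_{h_X}(x_j)/\partial h_X$ is given in Equation 37. Then, the components needed for computing Equation 48 are $\partial f_{h_X}(x_j)/\partial r_i$ and $\partial^2 f_{h_X}(x_j)/\partial h_X\,\partial r_i$. We define $\partial f_{h_X}(x_j)/\partial r_i$ as

$$\begin{aligned}
\frac{\partial f_{h_X}(x)}{\partial r_i} &= \frac{\partial}{\partial r_i}\left[\sum_j r_j\,\phi\left(R_{jX}(x)\right)\frac{1}{a_X h_X}\right] \\
&= \frac{1}{h_X}\left[\phi\left(R_{jX}(x)\right)\frac{1}{a_X} - \frac{\partial R_{jX}(x)}{\partial r_i}\sum_j\left(r_j\,\phi\left(R_{jX}(x)\right)R_{jX}(x)\right)\frac{1}{a_X}\right] + \frac{1}{h_X}\left[\frac{\partial}{\partial r_i}\left[\frac{1}{a_X}\right]\sum_j\left(r_j\,\phi\left(R_{jX}(x)\right)\right)\right],
\end{aligned} \tag{49}$$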

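where $\partial R_{jX}/\partial r_i$ and $\frac{\partial}{\partial r_i}\left[\frac{1}{a_X}\right]$ are given in Holland et al. (1989) as

$$\frac{\partial}{\partial r_i}\left[\frac{1}{a_X}\right] = -\frac{1}{2}\,a_X\,\frac{h_X^2}{\sigma_X^2}\,\frac{x_i^2 - \mu_X^2}{\sigma_X^2}, \tag{50}$$

and

$$\frac{\partial R_{jX}}{\partial r_i} = \left(-\frac{1}{a_X h_X}\right)\left[\frac{1}{2}\left(x - \mu_X\right)\left(1 - a_X^2\right)\left(\frac{x_i^2 - \mu_X^2}{\sigma_X^2}\right) + \left(1 - a_X\right)x_i\right]. \tag{51}$$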

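We further define $\partial^2 f_{h_X}(x_j)/\partial h_X\,\partial r_i$ as

$$\frac{\partial^2 f_{h_X}(x_j)}{\partial h_X\,\partial r_i} = \frac{\partial}{\partial h_X}\left[\frac{\partial f_{h_X}(x_j)}{\partial r_i}\right]. \tag{52}$$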

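Given that Equation 49 is a lengthy expression, we further simplify the notation such that

$$\frac{\partial}{\partial h_X}\left[\frac{\partial f_{h_X}(x_j)}{\partial r_i}\right] = \frac{\partial}{\partial h_X}\left[\frac{1}{h_X}\right] \times P + \frac{1}{h_X} \times \frac{\partial P}{\partial h_X} = -\frac{1}{h_X^2} \times P + \frac{1}{h_X} \times \frac{\partial P}{\partial h_X}, \tag{53}$$

where

$$P = \phi\left(R_{jX}(x)\right)\frac{1}{a_X} - \frac{\partial R_{jX}(x)}{\partial r_i}\sum_j\left(r_j\,\phi\left(R_{jX}(x)\right)R_{jX}(x)\right)\frac{1}{a_X} + \frac{\partial}{\partial r_i}\left[\frac{1}{a_X}\right]\sum_j\left(r_j\,\phi\left(R_{jX}(x)\right)\right). \tag{54}$$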

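Denoting the three components of Equation 54 by $P_1$, $P_2$, and $P_3$, $\partial P/\partial h_X$ can then be presented as follows

$$\frac{\partial P}{\partial h_X} = \frac{\partial P_1}{\partial h_X} - \frac{\partial P_2}{\partial h_X} + \frac{\partial P_3}{\partial h_X}. \tag{55}$$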

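$\partial P_1/\partial h_X$ is given by

$$\frac{\partial P_1}{\partial h_X} = \frac{\partial}{\partial h_X}\left[\phi\left(R_{jX}(x)\right)\frac{1}{a_X}\right] = \frac{\partial\left[\phi\left(R_{jX}(x)\right)\right]}{\partial h_X}\frac{1}{a_X} + \phi\left(R_{jX}(x)\right)\frac{a_X h_X}{\sigma_X^2}, \tag{56}$$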

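where $\partial\left[\phi\left(R_{jX}(x)\right)\right]/\partial h_X$ is given in Equation 38. $\partial P_2/\partial h_X$ is defined as

$$\begin{aligned}
\frac{\partial P_2}{\partial h_X} &= \frac{\partial}{\partial h_X}\left[\frac{\partial R_{jX}(x)}{\partial r_i}\sum_j\left(r_j\,\phi\left(R_{jX}(x)\right)R_{jX}(x)\right)\frac{1}{a_X}\right] \\
&= \frac{\partial^2 R_{jX}(x)}{\partial h_X\,\partial r_i}\sum_j\left(r_j\,\phi\left(R_{jX}(x)\right)R_{jX}(x)\right)\frac{1}{a_X} \\
&\quad + \frac{\partial R_{jX}(x)}{\partial r_i}\sum_j\left(r_j\frac{\partial\left[\phi\left(R_{jX}(x)\right)\right]}{\partial h_X}R_{jX}(x) + r_j\,\phi\left(R_{jX}(x)\right)\frac{\partial R_{jX}(x)}{\partial h_X}\right)\frac{1}{a_X} \\
&\quad + \frac{\partial R_{jX}(x)}{\partial r_i}\sum_j\left(r_j\,\phi\left(R_{jX}(x)\right)R_{jX}(x)\right)\frac{a_X h_X}{\sigma_X^2},
\end{aligned} \tag{57}$$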

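where $\partial R_{jX}(x)/\partial r_i$ is defined in Equation 51, $\partial\left[\phi\left(R_{jX}(x)\right)\right]/\partial h_X$ in Equation 38, and $\partial^2 R_{jX}(x)/\partial h_X\,\partial r_i$ is given by

$$\begin{aligned}
\frac{\partial^2 R_{jX}(x)}{\partial h_X\,\partial r_i} &= \frac{\partial}{\partial h_X}\left[\left(-\frac{1}{a_X h_X}\right)\left(\frac{1}{2}\left(x - \mu_X\right)\left(1 - a_X^2\right)\left(\frac{x_i^2 - \mu_X^2}{\sigma_X^2}\right) + \left(1 - a_X\right)x_i\right)\right] \\
&= \left(\frac{1}{a_X h_X^2} - \frac{a_X}{\sigma_X^2}\right)\left(\frac{1}{2}\left(x - \mu_X\right)\left(1 - a_X^2\right)\left(\frac{x_i^2 - \mu_X^2}{\sigma_X^2}\right) + \left(1 - a_X\right)x_i\right) \\
&\quad + \left(-\frac{1}{a_X h_X}\right)\left(-\left(x - \mu_X\right)\left(\frac{x_i^2 - \mu_X^2}{\sigma_X^2}\right)a_X\frac{\partial a_X}{\partial h_X} - x_i\frac{\partial a_X}{\partial h_X}\right).
\end{aligned} \tag{58}$$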
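
The mixed derivative of R_jX can also be checked against a finite difference of Equation 51 with respect to h_X. In the sketch below, the code differentiates the bracketed term of Equation 51 directly, which gives −x_i ∂a_X/∂h_X for the component that comes from (1 − a_X) x_i. The numerical values of x, x_i, μ_X, σ_X, and h_X are arbitrary assumptions, and the function names are hypothetical.

```python
import math

def a_x(h, sigma):
    return sigma / math.sqrt(sigma ** 2 + h ** 2)

def da_dh(h, sigma):
    return -sigma * h / (h ** 2 + sigma ** 2) ** 1.5   # Equation 41

def bracket(h, x, x_i, mu, sigma):
    """Bracketed term of Equation 51, using the ratio as printed there."""
    a = a_x(h, sigma)
    q = (x_i ** 2 - mu ** 2) / sigma ** 2
    return 0.5 * (x - mu) * (1.0 - a ** 2) * q + (1.0 - a) * x_i

def dR_dr(h, x, x_i, mu, sigma):
    """Equation 51."""
    return -bracket(h, x, x_i, mu, sigma) / (a_x(h, sigma) * h)

def d2R_dh_dr(h, x, x_i, mu, sigma):
    """Derivative of Equation 51 with respect to h (product rule)."""
    a, a1 = a_x(h, sigma), da_dh(h, sigma)
    q = (x_i ** 2 - mu ** 2) / sigma ** 2
    outer = 1.0 / (a * h ** 2) - a / sigma ** 2        # d/dh of -1/(a h)
    inner = -(x - mu) * q * a * a1 - x_i * a1          # d/dh of the bracket
    return outer * bracket(h, x, x_i, mu, sigma) - inner / (a * h)

# Illustrative values only.
h, x, x_i, mu, sigma = 0.6, 12.0, 8.0, 10.0, 2.5
eps = 1e-6
numeric = (dR_dr(h + eps, x, x_i, mu, sigma)
           - dR_dr(h - eps, x, x_i, mu, sigma)) / (2.0 * eps)
err = abs(d2R_dh_dr(h, x, x_i, mu, sigma) - numeric)
print(err)
```

The same pattern, an analytic expression compared with a central difference, can be applied to each of the remaining components if an independent check is wanted.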

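It remains to calculate $\partial P_3/\partial h_X$ as follows

$$\begin{aligned}
\frac{\partial P_3}{\partial h_X} &= \frac{\partial}{\partial h_X}\left[\frac{\partial}{\partial r_i}\left[\frac{1}{a_X}\right]\sum_j\left(r_j\,\phi\left(R_{jX}(x)\right)\right)\right] \\
&= \frac{\partial^2}{\partial h_X\,\partial r_i}\left[\frac{1}{a_X}\right]\sum_j\left(r_j\,\phi\left(R_{jX}(x)\right)\right) + \frac{\partial}{\partial r_i}\left[\frac{1}{a_X}\right]\frac{\partial}{\partial h_X}\left[\sum_j\left(r_j\,\phi\left(R_{jX}(x)\right)\right)\right] \\
&= -\left(\left(\frac{x_i^2 - \mu_X^2}{\sigma_X^2}\right)\frac{1}{2\sigma_X^2}\right)\left(\frac{\partial a_X}{\partial h_X}h_X^2 + 2a_X h_X\right)\sum_j\left(r_j\,\phi\left(R_{jX}(x)\right)\right) + \frac{\partial}{\partial r_i}\left[\frac{1}{a_X}\right]\sum_j\left(r_j\frac{\partial\left[\phi\left(R_{jX}(x)\right)\right]}{\partial h_X}\right),
\end{aligned} \tag{59}$$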

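where $\frac{\partial}{\partial r_i}\left[\frac{1}{a_X}\right]$ is given in Equation 50, $\frac{\partial a_X}{\partial h_X}$ in Equation 41, and $\frac{\partial\left[\phi\left(R_{jX}(x)\right)\right]}{\partial h_X}$ in Equation 38.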

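The partial derivatives of PEN₁(h_Y) with respect to h_Y, namely $\frac{\partial \mathrm{PEN}_1}{\partial h_Y}$, $\frac{\partial^2 \mathrm{PEN}_1}{\partial h_Y^2}$, and $\frac{\partial^2 \mathrm{PEN}_1}{\partial h_Y\,\partial s_i}$, are computed analogously.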

References

Andersson, Bränberg, & Wiberg (2013). Performing the kernel method of test equating with the package kequate. Journal of Statistical Software, 55(6), 1–25. https://doi.org/10.18637/jss.v055.i06

Andersson & von Davier, A. A. (2014). Improving the bandwidth selection in kernel equating. Journal of Educational Measurement, 51(3), 223–238. https://doi.org/10.1111/jedm.12044

Andersson & Wiberg (2017). Item response theory observed-score kernel equating. Psychometrika, 82(1), 48–66. https://doi.org/10.1007/s11336-016-9528-7

Benichou & Gail, M. H. (1989). A delta method for implicitly defined random variables. The American Statistician, 43(1), 41–44. https://doi.org/10.1080/00031305.1989.10475608

Bock, R. D., & Aitkin (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459. https://doi.org/10.1007/BF02293801

Brent (1973). Algorithms for minimization without derivatives. Prentice-Hall.

Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06

Cox (1984). An elementary introduction to maximum likelihood estimation for multinomial models: Birch’s theorem and the delta method. The American Statistician, 38(4), 283–287. https://doi.org/10.1080/00031305.1984.10483226

de Ayala, R. J. (2009). The theory and practice of item response theory. Guilford Press.

Ferguson, T. S. (1996). A course in large sample theory. Routledge.

Gilbert & Varadhan (2019). Accurate numerical derivatives [Computer software manual]. https://CRAN.R-project.org/package=numDeriv (R package version 2016.8-1.1)

Holland, P. W., King, B. F., & Thayer, D. T. (1989). The standard error of equating for the kernel method of equating score distributions. ETS Research Report Series, 1989(1), i–52. https://doi.org/10.1002/j.2330-8516.1989.tb00332.x

Holland, P. W., & Thayer, D. T. (1987). Notes on the use of log-linear models for fitting discrete probability distributions. ETS Research Report Series, 1987(2), i–40. https://doi.org/10.1002/j.2330-8516.1987.tb00235.x

Holland, P. W., & Thayer, D. T. (1989). The kernel method of equating score distributions. ETS Research Report Series, 1989(1), i–45. https://doi.org/10.1002/j.2330-8516.1989.tb00333.x

Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). Springer.

Lee, Y.-H., & von Davier, A. A. (2008). Comparing alternative kernels for the kernel method of test equating: Gaussian, logistic, and uniform kernels. ETS Research Report Series, 2008(1), i–26. https://doi.org/10.1002/j.2333-8504.2008.tb02098.x

National Center for Education Statistics. (2004). NAEP technical documentation - mathematics assessment IRT parameters. https://nces.ed.gov/nationsreportcard/tdw/analysis/scaling_irt_math.aspx

Oakes (1999). Direct calculation of the information matrix via the EM algorithm. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 61(2), 479–482. https://doi.org/10.1111/1467-9868.00188

R Core Team. (2019). R: A language and environment for statistical computing [Computer software manual]. https://www.R-project.org/

Rao, C. R. (1973). Linear statistical inference and its applications (2nd ed.). Wiley.

von Davier, A. A. (2011). Statistical models for test equating, scaling. Springer. https://doi.org/10.1007/978-0-387-98138-3

von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. Springer-Verlag.

Standard Errors of Kernel Equating: Accounting for Bandwidth Estimation
