Abstract
In standardized testing, equating is used to ensure comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, where an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. Estimating the bandwidth introduces additional variability that is currently not accounted for when calculating the standard errors of equating, which threatens their accuracy. In this study, the asymptotic variance of the bandwidth parameter estimator is derived and a modified method for calculating the standard error of equating that accounts for the bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and confirm the accuracy of the modified method across several sample sizes and test lengths as compared to the existing method and the Monte Carlo standard error of equating estimates. The results show that the modified standard errors of equating are accurate under the considered conditions. Furthermore, the modified and the existing methods produce similar results, which suggests that the impact of bandwidth variability on the standard error of equating is minimal.
Standardized testing is commonly used for assessing individual achievement and its results greatly influence high-stakes decisions ranging from university admissions to various industry certifications. Standardized testing generally requires alternate test forms to be administered on multiple occasions. As a consequence, the tests often differ in difficulty from one administration to another, which poses a challenge with respect to comparability and fairness of the resulting test scores. In order to address this challenge, a statistical procedure known as equating is employed with the paramount goal of adjusting the scores on the test forms so that they yield interchangeable results (Kolen & Brennan, 2014).
Observed-score equating is one of the fundamental methods used in test equating. Rooted in classical test theory, it is concerned with establishing the equivalence of the observed scores on two test forms and includes both linear and equipercentile equating functions (von Davier, 2011). In this study, we focus on an equipercentile observed-score equating method called kernel equating, which was initially introduced by Holland and Thayer (1989) and further developed by von Davier et al. (2004).
The conceptual framework of kernel equating follows that of equipercentile observed-score equating and posits a series of steps to obtain the equated scores: (1) pre-smoothing of the data to reduce sampling variability, (2) obtaining discrete score probability distributions, (3) obtaining continuous approximations to the discrete score distributions, (4) calculating the equating function, and (5) calculating the standard errors of equating (SEEs) (von Davier, 2011; von Davier et al., 2004). A feature that distinguishes kernel equating from other equipercentile methods is that the continuous approximations of the score probability distributions are achieved through kernels that utilize bandwidth parameters. The bandwidths allow the density functions to be as smooth as possible while retaining the properties of the original distributions. Estimating such parameters, however, introduces additional sampling variability. This variability is typically not accounted for when calculating the standard errors of kernel equating, and therefore constitutes a threat to their accuracy (Holland et al., 1989; von Davier et al., 2004).
Accurate estimation of the SEE is integral to making correct inferences and comparisons. When the SEE is estimated incorrectly, it can lead to unjustified certainty about the equated scores. One previous study derived modified standard errors of kernel equating when using a variant of Silverman's rule of thumb for bandwidth estimation (Andersson et al., 2014). The current study derives modified SEEs for the more commonly applied approach of estimating the bandwidth by minimizing a penalty function. Such an approach is more generally appropriate and does not rely on a particular distributional assumption for the test scores. Thus, the objective of this article is to introduce a modified method of calculating the SEE which accounts for the additional variability stemming from the bandwidth estimation. The new approach is compared via simulations to the current method of calculating the SEE (Holland et al., 1989) across several sample sizes and test lengths.
We structure this article as follows. In the subsequent section, we give a brief background to the kernel method of equating and expand on the issue of bandwidth estimation and how it influences sampling variability. We also discuss how the standard errors of kernel equating are currently estimated. Next, the asymptotic variance of the bandwidth parameter estimator is derived and is incorporated in a modified method for calculating the SEE. This modified method is further verified and compared to the existing method in a simulation study. Lastly, the results are reported and discussed.
The Kernel Method of Test Equating
Data Collection Designs
An observed-score equating procedure consists of two fundamental components, namely, the data collection design and the equating method (von Davier et al., 2004). Hence, before we focus on the equating itself, it is essential to review, if only briefly, the common approaches to collecting the data. There are several data collection designs widely used in practice and they can roughly be divided into two categories: designs which use examinees from a common population taking both test forms and designs which use common items on the test forms (von Davier et al., 2004). The first category of data collection designs includes the equivalent groups, the single group, and the counterbalanced designs, and the second category includes the common-item non-equivalent groups design. In this study, we focus on the equivalent groups design, where two independent random samples are drawn from a common population,
The choice of an appropriate data collection design is subject to considerations such as the available sample size, time, and costs. The design in turn affects the equating procedure: some designs, such as the equivalent groups design, allow for a relatively straightforward comparison between the test forms, whereas others, such as the common-item non-equivalent groups design, are much more complex. A more detailed account of the considerations and procedures involved in various data collection designs can be found in von Davier et al. (2004).
Kernel Equating
In the following, we adopt the notation of von Davier et al. (2004). Let the target population be
Further, an equipercentile equating function is defined in terms of the cumulative distribution functions (CDFs) which are given by
When the CDFs are continuous, we obtain the equipercentile equating function of
Strictly speaking, however, most score distributions are discrete, and their continuous approximations are required. Kernel equating addresses this problem by introducing a series of steps which can be applied to various data collection designs and which provide continuous CDFs. The steps of kernel equating are pre-smoothing, estimation of the score probabilities, continuous approximation to the discrete score distributions, equating, and calculating the SEE (von Davier et al., 2004). We now briefly review the first two steps and dedicate the subsequent subsections to presenting the remaining steps in more detail, as they pertain to the subject at hand.
Pre-Smoothing
In the pre-smoothing step, a parametric statistical model is fitted to the observed data. This can be done by fitting log-linear or item response theory (IRT) models to the data. The methods are described in detail in Andersson and Wiberg (2017) and Holland and Thayer (1987) and are not repeated here.
Estimation of the Score Probabilities
Having estimated the score distributions with a pre-smoothing model, the score probabilities can be obtained using a linear or non-linear transformation which, following von Davier et al. (2004), we call the design function. The design function depends on the data collection design. For instance, consider the equivalent groups design and let
Continuous Approximation and Equating
The third step in kernel equating, distinguishing it from other equipercentile methods, is how the continuous approximations to the discrete CDFs,
It is evident from equations (7)–(11) that for the continuous approximation to be carried out, the bandwidth parameters
Once the continuous approximations are obtained, the equating function estimator for equating
The equating function for equating
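The continuization and equating steps described in this subsection can be illustrated with a minimal Python sketch. It assumes a Gaussian kernel in the form given by von Davier et al. (2004), where the shrinkage factor preserves the mean and variance of the discrete distribution. The function names, the bisection-based inversion, the fixed bandwidth of 0.6, and the score probabilities are hypothetical choices made for illustration only and are not taken from the study.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, used as the kernel


def mean_and_variance(scores, probs):
    """Mean and variance of a discrete score distribution."""
    mu = sum(p * x for x, p in zip(scores, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(scores, probs))
    return mu, var


def kernel_cdf(x, scores, probs, h):
    """Gaussian-kernel continuization of a discrete CDF with bandwidth h."""
    mu, var = mean_and_variance(scores, probs)
    a = (var / (var + h * h)) ** 0.5  # shrinkage factor preserving the variance
    return sum(
        p * STD_NORMAL.cdf((x - a * xj - (1 - a) * mu) / (a * h))
        for xj, p in zip(scores, probs)
    )


def kernel_quantile(u, scores, probs, h, iterations=200):
    """Invert the continuized CDF by bisection (it is strictly increasing)."""
    lo = min(scores) - 10 * h - 1
    hi = max(scores) + 10 * h + 1
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, scores, probs, h) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def equate_x_to_y(x, x_scores, r, h_x, y_scores, s, h_y):
    """Equipercentile equating of x to the Y scale via continuized CDFs."""
    u = kernel_cdf(x, x_scores, r, h_x)
    return kernel_quantile(u, y_scores, s, h_y)


# Hypothetical score probabilities for two six-point test forms.
x_scores = [0, 1, 2, 3, 4, 5]
y_scores = [0, 1, 2, 3, 4, 5]
r = [0.05, 0.15, 0.30, 0.30, 0.15, 0.05]
s = [0.10, 0.20, 0.30, 0.25, 0.10, 0.05]

equated = equate_x_to_y(3.0, x_scores, r, 0.6, y_scores, s, 0.6)
print(equated)
```

In this toy setting form Y yields lower scores than form X, so a score of 3 on X is mapped to a somewhat lower value on the Y scale. Equating a form to itself with identical bandwidths returns the original score, which is a convenient sanity check.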
Standard Error of Kernel Equating
The SEE is a measure of the random equating error, that is, the uncertainty that arises because the equating function is estimated and is therefore subject to sampling variability. We largely base this subsection on the work of Holland et al. (1989), who derived the asymptotic standard error for the kernel method of equating using the standard delta method for computing large sample approximations to the sampling variances of functions of statistics. Before proceeding, we briefly introduce the multivariate delta method (Rao, 1973).
Adopting the notation of Rao (1973), let the (
The SEE for equating
Treating the bandwidth parameters
Reiterating the notation used previously, let
The matrix
The second component,
Recalling Equation 14, the derivatives needed to compute
The derivatives of
At this point, it is important to emphasize that the method of SEE calculation described above treats the bandwidth parameters
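The multivariate delta method that underlies this subsection can be sketched in a few lines of Python. The sketch approximates the covariance of a transformed estimator as the Jacobian times the covariance matrix times the transposed Jacobian. The finite-difference Jacobian, the function names, and the toy transformation are assumptions made for illustration; the study itself works with analytical derivatives.

```python
def jacobian(g, theta, eps=1e-6):
    """Central finite-difference Jacobian of g at theta (rows are outputs)."""
    n_out = len(g(theta))
    jac = [[0.0] * len(theta) for _ in range(n_out)]
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        g_up, g_down = g(up), g(down)
        for i in range(n_out):
            jac[i][j] = (g_up[i] - g_down[i]) / (2 * eps)
    return jac


def delta_method_cov(g, theta, sigma):
    """Approximate Cov[g(theta_hat)] as J * Sigma * J^T."""
    jac = jacobian(g, theta)
    m, k = len(jac), len(theta)
    js = [[sum(jac[i][a] * sigma[a][b] for a in range(k)) for b in range(k)]
          for i in range(m)]
    return [[sum(js[i][b] * jac[l][b] for b in range(k)) for l in range(m)]
            for i in range(m)]


# Toy transformation (hypothetical): one linear and one product component.
def g(t):
    return [2 * t[0] + 3 * t[1], t[0] * t[1]]


theta_hat = [1.0, 2.0]
sigma = [[1.0, 0.5], [0.5, 2.0]]  # assumed covariance matrix of theta_hat
cov_g = delta_method_cov(g, theta_hat, sigma)
print(cov_g)
```

For the SEE, the role of `g` is played by the equating function viewed as a function of the score probabilities, and the SEE at a given score is the square root of the corresponding diagonal element.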
Accounting for Bandwidth Estimation Variability in Kernel Equating
In this section, we first derive the bandwidth parameter estimator variance and then introduce a modified method for the calculation of the analytical SEE that accounts for bandwidth estimation variability.
Asymptotic Variance and Standard Error of the Bandwidth Parameter Estimator
Recalling the standard delta method restated in the previous section (Rao, 1973), it is important to note that the bandwidth parameter estimator is not defined explicitly but rather as an implicit function of other asymptotically normal variables. Therefore, we use a generalization of the delta method presented by Benichou and Gail (1989) which facilitates computing the asymptotic variance of the implicitly defined bandwidth parameter estimator. Following the notation of von Davier et al. (2004),
Let
The variance and the standard error of
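The idea behind the delta method for implicitly defined estimators can be sketched as follows. If an estimator h solves G(h, r) = 0, the implicit function theorem gives dh/dr = -(dG/dh)^(-1) dG/dr, and the usual delta-method quadratic form then yields the variance. In the sketch below, the function `G` is a toy estimating equation chosen so that the result can be checked by hand; it is not the first-order condition of the penalty function used in the study, and all names and numbers are hypothetical.

```python
def implicit_gradient(G, h, r, eps=1e-6):
    """Gradient dh/dr of the root h(r) of G(h, r) = 0, by finite differences."""
    dG_dh = (G(h + eps, r) - G(h - eps, r)) / (2 * eps)
    grad = []
    for j in range(len(r)):
        up = list(r)
        down = list(r)
        up[j] += eps
        down[j] -= eps
        dG_drj = (G(h, up) - G(h, down)) / (2 * eps)
        grad.append(-dG_drj / dG_dh)  # implicit function theorem
    return grad


def implicit_variance(G, h, r, sigma):
    """Delta-method variance of the implicitly defined estimator h."""
    grad = implicit_gradient(G, h, r)
    k = len(r)
    return sum(grad[a] * sigma[a][b] * grad[b]
               for a in range(k) for b in range(k))


# Toy estimating equation (hypothetical): h solves h^2 - r1 * r2 = 0.
def G(h, r):
    return h * h - r[0] * r[1]


r_hat = [4.0, 9.0]
h_hat = 6.0  # root of G at r_hat
sigma = [[0.04, 0.01], [0.01, 0.09]]  # assumed covariance matrix of r_hat

var_h = implicit_variance(G, h_hat, r_hat, sigma)
se_h = var_h ** 0.5
print(var_h, se_h)
```

Here the explicit solution h = sqrt(r1 * r2) is available, so the implicit gradient (0.75, 1/3) can be confirmed directly. For the bandwidth estimator no closed form exists, which is why the implicit version is needed.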
Standard Error of Equating Accounting for Bandwidth Variability
To account for the bandwidth estimation variability when computing the SEEs, we apply the chain rule together with the delta method and obtain a modified expression for the SEEs (Cox, 1984). Treating
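The chain-rule argument can be illustrated with a simplified Python sketch. It assumes a single bandwidth h that depends on the score probabilities r, so that the total derivative of the equated score is the direct derivative plus the bandwidth sensitivity times dh/dr. The study's expression involves the bandwidths of both test forms; the single-bandwidth setup, the function names, and all numbers below are hypothetical.

```python
def quadratic_form(vec, sigma):
    """Compute vec * Sigma * vec^T for a row vector vec."""
    k = len(vec)
    return sum(vec[a] * sigma[a][b] * vec[b]
               for a in range(k) for b in range(k))


def see_fixed_bandwidth(de_dr, sigma):
    """SEE when the bandwidth is treated as a known constant."""
    return quadratic_form(de_dr, sigma) ** 0.5


def see_modified(de_dr, de_dh, dh_dr, sigma):
    """SEE when the bandwidth is treated as a function of r (chain rule)."""
    total = [direct + de_dh * via_h for direct, via_h in zip(de_dr, dh_dr)]
    return quadratic_form(total, sigma) ** 0.5


# Hypothetical derivatives evaluated at one score point.
de_dr = [1.0, 0.0]   # direct effect of the score probabilities
de_dh = 2.0          # sensitivity of the equated score to the bandwidth
dh_dr = [0.0, 0.5]   # sensitivity of the bandwidth to the score probabilities
sigma = [[1.0, 0.0], [0.0, 1.0]]  # assumed covariance matrix of r

see_orig = see_fixed_bandwidth(de_dr, sigma)
see_mod = see_modified(de_dr, de_dh, dh_dr, sigma)
print(see_orig, see_mod)
```

When the equated score is insensitive to the bandwidth (`de_dh` equal to zero), the two quantities coincide, which is consistent with the two methods giving similar results when the bandwidth effect is small.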
Simulation Study
To confirm the accuracy of the presented derivations, we conducted a simulation study to evaluate the estimators of the standard error of the bandwidth parameter estimator and the modified SEEs. We evaluated the estimated standard errors with respect to the Monte Carlo standard errors and compared the modified standard errors to the original standard errors that do not account for the bandwidth estimation.
Simulation Design
Data for two test forms
The equivalent groups design was used, in which two independent random samples of individuals are drawn from a single common population and each random sample takes one of the test forms. Score distributions with 20, 40, and 80 items, for each test 
In order to systematically verify the accuracy of the modified method of calculating the SEE, as well as to explore how well it performs across a range of sample sizes, sample sizes of 1,000, 4,000, and 16,000 were considered. The study was conducted using version 3.6.2 of the statistical software environment R (R Core Team, 2019), primarily employing the packages
The study followed the recommended kernel equating procedure (von Davier et al., 2004), albeit with a few adjustments to verify the derivations presented in this article. For each generated data set, the following steps were carried out:
The analytical derivations for the bandwidth parameter estimator variance were paramount to the study. Hence, upon obtaining the optimal bandwidths, the average standard errors of the bandwidth parameters were computed following the equations introduced in the previous section, and their accuracy was assessed using the Monte Carlo standard error (MCSE) as the criterion. When calculating the asymptotic variance of the bandwidth parameter estimator, the bandwidth parameters
Furthermore, two measures were used to assess the performance and the accuracy of the modified method as compared to the original method. We computed the absolute differences of the means of the SEEs calculated with the original and the modified methods. Additionally, the average coverage probabilities were considered which explored the average proportion of time that the 95% confidence intervals calculated employing the original and the modified methods contained the true values of the equated results. The confidence intervals were estimated with
The analytical derivations used in computing the bandwidth estimator variance and standard errors, as well as the SEE, were verified numerically using the R package
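The two evaluation criteria described above, the Monte Carlo standard error and the coverage of 95% confidence intervals, can be sketched in Python as follows. The sketch assumes that the MCSE is the sample standard deviation of an estimate across replications and that the intervals are symmetric normal-theory intervals of the form estimate plus or minus 1.96 standard errors. Both are assumptions made for illustration, and the function names and toy numbers are hypothetical.

```python
from statistics import stdev


def monte_carlo_se(estimates):
    """Sample standard deviation of an estimate across replications."""
    return stdev(estimates)


def coverage(estimates, standard_errors, true_value, z=1.96):
    """Share of intervals estimate +/- z * SE that contain the true value."""
    hits = 0
    for est, se in zip(estimates, standard_errors):
        if est - z * se <= true_value <= est + z * se:
            hits += 1
    return hits / len(estimates)


# Toy replications (hypothetical values, not results from the study).
equated_scores = [0.0, 1.0, 3.0]
analytic_ses = [1.0, 1.0, 1.0]

mcse = monte_carlo_se([1.0, 2.0, 3.0])
cover = coverage(equated_scores, analytic_ses, true_value=1.0)
print(mcse, cover)
```

In a simulation such as the one described here, these functions would be applied per score point, with the analytical standard errors averaged over replications and compared against the MCSE.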
Simulation Results
Asymptotic Standard Errors (ASE) and Monte Carlo Standard Errors (MCSE) for the Bandwidth Parameters
Absolute Average Differences for the Original Asymptotic Standard Errors of Equating (ASEE) and the Modified Asymptotic Standard Errors of Equating (ASEEmod) to the Monte Carlo Standard Errors of Equating (MCSEE) and Average Coverage of 95% Confidence Intervals Based on the ASEE and the ASEEmod.
Discussion
The kernel method of equating is an equipercentile equating method in which number-correct scores are transformed into percentile rank scores from test form
The present study explored the issue of the additional variability stemming from the bandwidth estimation and its impact on the SEE. Building on the existing methodology of Holland et al. (1989) and von Davier et al. (2004), we derived the asymptotic variance of the bandwidth parameter estimator using the delta method for implicit functions (Benichou & Gail, 1989) and incorporated those derivations to expand the existing formulas for calculating the SEE (Holland et al., 1989). Thus, we have introduced SEEs that account for bandwidth estimation variability. A simulation study with 18 data sets generated for a wide range of sample sizes and test lengths was used to illustrate the results of the modified method as compared to the current method of the SEE calculation (Holland et al., 1989) and the MCSEEs.
The results offered several observations which are valuable to the testing industry. Firstly, the newly introduced SEEs were accurate and close to the MCSEE estimates for all sample sizes and test lengths, suggesting that the method is suitable for practical use. Secondly, using the MCSEE as a criterion, the results of the study indicate that the original (Holland et al., 1989) and the modified SEEs produce similar results, suggesting that the impact of bandwidth estimation on the SEE is minimal.
The presented results apply directly to any pre-smoothing method, provided that the asymptotic covariance matrix of the score probabilities has been defined for such a method. In this study, however, we only utilized IRT as the pre-smoothing method, and the results may differ when using, for example, log-linear models. Nevertheless, previous research has indicated that the SEEs are fairly accurate even when not accounting for the bandwidth estimation with the penalty function, so we do not anticipate that the results will differ substantially when using pre-smoothing with log-linear models instead of IRT models.
The method for accounting for bandwidth estimation that we used in the present study can be generalized to additional kernels and equating designs by modifying the presented results to account for the different expressions of the equating function and score probabilities under such approaches. It is furthermore possible to utilize the delta method for implicit functions with other bandwidth estimation methods, provided that they are defined by minimizing a function that satisfies the conditions required by the implicit function theorem and the delta method. One approach which does not fulfill these requirements is the method based on PEN2, since the function PEN2 is not differentiable and can have multiple local minima.
It is important to note that in this study, we derived the modified asymptotic SEE for two test forms in the setting of the equivalent groups data collection design. It can be the case that the bandwidth estimation influence on the SEE is greater for other data collection and equating designs. It would, therefore, be beneficial for future theoretical and empirical studies to focus on determining the bandwidth estimation impact on the SEE in these additional scenarios.
As a final note, we believe that it is theoretically more sound to use a method which successfully accounts for all sources of variability, however negligible those may be. Introducing the modifications to the formulas for the SEE calculation akin to those explored in this study can improve the accuracy of the standard errors of equating, and consequently, facilitate fairness and comparability of the equated results.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Computation of the Penalty Function
In order to compute Equation 26, two partial derivatives of the PEN1 (
Recalling equations (8)–(13), we define the first partial derivative of PEN1 (
The remaining components needed for computing the first partial derivative of PEN1 (
Using equations (36)–(41), we can compute the second partial derivative of PEN1 (
Recall that
Consider further that
Lastly, we can compute the second partial derivative of PEN1 (
We further define
Given Equation 49 is a lengthy expression, we further simplify the notation such that
Noting the three components in Equation 54,
It remains to calculate
The partial derivatives of the PEN1 (
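As a numerical companion to the derivations in this appendix, the following Python sketch evaluates PEN1 and locates a minimizing bandwidth. It assumes the usual definition of PEN1 as the sum of squared differences between the score probabilities and the Gaussian-kernel continuized density at the score points (von Davier et al., 2004). The golden-section search, the search interval, the function names, and the binomial score distribution are hypothetical choices made for illustration; the finite-difference derivative is included only as a way to check the first-order condition that defines the bandwidth estimator implicitly.

```python
from math import comb, exp, pi, sqrt


def kernel_density(x, scores, probs, h):
    """Gaussian-kernel continuized density at x with bandwidth h."""
    mu = sum(p * xj for xj, p in zip(scores, probs))
    var = sum(p * (xj - mu) ** 2 for xj, p in zip(scores, probs))
    a = sqrt(var / (var + h * h))
    total = 0.0
    for xj, p in zip(scores, probs):
        z = (x - a * xj - (1 - a) * mu) / (a * h)
        total += p * exp(-0.5 * z * z) / sqrt(2 * pi)
    return total / (a * h)


def pen1(h, scores, probs):
    """Sum of squared gaps between score probabilities and the density."""
    return sum((p - kernel_density(xj, scores, probs, h)) ** 2
               for xj, p in zip(scores, probs))


def golden_section_min(f, lo, hi, tol=1e-6):
    """Golden-section search for a local minimum of f on [lo, hi]."""
    ratio = (sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)


def pen1_derivative(h, scores, probs, eps=1e-5):
    """Central finite-difference approximation to dPEN1/dh."""
    return (pen1(h + eps, scores, probs) - pen1(h - eps, scores, probs)) / (2 * eps)


# Hypothetical 20-item test with binomial(20, 0.5) score probabilities.
scores = list(range(21))
probs = [comb(20, k) * 0.5 ** 20 for k in scores]

h_hat = golden_section_min(lambda h: pen1(h, scores, probs), 0.3, 1.5)
print(h_hat, pen1(h_hat, scores, probs), pen1_derivative(h_hat, scores, probs))
```

A finite-difference derivative of this kind can also serve as an independent check on analytical first and second partial derivatives of PEN1, in the spirit of the numerical verification mentioned in the simulation design.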
