Asymptotic Variance of Linking Coefficient Estimators for Polytomous IRT Models

Abstract

In item response theory (IRT), when two groups from different populations take two separate tests, the two ability scales must be linked so that the item parameters of the tests are comparable across the groups. To link the scales, information from common items is used to estimate linking coefficients, which place the item parameters on the same scale. For polytomous IRT models, the Haebara and Stocking–Lord methods have commonly been recommended for estimating the linking coefficients. However, variance estimates for these methods are not available in the literature. In this article, the asymptotic variance of the linking coefficients for polytomous IRT models is derived for the Haebara and Stocking–Lord methods. The results are presented in a general form, and specific results are given for the generalized partial credit model. Simulations investigating the accuracy of the derivations under various settings of model complexity and sample size are provided. They show that the derivations are accurate under the conditions considered and that the Haebara and Stocking–Lord methods outperform several moment methods, with performance close to that of concurrent calibration.
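The quantities named in the abstract can be illustrated with a small Python/NumPy sketch. It is not the article's implementation. It assumes the generalized partial credit model in slope/step form without the scaling constant D, linking coefficients A and B that map the new scale to the old one through a/A and A*b + B, unweighted Haebara and Stocking-Lord criteria evaluated on a fixed grid of ability points, and a Gauss-Newton solver. For the asymptotic variance it applies the delta method, Var(Â, B̂) ≈ D Σ D^T, where D is the derivative of (A, B) with respect to the item parameter estimates. Here D is obtained by numerically differentiating the estimator rather than by the analytic derivations the article presents. All function names, parameter values, and the covariance matrix Σ are hypothetical.

```python
import numpy as np


def gpcm_probs(theta, a, b):
    """GPCM category probabilities for one item (assumed form, no constant D).

    theta: (Q,) ability points; a: slope; b: (m,) step parameters.
    Returns a (Q, m + 1) array for score categories 0..m.
    """
    # Category k uses the cumulative sum of a * (theta - b_v), v = 1..k;
    # category 0 has an empty sum, i.e. 0.
    steps = a * (theta[:, None] - b[None, :])
    z = np.hstack([np.zeros((theta.size, 1)), np.cumsum(steps, axis=1)])
    z -= z.max(axis=1, keepdims=True)  # numerically stable softmax
    ez = np.exp(z)
    return ez / ez.sum(axis=1, keepdims=True)


def transform(items, A, B):
    """Map (a, b) from the new scale to the old one: a / A and A * b + B."""
    return [(a / A, A * b + B) for a, b in items]


def residuals(coef, old_items, new_items, theta, method):
    """Residual vector whose sum of squares is the (unweighted) criterion."""
    moved = transform(new_items, *coef)
    if method == "haebara":
        # Haebara: category-probability differences, item by item.
        return np.concatenate([
            (gpcm_probs(theta, *o) - gpcm_probs(theta, *n)).ravel()
            for o, n in zip(old_items, moved)])

    # Stocking-Lord: difference of the test characteristic (expected score) curves.
    def tcc(items):
        return sum(gpcm_probs(theta, a, b) @ np.arange(b.size + 1) for a, b in items)

    return tcc(old_items) - tcc(moved)


def estimate_link(old_items, new_items, theta, method="haebara", iters=50):
    """Estimate (A, B) by Gauss-Newton with a forward-difference Jacobian."""
    def res(c):
        return residuals(c, old_items, new_items, theta, method)

    coef, h = np.array([1.0, 0.0]), 1e-6
    for _ in range(iters):
        r = res(coef)
        J = np.column_stack([(res(coef + h * e) - r) / h for e in np.eye(2)])
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        t = 1.0
        # Halve the step until the criterion does not increase.
        while np.sum(res(coef + t * step) ** 2) > np.sum(r ** 2) and t > 1e-8:
            t /= 2
        coef = coef + t * step
        if np.linalg.norm(t * step) < 1e-10:
            break
    return coef


def delta_method_cov(old_items, new_items, theta, sigma, method="haebara", h=1e-5):
    """Delta-method covariance of (A, B): D @ sigma @ D.T.

    D = d(A, B)/d(gamma) is taken by central differences of the estimator,
    where gamma stacks (a, b_1..b_m) for each old item, then each new item.
    """
    items = old_items + new_items
    gamma = np.concatenate([np.r_[a, b] for a, b in items])
    sizes = [b.size + 1 for _, b in items]

    def unflatten(v):
        out, pos = [], 0
        for s in sizes:
            out.append((v[pos], v[pos + 1:pos + s]))
            pos += s
        return out[:len(old_items)], out[len(old_items):]

    D = np.empty((2, gamma.size))
    for k in range(gamma.size):
        e = np.zeros(gamma.size)
        e[k] = h
        up = estimate_link(*unflatten(gamma + e), theta, method)
        down = estimate_link(*unflatten(gamma - e), theta, method)
        D[:, k] = (up - down) / (2 * h)
    return D @ sigma @ D.T


if __name__ == "__main__":
    theta = np.linspace(-4, 4, 41)  # fixed grid on the old scale (assumed)
    new = [(1.0, np.array([-1.0, 0.5])), (1.4, np.array([-0.5, 1.0])),
           (0.8, np.array([0.0, 1.5]))]           # made-up GPCM parameters
    old = transform(new, 1.2, -0.3)               # hypothetical true A, B
    print(estimate_link(old, new, theta, "haebara"))
    print(estimate_link(old, new, theta, "stocking_lord"))
    sigma = 0.01 * np.eye(18)  # hypothetical covariance of the 18 estimates
    print(np.sqrt(np.diag(delta_method_cov(old, new, theta, sigma))))
```

Because the made-up old-form parameters are exact transforms of the new-form ones, both criteria should recover A = 1.2 and B = -0.3 up to solver tolerance, while the printed standard errors reflect only the hypothetical Σ. In an application, Σ would instead be the estimated covariance matrix of the item parameter estimates from the two calibrations.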

Keywords

linking coefficients, equating coefficients, item response theory, standard errors, nonequivalent groups design



Supplementary Material

Supplemental material is available for this article.
