Abstract

The present study compares vertical scaling results for the Rasch model from BILOG-MG and WINSTEPS. Item and ability parameters for real and simulated mathematics tests were scaled across five grades, Grades 2 to 6, with the simulated data based on the real data for the same series of tests. The results from WINSTEPS and BILOG-MG were compared in terms of differences and correlations between estimated item and ability parameters. Generally, WINSTEPS appeared to capture the individual and mean estimates more accurately, and BILOG-MG captured the standard deviations more accurately. However, because of the many possible variations in vertical scaling studies, the generalizability of these specific findings may be limited. More important, the findings illustrate that choice of software, in addition to data collection and scaling method decisions, influences vertical scaling results.
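As a rough illustration of the kind of comparison described above, the following sketch simulates dichotomous responses under the Rasch model and summarizes how well a set of estimates recovers known generating values. It is not the authors' procedure and does not call WINSTEPS or BILOG-MG; the function names (`rasch_probability`, `simulate_responses`, `recovery_summary`) are hypothetical, and the particular summary statistics (mean difference, root mean squared difference, correlation, and ratio of standard deviations) are assumptions chosen to mirror the abstract's mention of differences, correlations, means, and standard deviations.

```python
import math
import random
from statistics import mean, pstdev


def rasch_probability(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def simulate_responses(thetas, difficulties, seed=0):
    """Draw a 0/1 response matrix (persons x items) from the Rasch model."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < rasch_probability(t, b) else 0 for b in difficulties]
        for t in thetas
    ]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def recovery_summary(true_values, estimates):
    """Summarize how closely estimates track the generating values."""
    diffs = [e - t for t, e in zip(true_values, estimates)]
    return {
        "mean_difference": mean(diffs),
        "rmsd": math.sqrt(mean([d * d for d in diffs])),
        "correlation": pearson(true_values, estimates),
        # Values below 1 indicate a compressed (shrunken) scale.
        "sd_ratio": pstdev(estimates) / pstdev(true_values),
    }


# Hypothetical example: estimates that preserve rank order but shrink the scale.
true_b = [-1.0, 0.0, 1.0, 2.0]
shrunken_b = [0.8 * b for b in true_b]
summary = recovery_summary(true_b, shrunken_b)
```

In this made-up example the correlation is perfect while the standard deviation ratio is below 1, which shows why a comparison based on correlations alone could miss differences in how two programs reproduce the spread of a vertical scale.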

Keywords

vertical scaling, large-scale assessment, Rasch model, IRT software

A Comparison of WINSTEPS and BILOG-MG for Vertical Scaling With the Rasch Model
