An Application of the Three-Parameter IRT Model to Vertical Equating

Abstract

This study examined the effectiveness of the three-parameter IRT model in vertically equating five overlapping levels of a mathematics computation test. One to four test levels were administered within intact classrooms to randomly equivalent groups of third- through eighth-grade students. Test characteristic curves were derived for each grade/test level combination. It was generally found that an examinee would receive a higher ability estimate if the test level administered had been calibrated on less able examinees. Practical implications for "out-of-level" and adaptive testing are discussed.
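For readers unfamiliar with the terms, the sketch below illustrates how a test characteristic curve is formed under the three-parameter logistic model: each item has a discrimination a, a difficulty b, and a lower asymptote c, and the curve is the sum of the item response probabilities at a given ability. The item parameters, the function names, and the scaling constant D = 1.7 are illustrative assumptions, not values or procedures taken from the study.

```python
import math

D = 1.7  # common scaling constant; an assumption, not stated in the abstract


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at ability theta."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)


# Hypothetical (a, b, c) parameters for a three-item test level.
ITEMS = [(1.0, -1.0, 0.20), (1.2, 0.0, 0.20), (0.8, 1.0, 0.25)]

if __name__ == "__main__":
    for theta in (-2.0, 0.0, 2.0):
        print(theta, round(tcc(theta, ITEMS), 3))
```

Because the curve increases monotonically with ability, a number-correct score on one test level can be mapped to an ability value and then to the corresponding score on another level, provided both levels have been placed on a common ability scale.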

