Abstract
The purpose of this paper was to examine alternative techniques for quantifying the errors associated with the criterion of equating a test to itself. Data for the study came from the national standardization of the 3-R's Achievement Test. The reading and mathematics subtests were analyzed using random samples from the Grade 4 norming group. Errors for two item response theory (IRT; three-parameter and Rasch) methods and the equipercentile equating method were investigated. A total of 45 error estimates from the sampling distribution were obtained for each combination of equating method and content area. Analysis of variance (ANOVA) procedures were also used to estimate the average error across methods for each content area. In addition, the results of the Phillips (1983a, 1983b) studies were reevaluated using the mean of the sampling distribution of equating errors for each of the methods from the present study and from the corresponding ANOVA error estimates. The results of this study suggest that single-replication error estimates may provide misleading assessments of the errors associated with equating a test to itself. The analysis of variance mean squares appeared somewhat promising as alternatives to error estimates by replication. Finally, the results of this study together with those of the Phillips (1983a) study suggest that the Rasch model may be more reliable than other IRT models for equating, but in some applications it is less valid.