Relationships among IRT item discrimination and item fit indices in criterion-referenced language testing

Abstract

This study investigates relationships among IRT one-parameter fit statistics, the two-parameter slope parameter, and traditional biserial correlations in terms of the role these indices play in criterion-referenced language test construction. It discusses the assumptions of the two models and how these assumptions can affect criterion-referenced test construction and interpretation. The study then examines specifically how the indices interrelate as indices of item discrimination. Examinees in Mexico, Saudi Arabia and Japan were administered one of two forms of a functional test (Form A: n = 430, k = 94; Form B: n = 400, k = 95). The data were analysed using the two IRT models, and the results were compared. The results indicate strong relationships among the biserial correlation, the two-parameter slope, and one-parameter infit and outfit. These findings point to the need to employ the two-parameter model when conditions allow, and to take both item discrimination and item difficulty indices into account when conditions do not. Further implications for interpreting the strong relationships among the indices are discussed.
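The indices compared in this study can be illustrated with a small simulation. The sketch below is not the author's analysis. It generates hypothetical responses from logistic item response curves: 20 filler items plus two target items of equal difficulty (b = 0) but different two-parameter slopes, one steep (a = 2.0) and one flat (a = 0.5). For each target item it then computes the Rasch (one-parameter) infit and outfit mean squares and the biserial correlation with the total score. It assumes the standard definitions infit = Σ(x − P)² / ΣP(1 − P) and outfit = (1/N)Σ(x − P)² / P(1 − P), where P is the Rasch probability of success. It also assumes biserial = ((M₁ − M₀)/s)(pq/y), where p and q are the proportions answering correctly and incorrectly, M₁ and M₀ are the mean total scores of those groups, s is the standard deviation of total scores, and y is the normal ordinate at the point dividing the distribution into p and q. For brevity, fit is evaluated at the generating person and item parameters rather than at estimates from a calibration program, the 1.7 scaling constant is omitted, and all function names and settings are hypothetical.

```python
import math
import random
from statistics import NormalDist


def p_correct(theta, b, a=1.0):
    """Logistic item characteristic curve (no 1.7 scaling constant).
    a = 1 gives the Rasch (1PL) curve; other values of a give a 2PL curve
    whose slope a is the item discrimination parameter."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def rasch_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one item under the Rasch model."""
    sq_resid = 0.0  # sum of squared residuals (x - P)^2
    info = 0.0      # sum of model variances P(1 - P)
    z_sq = 0.0      # sum of squared standardized residuals
    for x, theta in zip(responses, thetas):
        p = p_correct(theta, b)
        var = p * (1.0 - p)
        sq_resid += (x - p) ** 2
        info += var
        z_sq += (x - p) ** 2 / var
    # Infit weights residuals by information; outfit is an unweighted mean.
    return sq_resid / info, z_sq / len(responses)


def biserial(item, total):
    """Biserial correlation of a 0/1 item with a continuous total score."""
    n = len(item)
    mean = sum(total) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in total) / n)
    right = [t for x, t in zip(item, total) if x == 1]
    wrong = [t for x, t in zip(item, total) if x == 0]
    p = len(right) / n
    q = 1.0 - p
    # Normal ordinate at the point splitting the area into p and q.
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    m1 = sum(right) / len(right)
    m0 = sum(wrong) / len(wrong)
    return (m1 - m0) / sd * p * q / y


def simulate(n_persons=2000, seed=1):
    """Simulate a hypothetical 22-item test and compute indices for two target items."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    # 20 hypothetical Rasch filler items with difficulties from -2 to 2.
    fillers = [(-2.0 + 4.0 * i / 19, 1.0) for i in range(20)]
    # Two target items at b = 0: one steep, one flat (values are assumptions).
    targets = {"steep": (0.0, 2.0), "flat": (0.0, 0.5)}

    def answer(theta, b, a):
        return 1 if rng.random() < p_correct(theta, b, a) else 0

    data = {name: [] for name in targets}
    totals = []
    for theta in thetas:
        score = sum(answer(theta, b, a) for b, a in fillers)
        for name, (b, a) in targets.items():
            x = answer(theta, b, a)
            data[name].append(x)
            score += x  # total score includes the item itself (uncorrected)
        totals.append(score)

    results = {}
    for name, (b, a) in targets.items():
        infit, outfit = rasch_fit(data[name], thetas, b)
        results[name] = {"slope": a, "infit": infit, "outfit": outfit,
                         "biserial": biserial(data[name], totals)}
    return results


if __name__ == "__main__":
    for name, r in simulate().items():
        print(name, {k: round(v, 2) for k, v in r.items()})
```

Under these assumptions the steep item should show infit and outfit below 1 (overfit to the Rasch model) together with a higher biserial, while the flat item should show mean squares above 1 and a lower biserial. This is a simulated analogue of the covariation among slope, fit, and biserial indices described above.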

