Marginal True-Score Measures and Reliability for Binary Items as a Function of Their IRT Parameters

Abstract

This article provides analytic evaluations of population true-score measures for binary items, given their item response theory (IRT) calibration. Under the assumption of a normal trait distribution, the expected values of marginalized true scores, error variance, true-score variance, and reliability for norm-referenced and criterion-referenced interpretations are presented as functions of the item parameters. The proposed formulas have methodological and computational value in bridging concepts of IRT and true-score theory. They show how each IRT-calibrated item contributes to the marginal true-score measures and may have valuable applications in test development and analysis. For example, given a bank of IRT-calibrated items, one can select binary items to assemble a test with known true-score characteristics before administering it, that is, without information about raw scores or trait scores. Calculations with the proposed formulas are easy to perform using basic statistical programs, spreadsheet programs, or even handheld calculators.
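The abstract does not reproduce the article's formulas, so what follows is only a minimal sketch of the quantities it names. It computes them by numerical integration (composite Simpson's rule over the trait distribution) rather than by the article's analytic expressions. The sketch assumes the three-parameter logistic (3PL) model with scaling constant D = 1.7, a normal trait distribution N(mu, sigma^2), and local independence. It also assumes the standard decomposition in which the number-correct true score is tau(theta) = sum of P_i(theta), the error variance is the sum of E[P_i(theta)(1 - P_i(theta))], and reliability is true-score variance divided by observed-score variance. The optional cut-score index borrows the form of Brennan and Kane's dependability coefficient with the IRT-based variances plugged in; that pairing is an illustrative assumption, not necessarily the article's criterion-referenced formula. The names `Item`, `p_correct`, `normal_expectation`, and `true_score_measures` are hypothetical.

```python
import math
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical item representation: (a, b, c) on the 3PL logistic scale.
# Use c = 0 for a 2PL item, and a = 1, c = 0 for a Rasch-type item.
Item = Tuple[float, float, float]

D = 1.7  # Assumed scaling constant; set to 1.0 if the calibration did not use it.


def logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def p_correct(theta: float, item: Item) -> float:
    """3PL probability of a correct response (the item true score at theta)."""
    a, b, c = item
    return c + (1.0 - c) * logistic(D * a * (theta - b))


def normal_expectation(f: Callable[[float], float], mu: float = 0.0,
                       sigma: float = 1.0, n: int = 800, width: float = 8.0) -> float:
    """E[f(theta)] for theta ~ N(mu, sigma^2), by composite Simpson's rule
    over mu +/- width*sigma (n must be even)."""
    lo = mu - width * sigma
    h = 2.0 * width * sigma / n
    total = 0.0
    for k in range(n + 1):
        theta = lo + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        pdf = math.exp(-0.5 * ((theta - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * f(theta) * pdf
    return total * h / 3.0


def true_score_measures(items: Sequence[Item], mu: float = 0.0, sigma: float = 1.0,
                        cut: Optional[float] = None) -> Dict[str, float]:
    """Marginal true-score measures of the number-correct score for a set of items."""
    def tau(theta: float) -> float:
        # Test true score at theta: sum of item true scores.
        return sum(p_correct(theta, it) for it in items)

    def cond_error_var(theta: float) -> float:
        # Conditional error variance at theta under local independence.
        return sum(p_correct(theta, it) * (1.0 - p_correct(theta, it)) for it in items)

    mean_tau = normal_expectation(tau, mu, sigma)
    true_var = normal_expectation(lambda t: tau(t) ** 2, mu, sigma) - mean_tau ** 2
    error_var = normal_expectation(cond_error_var, mu, sigma)
    out = {
        "mean_true_score": mean_tau,
        "true_score_variance": true_var,
        "error_variance": error_var,
        # Norm-referenced reliability: true-score variance over observed-score variance.
        "reliability": true_var / (true_var + error_var),
    }
    if cut is not None:
        # Illustrative Brennan-Kane-style dependability at a raw-score cut (assumption).
        dev2 = (mean_tau - cut) ** 2
        out["dependability"] = (true_var + dev2) / (true_var + dev2 + error_var)
    return out


if __name__ == "__main__":
    # Hypothetical three-item selection from an IRT-calibrated bank.
    bank = [(1.2, -0.5, 0.20), (0.8, 0.0, 0.15), (1.5, 0.7, 0.20)]
    for name, value in true_score_measures(bank, cut=2.0).items():
        print(f"{name}: {value:.4f}")
```

Because every quantity depends only on the item parameters and the assumed trait distribution, adding or removing items in `bank` shows how each candidate changes the expected true-score measures before any responses are collected, which is the test-assembly use the abstract describes. A spreadsheet or a closed-form approximation would serve equally well; numerical integration was chosen here only because it needs nothing beyond the standard library.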

