A Note on the Type I Error Rate of the PARSCALE G² Statistic for Long Tests

Abstract

The PARSCALE G² statistic is arguably the most popular item fit statistic in operational testing. For long tests, its Type I error rates have often been found to be satisfactory, but they have been studied only for sample sizes of up to several thousand. The authors examined the Type I error rates of the PARSCALE G² statistic in a simulation study using sample sizes much larger than those considered in the literature. For any fixed test length, the Type I error rate of the statistic was found to increase to 1 as the sample size increases. The findings contradict the claim in the PARSCALE software manual that the PARSCALE G² statistic leads to a large-sample test, and they also contradict the common belief that the statistic has reasonable Type I error rates for long tests. The practical message of this simulation study is therefore that the PARSCALE G² statistic cannot always be recommended, even for long tests. In contrast, the Type I error rates of the item fit statistics of Orlando and Thissen were found to be close to the nominal level for all simulation conditions considered here.
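The kind of simulation described above can be sketched roughly as follows. This is a simplified illustration, not PARSCALE's implementation and not the authors' design. The assumptions here are all hypothetical choices made for the sketch: a two-parameter logistic model, item parameters treated as known, EAP ability estimates under a standard normal prior, ten equal-sized ability groups, and a chi-square reference distribution with degrees of freedom equal to the number of groups. The function names (`p2pl`, `eap`, `g2_item_fit`, `type1_rate`) are likewise invented for illustration.

```python
import numpy as np
from scipy.stats import chi2


def p2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def eap(resp, a, b, grid):
    """EAP ability estimates under a standard normal prior (assumed)."""
    p = p2pl(grid[:, None], a, b)                        # (Q, J)
    loglik = resp @ np.log(p).T + (1 - resp) @ np.log(1 - p).T  # (N, Q)
    logpost = loglik - 0.5 * grid ** 2                   # add log prior
    logpost -= logpost.max(axis=1, keepdims=True)        # numerical stability
    w = np.exp(logpost)
    return (w @ grid) / w.sum(axis=1)


def g2_item_fit(item_resp, theta_hat, a, b, n_groups=10):
    """Simplified G2-type fit statistic for one dichotomous item.

    Examinees are sorted on theta_hat and split into equal-sized groups;
    observed counts are compared with model predictions evaluated at the
    group mean of theta_hat. df = n_groups is an assumption of this sketch.
    """
    order = np.argsort(theta_hat)
    g2 = 0.0
    for idx in np.array_split(order, n_groups):
        n = idx.size
        r = item_resp[idx].sum()
        p = p2pl(theta_hat[idx].mean(), a, b)
        if r > 0:                                        # 0 * log(0) taken as 0
            g2 += 2.0 * r * np.log(r / (n * p))
        if r < n:
            g2 += 2.0 * (n - r) * np.log((n - r) / (n * (1.0 - p)))
    return g2, chi2.sf(g2, df=n_groups)


def type1_rate(n_examinees, n_items=20, n_reps=50, alpha=0.05, seed=0):
    """Monte Carlo rejection rate for item 0 when the 2PL model is true."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.8, 1.6, n_items)                   # hypothetical parameters
    b = rng.normal(0.0, 1.0, n_items)
    grid = np.linspace(-4.0, 4.0, 41)
    rejections = 0
    for _ in range(n_reps):
        theta = rng.standard_normal(n_examinees)
        prob = p2pl(theta[:, None], a, b)                # (N, J)
        resp = (rng.random((n_examinees, n_items)) < prob).astype(float)
        theta_hat = eap(resp, a, b, grid)
        _, p_value = g2_item_fit(resp[:, 0], theta_hat, a[0], b[0])
        rejections += int(p_value < alpha)
    return rejections / n_reps
```

Calling `type1_rate` for a fixed `n_items` at a moderate and at a very large `n_examinees` is one way to probe how the rejection rate behaves as the sample grows. Because the data are generated from the fitted model, every rejection counted here is a Type I error by construction.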

Keywords

large samples, model fit, Orlando–Thissen item fit statistics

References

Allen, Donoghue, & Schoeps (2001). The NAEP 1998 technical report (NCES 2001-509). Washington, DC: National Center for Education Statistics.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author.

Bjorner, J. B., Smith, K. J., Stone, C. A., & Sun (2007). IRTFIT: A macro for item fit and local dependence tests under IRT models. Lincoln, RI: Quality Metric.

Chon, K. H., Lee, & Dunbar, S. B. (2010). A comparison of item fit statistics for mixed IRT models. Journal of Educational Measurement, 47, 318-338.

DeMars, C. E. (2005). Type I error rates for PARSCALE’s fit index. Educational and Psychological Measurement, 65, 42-50.

DeMars, C. E. (2010). Item response theory. New York, NY: Oxford University Press.

du Toit (2003). IRT from SSI. Lincolnwood, IL: Scientific Software International.

Glas, C. A. W., & Suárez Falcón, J. C. (2003). A comparison of item-fit statistics for the three parameter logistic model. Applied Psychological Measurement, 27, 87-106.

Haberman, S. J., Sinharay, & Chon, K. H. (2013). Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions. Psychometrika, 78, 417-440.

Hambleton, R. K., & Han (2005). Assessing the fit of IRT models to educational and psychological test data: A five step plan and several graphical displays. In W. R. Lenderking & Revicki (Eds.), Advances in health outcomes research methods, measurement, statistical analysis, and clinical applications (pp. 57-78). Washington, DC: Degnon Associates.

Muraki (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159-176.

Muraki & Bock, R. D. (2003). PARSCALE 4: IRT item analysis and test scoring for rating scale data [Computer program]. Chicago, IL: Scientific Software.

National Center for Education Statistics. (2007). NAEP technical documentation. Retrieved from http://nces.ed.gov/nationsreportcard/tdw/analysis/2007/scaling_determination_number_math2007.asp

Orlando & Thissen (2000). New item fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24, 50-64.

Sinharay, Haberman, S. J., & Jia (2011). Fit of item response theory models: A survey of data from several operational tests (ETS Research Report No. RR-11-29). Princeton, NJ: ETS.

Stone, C. A., & Zhang (2003). Assessing goodness-of-fit of IRT models: A comparison of traditional and alternative procedures. Journal of Educational Measurement, 40, 331-352.

Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 111-153). Westport, CT: American Council on Education and Praeger Publishers.
