An Investigation of Sample Size Splitting on ATFIND and DIMTEST

Abstract

Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test; the current study focused on DIMTEST. Using simulated data, the study investigated how best to split a sample between two procedures: ATFIND, which empirically derives a subtest composed of items that potentially measure a second dimension, and DIMTEST, which assesses whether this subtest represents a second dimension. Conditions explored included the proportion of the sample used for ATFIND, sample size, test length, interability correlations, test structure, and the distribution of item difficulties. Overall, DIMTEST appears to have Type I error rates near the nominal rate and good power to detect multidimensionality, although Type I error inflation is observed for larger sample sizes. Results suggest that a 50/50 split maximizes power and keeps the Type I error rate below the nominal level unless the test is short and the sample is large. A 75/25 split controls Type I error better for short tests and large samples.
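The sample-splitting step described above can be illustrated with a minimal sketch. It is not the study's implementation: the function name `split_sample`, the parameter `atfind_proportion`, and the randomly generated responses are all hypothetical, and the sketch only partitions examinees into two disjoint subsamples without running ATFIND or DIMTEST.

```python
import random


def split_sample(responses, atfind_proportion=0.5, seed=0):
    """Randomly partition examinee rows into two disjoint subsamples.

    responses: list of per-examinee item-response rows (0/1 scores).
    atfind_proportion: hypothetical parameter, the share of examinees
        set aside for subtest selection; the rest go to the hypothesis test.
    """
    if not 0.0 < atfind_proportion < 1.0:
        raise ValueError("atfind_proportion must lie strictly between 0 and 1")
    rng = random.Random(seed)
    indices = list(range(len(responses)))
    rng.shuffle(indices)  # random assignment of examinees to subsamples
    n_atfind = round(len(responses) * atfind_proportion)
    atfind_rows = [responses[i] for i in indices[:n_atfind]]
    dimtest_rows = [responses[i] for i in indices[n_atfind:]]
    return atfind_rows, dimtest_rows


# Illustrative data only: 1,000 simulated examinees answering 20 items at random.
data_rng = random.Random(1)
responses = [[data_rng.randint(0, 1) for _ in range(20)] for _ in range(1000)]

atfind_rows, dimtest_rows = split_sample(responses, atfind_proportion=0.5)
```

Keeping the two subsamples disjoint means the subtest is selected on different examinees than those used to test it, which is the reason for splitting the sample in the first place.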

Keywords

ATFIND, DIMTEST, item response theory, unidimensionality, multidimensionality
