Discussion of David Thissen’s Bad Questions
