On the Speed Sensitivity Parameter in the Lognormal Model for Response Times and Implications for High-Stakes Measurement Practice

Abstract

In high-stakes testing, multiple test forms are often used under a common time limit. Test fairness requires that ability estimates not depend on which test form is administered. This requirement may be violated if speededness differs between test forms. This study investigated how ignoring speed sensitivity affects the comparability of test forms with respect to speededness and ability estimation. The lognormal measurement model for response times by van der Linden was compared with its extension by Klein Entink, van der Linden, and Fox, which includes a speed sensitivity parameter. An empirical data example showed that the extended model can fit the data better than the model without speed sensitivity parameters. A simulation showed that test forms with different average speed sensitivity yielded substantially different ability estimates for slow test takers, especially those with high ability. Therefore, the extended lognormal model for response times is recommended for calibrating item pools in high-stakes testing situations. Limitations of the proposed approach and further research questions are discussed.
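
As a rough illustration of why differing speed sensitivity matters under a common time limit, the sketch below assumes the extended model is written as ln T_ij = β_i − φ_i τ_j + ε_ij with ε_ij ~ N(0, σ_i²), where τ_j is the test taker's speed and φ_i is the item's speed sensitivity; setting every φ_i to 1 gives the model without speed sensitivity. The two forms, all parameter values, and the function name `expected_total_time` are hypothetical and are not taken from the study's simulation design.

```python
import math


def expected_total_time(beta, phi, sigma, tau):
    """Expected total time (in seconds) on a form for speed parameter tau.

    Assumes ln T_i = beta_i - phi_i * tau + eps_i with eps_i ~ N(0, sigma_i^2),
    so the lognormal mean is E[T_i] = exp(beta_i - phi_i * tau + sigma_i^2 / 2).
    """
    return sum(
        math.exp(b - p * tau + s ** 2 / 2)
        for b, p, s in zip(beta, phi, sigma)
    )


# Hypothetical item parameters: both forms share time intensity (beta) and
# residual SD (sigma) and differ only in speed sensitivity (phi).
n_items = 20
beta = [4.0] * n_items
sigma = [0.3] * n_items
form_a_phi = [1.0] * n_items  # phi = 1: model without speed sensitivity
form_b_phi = [1.5] * n_items  # higher average speed sensitivity

# Compare a slow, an average, and a fast test taker on both forms.
for tau in (-0.5, 0.0, 0.5):
    t_a = expected_total_time(beta, form_a_phi, sigma, tau)
    t_b = expected_total_time(beta, form_b_phi, sigma, tau)
    print(f"tau={tau:+.1f}  form A: {t_a:7.0f} s  form B: {t_b:7.0f} s")
```

Under these assumed values, both forms take the same expected time for a test taker of average speed (τ = 0), while for a slow test taker (τ < 0) the form with higher speed sensitivity takes longer and would therefore be more speeded under a shared time limit.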

Keywords

test assembly, speededness, item response theory, high-stakes assessment

References

Bertling, & Weeks, J. P. (2018). Using response time data to reduce testing time in cognitive tests. Psychological Assessment, 30(3), 328–338.

Bolsinova, & Tijmstra (2018). Improving precision of ability estimation: Getting more from response times. British Journal of Mathematical and Statistical Psychology, 71, 13–38.

Bridgman, Cline, & Hessinger (2004). Effect of extra time on verbal and quantitative GRE scores. Applied Measurement in Education, 17(1), 25–37.

Bridgman, Trapani, & Curley (2004). Impact of fewer questions per section on SAT I scores. Journal of Educational Measurement, 41(4), 291–310.

Brown, T. A. (2006). Confirmatory factor analysis for applied research. Guilford Press.

College Board. (2016). Test specifications for the redesigned SAT (College Board, Ed.).

Debelak, Gittler, & Arendasy (2014). On gender differences in mental rotation processing speed. Learning and Individual Differences, 29, 8–17.

Educational Testing Service. (2020). Test framework and test development, volume 1 (Educational Testing Service, Ed.; TOEFL iBT).

Finch (2008). Estimation of item response theory parameters in the presence of missing data. Journal of Educational Measurement, 45(3), 225–245.

Fox, J.-P. (2010). Bayesian item response modeling: Theory and applications. Springer.

Fox, J.-P., & Marianti (2016). Joint modeling of ability and differential speed using responses and response times. Multivariate Behavioral Research, 51(4), 540–553.

Fox, J.-P., & Marianti (2017). Person-fit statistics for joint models for accuracy and speed. Journal of Educational Measurement, 54(2), 243–262.

Gelman, & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–511.

Gelman, & Shirley (2011). Inference from simulations and monitoring convergence. In Brooks, Gelman, Jones, G. L., & Meng, X. L. (Eds.), Handbook of Markov Chain Monte Carlo (pp. 163–174). Chapman & Hall/CRC.

Goldhammer (2015). Measuring ability, speed, or both? Challenges, psychometric solutions, and what can be gained from experimental control. Measurement: Interdisciplinary Research and Perspectives, 13, 133–164.

Goldhammer, & Klein Entink, R. H. (2011). Speed of reasoning and its relation to reasoning ability. Intelligence, 39, 108–119.

Gonzalez, & Rutkowski (2010). Principles of multiple matrix booklet design and parameter recovery in large-scale assessments. In von Davier, & Hastedt (Eds.), IERI monograph series: Issues and methodologies in large-scale assessments (Vol. 3, pp. 125–156). IEA-ETS Research Institute.

Harik, Clauser, B. E., Grabovsky, Baldwin, Margolis, M. J., Bucak, Jodoin, Walsh, & Haist (2018). A comparison of experimental and observational approaches to assessing the effects of time constraints in a medical licensing examination. Journal of Educational Measurement, 55(2), 308–327.

Klein Entink, R. H., Fox, J.-P., & van der Linden, W. J. (2009). A multivariate multilevel approach to the modeling of accuracy and speed of test takers. Psychometrika, 74(1), 21–48.

Klein Entink, R. H., van der Linden, W. J., & Fox, J.-P. (2009). A Box–Cox normal model for response times. British Journal of Mathematical and Statistical Psychology, 62, 621–640.

Lord, F. M., & Novick, M. R. (1986). Statistical theories of mental test scores. Information Age Publishing.

Lovett, B. J. (2010). Extended time testing accommodations for students with disabilities: Answers to five fundamental questions. Review of Educational Research, 80(4), 611–638.

Luecht, R. M., & Sireci, S. G. (2011). A review of models for computer-based testing (College Board, Ed.; Research Report 2011–12). College Board.

Molenaar, Tuerlinckx, & van der Maas, H. L. J. (2015). A bivariate generalized linear item response theory modeling framework to the analysis of responses and response times. Multivariate Behavioral Research, 50(1), 56–74.

Organisation for Economic Co-operation and Development. (2016). PISA 2015 technical report. OECD Publishing.

Plummer (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling.

Plummer (2016). Rjags: Bayesian graphical models using MCMC (R Package Version 4-6). https://CRAN.R-project.org/package=rjags

Plummer, Best, Cowles, & Vines (2006). Coda: Convergence diagnosis and output analysis for MCMC. R News, 6(1), 7–11. https://journal.r-project.org/archive/

Pohl, Ulitzsch, & von Davier (2019). Using response times to model not-reached items due to time limits. Psychometrika, 84(3), 892–920.

Ranger, & Ortner (2012). A latent trait model for response times on tests employing the proportional hazard model. British Journal of Mathematical and Statistical Psychology, 65, 334–349.

Robitzsch, & Kiefer (2017). TAM: Test Analysis Modules (R Package Version 2.8-21). https://CRAN.R-project.org/package=TAM

Samejima (1977). Weakly parallel tests in latent trait theory with some criticisms of classical test theory. Psychometrika, 42(2), 193–198.

Scherer, Greiff, & Hautamäki (2015). Exploring the relation between time on task and ability in complex problem solving. Intelligence, 48, 37–50.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B, 64(4), 583–639.

Tijmstra, & Bolsinova (2018). On the importance of the speed-ability trade-off when dealing with not reached items. Frontiers in Psychology, 9, Article 964.

van der Linden, W. J. (2005). Linear models for optimal test assembly. Springer.

van der Linden, W. J. (2006). A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31(2), 181–204.

van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72(3), 287–308.

van der Linden, W. J. (2009). Conceptual issues in response-time modeling. Journal of Educational Measurement, 46(3), 247–272.

van der Linden, W. J. (2011a). Setting time limits on tests. Applied Psychological Measurement, 35(3), 183–199.

van der Linden, W. J. (2011b). Test design and speededness. Journal of Educational Measurement, 48(1), 44–60.

van der Linden, W. J., Klein Entink, R. H., & Fox, J.-P. (2010). IRT parameter estimation with response times as collateral information. Applied Psychological Measurement, 34(5), 327–347.

van der Linden, W. J., & Xiong (2013). Speededness and adaptive testing. Journal of Educational and Behavioral Statistics, 38(4), 418–438.

Warm (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427–450.
