Abstract
This paper evaluates the multistage adaptive test (MST) design of a large-scale academic language assessment (ACCESS) for Grades 1–12, with the aim of simplifying the current MST design, using both operational and simulated test data. Study 1 explored the operational population data (1,456,287 test-takers) from the listening and reading tests of MST ACCESS in the 2018–2019 school year to evaluate the MST design in terms of measurement efficiency and precision. Study 2 was a simulation study conducted to find an optimal MST design by manipulating the number of items per stage and the panel structure. The results from the operational test data showed that the test length for both the listening and reading tests could be shortened to six folders (i.e., 18 items), yielding final ability estimates and reliability coefficients comparable, with only slight differences, to those of the current test. The simulation study showed that all six proposed MST designs yielded slightly better measurement accuracy and efficiency than the current design, among which the 1-3-3 MST design with more items at earlier stages ranked first. The findings provide implications for evaluating MST designs and for optimizing MST designs in language assessment.
Supplementary Material
Supplemental material for this article is available online.
