Is Differential Noneffortful Responding Associated With Type I Error in Measurement Invariance Testing?

Abstract

Low test-taking effort is a common validity threat when examinees perceive an assessment context to have minimal personal value. Prior research has shown that in such contexts, subgroups may differ in their effort, which raises two concerns when making subgroup mean comparisons. First, it is unclear how differential effort could influence evaluations of scale property equivalence. Second, even when full scalar invariance is attained, the degree to which differential effort can bias subgroup mean comparisons is unknown. To address these issues, a simulation study was conducted to examine the influence of differential noneffortful responding (NER) on evaluations of measurement invariance and latent mean comparisons. Results showed that as differential rates of NER grew, Type I error rates in measurement invariance testing increased only at the metric invariance level; no negative effects were apparent for configural or scalar invariance. When full scalar invariance was correctly attained, differential NER biased mean score comparisons by as much as 0.18 standard deviations at a differential NER rate of 7%. These findings suggest that test users should evaluate and document potential differential NER before both conducting measurement quality analyses and reporting disaggregated subgroup mean performance.
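The mechanism behind the mean-comparison bias can be illustrated with a minimal sketch. It is not the simulation design used in the study: it assumes a hypothetical 20-item Rasch test, treats NER at the examinee level (a noneffortful examinee guesses on every item), fixes the chance success rate at 0.25, and compares observed sum scores rather than latent means from a multigroup model, so the size of the bias it produces is not expected to match the 0.18 standard deviations reported above. All function names, parameter values, and the seed are illustrative assumptions.

```python
import numpy as np

def simulate_sum_scores(n, ner_rate, rng, n_items=20, chance=0.25):
    """Sum scores for n examinees; a fraction `ner_rate` guesses on every item."""
    theta = rng.standard_normal(n)            # latent ability, mean 0 in every group
    b = np.linspace(-1.5, 1.5, n_items)       # hypothetical Rasch item difficulties
    p_correct = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    noneffortful = rng.random(n) < ner_rate   # examinee-level NER flag (assumption)
    p_correct[noneffortful, :] = chance       # NER responses do not depend on ability
    responses = rng.random((n, n_items)) < p_correct
    return responses.sum(axis=1)

def standardized_mean_difference(x, y):
    """Mean difference (x minus y) in pooled standard deviation units."""
    pooled_sd = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2.0)
    return (x.mean() - y.mean()) / pooled_sd

rng = np.random.default_rng(2021)
n = 50_000  # large groups so sampling noise is small relative to the bias

reference = simulate_sum_scores(n, 0.0, rng)     # fully effortful reference group
focal_clean = simulate_sum_scores(n, 0.0, rng)   # focal group without NER
focal_ner = simulate_sum_scores(n, 0.07, rng)    # focal group with 7% NER

d_clean = standardized_mean_difference(reference, focal_clean)
d_ner = standardized_mean_difference(reference, focal_ner)
print(f"No differential NER: d = {d_clean:.3f}")
print(f"7% differential NER: d = {d_ner:.3f}")
```

Both groups are drawn from the same ability distribution, so any standardized difference that appears in the second comparison is produced by the noneffortful responses alone rather than by a true subgroup difference.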

Keywords

test-taking effort, noneffortful responding, measurement invariance, subgroup comparisons, validity
