Sage Journals: Discover world-class research

Abstract

Multidimensional forced choice (MFC) measures are gaining prominence in noncognitive assessment. Yet there has been little research on detecting differential item functioning (DIF) with models for forced choice measures. This research extended two well-known DIF detection methods to MFC measures. Specifically, the performance of Lord’s chi-square and item parameter replication (IPR) methods with MFC tests based on the Multi-Unidimensional Pairwise Preference (MUPP) model was investigated. The Type I error rate and power of the DIF detection methods were examined in a Monte Carlo simulation that manipulated sample size, impact, DIF source, and DIF magnitude. Both methods showed consistent power and were found to control Type I error well across study conditions, indicating that established approaches to DIF detection work well with the MUPP model. Lord’s chi-square outperformed the IPR method when DIF source was statement discrimination while the opposite was true when DIF source was statement threshold. Also, both methods performed similarly and showed better power when DIF source was statement location, in line with previous research. Study implications and practical recommendations for DIF detection with MFC tests, as well as limitations, are discussed.
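
As a rough illustration of the parameter-difference idea behind Lord's chi-square (Lord, 1980), the sketch below computes the statistic d'(Σ_R + Σ_F)⁻¹d for a single statement, where d is the difference between reference-group and focal-group parameter estimates and Σ_R, Σ_F are their sampling covariance matrices. It is not the study's implementation: the function name, the three-parameter layout (discrimination, location, threshold), and all numeric values are hypothetical, and the estimates are assumed to have already been linked to a common metric.

```python
import numpy as np

def lords_chi_square(est_ref, est_foc, cov_ref, cov_foc):
    """Lord's chi-square for one statement (hypothetical helper).

    est_ref, est_foc: parameter estimates for the reference and focal groups,
                      assumed to be on a common metric already.
    cov_ref, cov_foc: sampling covariance matrices of those estimates.
    Returns (statistic, degrees of freedom).
    """
    diff = np.asarray(est_ref, dtype=float) - np.asarray(est_foc, dtype=float)
    pooled = np.asarray(cov_ref, dtype=float) + np.asarray(cov_foc, dtype=float)
    # Quadratic form diff' * pooled^{-1} * diff, without forming the inverse.
    stat = float(diff @ np.linalg.solve(pooled, diff))
    return stat, diff.size

# Made-up estimates: (discrimination, location, threshold) for one statement.
est_ref = [1.20, 0.50, -1.00]
est_foc = [1.10, 0.90, -1.05]
cov_ref = np.diag([0.02, 0.03, 0.04])
cov_foc = np.diag([0.02, 0.03, 0.04])

stat, df = lords_chi_square(est_ref, est_foc, cov_ref, cov_foc)

# Chi-square critical value for df = 3 at alpha = .05.
CRITICAL_05_DF3 = 7.815
flagged = stat > CRITICAL_05_DF3
print(f"chi-square = {stat:.3f}, df = {df}, flagged as DIF: {flagged}")
```

In a simulation like the one described, the proportion of DIF-free statements flagged this way would estimate the Type I error rate, and the proportion of true DIF statements flagged would estimate power.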

Keywords

differential item functioning, multidimensional forced choice, linking, item response theory, measurement invariance, multi-unidimensional pairwise preference model

Detecting DIF with the Multi-Unidimensional Pairwise Preference Model: Lord’s Chi-square and IPR-NCDIF Methods