Using Automatic Item Generation to Simultaneously Construct German and English Versions of a Word Fluency Test

Abstract

There has been increased interest in psychometric tests that allow test scores to be compared across different language versions. From a psychometric point of view, this requires empirical evidence of full score equivalence between these measures. This aim is often difficult to achieve, particularly in the assessment of verbal abilities. In the present article, we outline how automatic item generation can be used to overcome some of the problems researchers face in constructing psychometric tests that are valid in multilingual assessment settings. The feasibility of this approach is illustrated by the construction and empirical evaluation of a German and an English word fluency test. The results of the studies reported in this article indicate that automatic item generation can produce a sufficient number of word fluency items of high psychometric quality in both languages. Furthermore, the item pools constructed for the two languages can be linked to each other through a common set of anchor items that are identical in their conceptual, linguistic, and psychometric characteristics, thereby facilitating cross-lingual comparisons of word fluency performance.

Keywords

automatic item generation, test adaptation, cross-cultural assessment, word fluency

