Ackerman, T.A., & Smith, P. (1988). A comparison of the information provided by essay, multiple-choice, and free-response writing tests. Applied Psychological Measurement, 12, 117-128.
Bennett, R.E. (1991). On the meanings of constructed response (Research Rep. No. RR-91-63). Princeton, NJ: Educational Testing Service.
Bennett, R.E., Rock, D.A., Braun, H., Frye, D., Spohrer, J.C., & Soloway, E. (1990). The relationship of expert-system scored constrained free-response items to multiple-choice and open-ended items. Applied Psychological Measurement, 14, 151-162.
Birenbaum, M., & Shaw, D.J. (1985). Task specification chart: A key to a better understanding of test results. Journal of Educational Measurement, 22, 219-230.
Birenbaum, M., & Tatsuoka, K.K. (1987). Open-ended versus multiple-choice response format—it does make a difference. Applied Psychological Measurement, 11, 385-395.
Bridgeman, B. (1991). Essays and multiple-choice tests as predictors of college freshman GPA. Research in Higher Education, 32, 319-332.
Bridgeman, B., & Lewis, C. (1991). Sex differences in the relationship of advanced placement essay and multiple-choice scores to grades in college courses (Research Rep. No. RR-91-48). Princeton, NJ: Educational Testing Service.
Brown, J.S., & Burton, R.B. (1978). Diagnostic models for procedural bugs in basic mathematical skills. Cognitive Science, 2, 155-192.
Gutvirtz, Y. (1989). Effects of sex, test anxiety and item format on performance on a diagnostic test in mathematics. Unpublished M.A. thesis, School of Education, Tel-Aviv University (in Hebrew).
Hotelling, H. (1940). The selection of variables for use in prediction with some comments on the general problem of nuisance parameters. Annals of Mathematical Statistics, 11, 271-283.
Martinez, M. (1991). A comparison of multiple-choice and restricted figural response items. Journal of Educational Measurement, 28, 131-145.
Martinez, M.E., & Katz, I.R. (1992). Cognitive processing requirements of constructed figural response and multiple-choice items in architecture assessment (Research Rep. No. RR-92-5). Princeton, NJ: Educational Testing Service.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.; pp. 13-103). New York: Macmillan.
Oosterhof, A.C., & Coats, P.K. (1984). Comparison of difficulties and reliabilities of quantitative word problems in completion and multiple-choice item formats. Applied Psychological Measurement, 8, 287-294.
Payne, S.J., & Squibb, H.R. (1990). Algebra mal-rules and cognitive accounts of error. Cognitive Science, 14, 445-481.
Sleeman, D., Kelly, A.E., Martinak, R., Ward, R.D., & Moore, J.L. (1989). Studies of diagnosis and remediation with high school algebra students. Cognitive Science, 13, 551-568.
Tatsuoka, K.K. (1983). Rule space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345-354.
Tatsuoka, K.K. (1984). Caution indices based on item response theory. Psychometrika, 49, 95-110.
Tatsuoka, K.K. (1985). A probabilistic model for diagnosing misconceptions by the pattern classification approach. Journal of Educational Statistics, 10, 55-73.
Tatsuoka, K.K. (1990). Toward an integration of item response theory and cognitive analysis. In N. Frederiksen, R. Glaser, A. Lesgold, & M. C. Shafto (Eds.), Diagnostic monitoring of skill and knowledge acquisition (pp. 453-488). Hillsdale, NJ: Erlbaum.
Tatsuoka, K.K. (1991). Boolean algebra applied to determination of universal set of knowledge states (Research Rep. No. ONR-1). Princeton, NJ: Educational Testing Service.
Tatsuoka, K.K., & Linn, R.L. (1983). Indices for detecting unusual patterns: Links between two general approaches and potential applications. Applied Psychological Measurement, 7, 81-96.
Tatsuoka, K.K., & Tatsuoka, M.M. (1987). Bug distribution and pattern classification. Psychometrika, 52, 193-206.
Traub, R.E., & MacRury, K. (1990). Antwort-Auswahl- vs. Freie-Antwort-Aufgaben bei Lernerfolgstests [Multiple-choice vs. free-response in the testing of scholastic achievement]. In K. Ingenkamp & R. S. Jäger (Eds.), Tests und Trends 8: Jahrbuch der pädagogischen Diagnostik (pp. 128-159). Weinheim, Germany: Beltz Verlag.
Van den Bergh, H. (1990). On the construct validity of multiple-choice items for reading comprehension. Applied Psychological Measurement, 14, 1-12.
Ward, W.C. (1982). A comparison of free-response and multiple-choice forms of verbal aptitude tests. Applied Psychological Measurement, 6, 1-11.
Ward, W.C., Frederiksen, N., & Carlson, S.B. (1980). Construct validity of free-response and machine scorable forms of a test. Journal of Educational Measurement, 17, 11-29.
Yamamoto, K. (1991). HYBIL: Hybrid model of IRT and latent classes [Computer program]. Princeton, NJ: Educational Testing Service.
Zimmerman, D.W., Williams, R.H., & Symons, D.L. (1984). Empirical estimates of the comparative reliability of matching tests and multiple-choice tests. Journal of Experimental Education, 52, 179-182.