Adams, R. J., & Wilson, M. R. (1996). Formulating the Rasch model as a mixed coefficients multinomial logit. In G. Englhard & M. Wilson (Eds.), Objective measurement: Theory into practice (Vol. 3, pp. 143-166). Norwood, NJ: Ablex.
2.
Adams, R. J., Wilson, M. R., & Wang, W.-C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1-23.
3.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.
4.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-479). Reading, MA: Addison-Wesley.
5.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29-51.
6.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.
7.
Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431-444.
8.
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153-168.
9.
Congdon, P. (2003). Applied Bayesian modelling. New York: John Wiley.
10.
De Boeck, P., & Wilson, M. R. (Eds.). (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer-Verlag.
11.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society (Series B), 39, 1-38.
12.
Fischer, G. H. (1973). The linear logistic test model as instrument in educational research. Acta Psychologica, 37, 359-374.
13.
Fischer, G. H., & Parzer, P. (1991). An extension of the rating scale model with an application to the measurement of treatment effects. Psychometrika, 56, 637-651.
14.
Fischer, G. H., & Pononcy, I. (1994). An extension of the partial credit model with an application to the measurement of change. Psychometrika, 59, 177-192.
15.
Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.
16.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, R. D. (2003). Bayesian data analysis (2nd ed.). New York: Chapman & Hall/CRC.
17.
Glas, C. A. W., Wainer, H., & Bradlow, E. T. (2000). MML and EAP estimation in testlet-based adaptive testing. In W. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 271-287). London: Kluwer.
18.
Hoijtink, H., Rooks, G., & Wilmink, F. W. (1999). Confirmatory factor analysis of items with a dichotomous response format using the multidimensional Rasch model. Psychological Methods, 4, 300-314.
19.
Hoskens, M., & De Boeck, P. (1997). A parametric model for local dependence among test items. Psychological Methods, 2, 261-277.
20.
Hoskens, M., & De Boeck, P. (2001). Multidimensional componential item response theory models for polytomous items. Applied Psychological Measurement, 25, 19-37.
21.
Hung, L.-F. (2002). The generalized multidimensional multilevel multinomial logit model. Unpublished doctoral dissertation, National Chung Cheng University, Taiwan.
22.
Irvine, S. H., & Kyllonen, P. C. (Eds.). (2002). Item generation for test development. Hillsdale, NJ: Lawrence Erlbaum.
23.
Lee, P. M. (1989). Bayesian statistics: An introduction. New York: Oxford University Press.
24.
Linacre, J. M. (1989). Many-facet Rasch measurement. Chicago: Measurement, Evaluation, Statistics, and Assessment Press.
25.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
26.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman 3 Hall.
27.
McCulloch, C. E., & Searle, S. R. (2001). Generalized, linear, and mixed models. New York: John Wiley.
28.
Mislevy, R. J., Beaton, A. E., Kaplan, B., & Sheehan, K. M. (1992). Estimating population characteristics from sparse matrix samples of item responses. Journal of Educational Measurement, 29, 133-161.
29.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159-176.
30.
Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society A, 135, 370-384.
31.
Punt, A. E., & Hilborn, R. (1997). Fisheries stock assessment and decision analysis: The Bayesian approach. Reviews in Fish Biology and Fisheries, 7, 35-63.
32.
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests (Expanded ed.). Chicago: University of Chicago Press. (Original work published 1960)
33.
Rijmen, F., & De Boeck, P. (2002). The random weights linear logistic test model. Applied Psychological Measurement, 26, 271-285.
34.
Rijmen, F., Tuerlinckx, F., De Boeck, P., & Kuppens, P. (2003). A nonlinear mixed model framework for item response theory. Psychological Methods, 8, 185-205.
35.
Rosenbaum, P. R. (1988). Item bundles. Psychometrika, 53, 349-359.
36.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 17, 1-100.
37.
SAS Institute. (1999). The NLMIXED procedure [Computer software]. Cary, NC: Author.
38.
Sheehan, K. M., Ginther, A., & Schedl, M. (1999, March). Understanding performance on the TOEFL reading comprehension section: A tree-based regression approach. Paper presented at the annual conference of the American Association of Applied Linguistics, Stamford, CT.
39.
Sireci, S. G., Thissen, D., & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28, 237-247.
40.
Thissen, D., Steinberg, L., & Mooney, J. A. (1989). Trace lines for testlets: A use of multiple-categorical-response models. Journal of Educational Measurement, 26, 247-260.
41.
Tuerlinckx, F., & De Boeck, P. (2001). The effect of ignoring item interactions on the estimated discrimination parameters in item response theory. Psychological Methods, 6, 181-195.
42.
Volodin, N., & Adams, R. J. (1995, April). Identifying and estimating a D-dimensional Rasch model. Paper presented at the International Objective Measurement Workshop, University of California at Berkeley.
43.
Wainer, H. (1995). Precision and differential item functioning on a testlet-based test: The 1991 Law School Admissions Test as an example. Applied Measurement in Education, 8, 157-186.
44.
Wainer, H., Bradlow, E. T., & Du, Z. (2000). Test-let response theory: An analog for the 3PL model using in testlet-based adaptive testing. In W. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245-269). London: Kluwer.
45.
Wainer, H., & Kiely, G. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185-202.
46.
Wainer, H., & Lukhele, R. (1997). How reliable are TOEFL scores?Educational and Psychological Measurement, 57, 749-766.
47.
Wainer, H., & Thissen, D. (1996). How is reliability related to the quality of test scores? What is the effect of local dependence on reliability?Educational Measurement: Issues and Practice, 15(1), 22-29.
48.
Wainer, H., & Wang, X. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37, 203-220.
49.
Wang, W.-C. (1999). Direct estimation of correlations among latent traits within IRT framework. Methods of Psychological Research Online, 4, 63-82.
50.
Wang, W.-C., & Chen, H.-C. (2004). The standardized mean difference within the framework of item response theory. Educational and Psychological Measurement, 64, 201-223.
51.
Wang, W.-C., Chen, P.-H., & Cheng, Y.-Y. (2004). Improving measurement precision of test batteries using multidimensional item response models. Psychological Methods, 9, 116-136.
52.
Wang, W.-C., Cheng, Y.-Y., & Wilson, M. R. (in press). Local item dependence for items across tests connected by common stimuli. Educational and Psychological Measurement.
53.
Wang, W.-C., Wilson, M. R., & Adams, R. J. (1997). Rasch models for multidimensionality between items and within items. In M. Wilson, G. Engelhard, & K. Draney (Eds.), Objective measurement: Theory into practice (Vol. 4, pp. 139-155). Norwood, NJ: Ablex.
54.
Wang, W.-C., Wilson, M. R., & Adams, R. J. (2000). Interpreting the parameters of a multidimensional Rasch model. In M. Wilson & G. Engelhard (Eds.), Objective measurement: Theory into practice (Vol. 5, pp. 219-242). Norwood, NJ: Ablex.
55.
Wang, W.-C., & Wu, C.-I. (2004). Gain score in item response theory as an effect size measure. Educational and Psychological Measurement, 64, 758-780.
56.
Wang, X., Bradlow, E. T., & Wainer, H. (2002). A general Bayesian model for testlets: Theory and applications. Applied Psychological Measurement, 26, 109-128.
57.
Wilson, M. R. (1992). The partial order model: An extension of the partial credit model. Applied Psychological Measurement, 16, 309-325.
58.
Wilson, M. R., & Adams, R. J. (1995). Rasch models for item bundles. Psychometrika, 60, 181-198.
59.
Wolfinger, R. D., & SAS Institute. (n.d.). Fitting nonlinear mixed models with the new NLMIXED procedure. Retrieved August 17, 2003, from http://support.ssas.com/rnd/app/papers/nlmixedsugi.pdf
60.
Wu, M. L., Adams, R. J., & Wilson, M. R. (1998). ConQuest: Generalized item response modeling software [Computer software and manual]. Camberwell, Victoria: Australian Council for Educational Research.
61.
Yen, W. (1993). Scaling performance assessment: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187-213.