Attali, Y. (2013). Validity and reliability of automated essay scoring. In M. D. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 181–198). New York, NY: Routledge.
Attali, M., & Cayton-Hodges, G. (2014). Expanding the CBAL mathematics assessment to elementary grades: The development of a competency model and a rational number learning progression (Research Report No. 14-08). Princeton, NJ: Educational Testing Service.
Bennett, R. E. (2010a). Cognitively based assessment of, for, and as learning: A preliminary theory of action for summative and formative assessment. Measurement: Interdisciplinary Research and Perspectives, 8, 70–91.
Bennett, R. E. (2010b). Technology for large-scale assessment. In P. Peterson, E. Baker, & B. McGaw (Eds.), International encyclopedia of education (3rd ed., Vol. 8, pp. 48–55). Oxford, England: Elsevier.
Bennett, R. E., & Bejar, I. I. (1998). Validity and automated scoring: It’s not only the scoring. Educational Measurement: Issues and Practice, 17(4), 9–17.
Bennett, R. E., Braswell, J., Oranje, A., Sandene, B., Kaplan, B., & Yan, F. (2008). Does it matter if I take my mathematics test on computer? A second empirical study of mode effects in NAEP. Journal of Technology, Learning, and Assessment, 6(9). Retrieved from http://files.eric.ed.gov/fulltext/EJ838621.pdf
Bennett, R. E., & Gitomer, D. H. (2009). Transforming K-12 assessment: Integrating accountability testing, formative assessment, and professional support. In C. Wyatt-Smith & J. Cumming (Eds.), Educational assessment in the 21st century (pp. 43–61). New York, NY: Springer.
Bennett, R. E., Kane, M. T., & Bridgeman, B. (2011). Theory of action and validity argument in the context of through-course summative assessment. Princeton, NJ: Educational Testing Service.
Bennett, R. E., Persky, H., Weiss, A. R., & Jenkins, F. (2007). Problem solving in technology-rich environments: A report from the NAEP Technology-Based Assessment Project (NCES 2007-466). Washington, DC: National Center for Education Statistics, U.S. Department of Education. Retrieved from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007466
Bennett, R. E., & Zhang, M. (in press). Validity and automated scoring. In F. Drasgow (Ed.), Technology in testing: Improving educational and psychological measurement. Washington, DC: National Council on Measurement in Education.
Black, P., & Wiliam, D. (1998a). Assessment and classroom learning. Assessment in Education, 5, 7–74.
Black, P., & Wiliam, D. (1998b). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 80, 139–148. Retrieved from http://www.pdkintl.org/kappan/kbla9810.htm
Bridgeman, B. (2013). Human ratings and automated essay evaluation. In M. D. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 221–232). New York, NY: Routledge.
Brookhart, S. M. (2010). Formative assessment strategies for every classroom: An ASCD action tool. Alexandria, VA: ASCD.
Bunderson, C. V., Inouye, D. K., & Olsen, J. B. (1989). The four generations of computerized testing. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 367–407). New York, NY: Macmillan.
Butler, A. C. (2010). Repeated testing produces superior transfer of learning relative to repeated studying. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 1118–1133.
Coffey, J. E., Hammer, D., Levin, D. M., & Grant, T. (2011). The missing disciplinary substance of formative assessment. Journal of Research in Science Teaching, 48, 1109–1136.
Common Core State Standards Initiative. (2010). Common Core State Standards for English Language Arts and Literacy in History/Social Studies, Science, and Technical Subjects. Retrieved from http://www.corestandards.org/ELA-Literacy/
Confrey, J., Maloney, A., Nguyen, K., Mojica, G., & Myers, M. (2009). Equipartitioning/splitting as a foundation of rational number reasoning. In M. Tzekaki, M. Kaldrimidou, & C. Sakonidis (Eds.), Proceedings of the 33rd Conference of the International Group for the Psychology of Mathematics Education (Vol. 1, pp. 345–352). Thessaloniki, Greece: PME.
Corcoran, T., Mosher, F. A., & Rogat, A. (2009). Learning progressions in science: An evidence-based approach to reform. New York, NY: Consortium for Policy Research in Education (CPRE).
Daro, P., Mosher, F. A., & Corcoran, T. (2011). Learning trajectories in mathematics: A foundation for standards, curriculum, assessment, and instruction (Research Report No. 68). Philadelphia, PA: CPRE.
Deane, P. (2012). Rethinking K-12 writing assessment. In N. Elliot & L. Perelman (Eds.), Writing assessment in the 21st century (pp. 87–100). New York, NY: Hampton Press.
Deane, P., Fowles, M., Baldwin, D., & Persky, H. (2011). The CBAL summative writing assessment: A draft eighth-grade design (ETS Research Memorandum No. 11-01). Princeton, NJ: Educational Testing Service.
Deane, P., Sabatini, J., & Fowles, M. (2012). Rethinking K-12 writing assessment to support best instructional practices. In C. Bazerman, C. Dean, J. Early, K. Lunsford, S. Null, P. Rogers, & A. Stansell (Eds.), International advances in writing research: Cultures, places, measures (pp. 83–102). Anderson, SC: Parlor Press.
Drasgow, F., Luecht, R. M., & Bennett, R. E. (2006). Technology and testing. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 471–515). Westport, CT: American Council on Education/Praeger.
Durso, R., Golub-Smith, M. L., Mills, C. N., Schaeffer, G. A., & Steffen, M. (1995). The introduction and comparability of the computer adaptive GRE General Test (Research Report No. 95-20). Princeton, NJ: Educational Testing Service.
Ericsson, K. A., Krampe, R. T., & Tesch-Romer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100, 363–406.
Graf, E. A. (2009). Defining mathematics competency in the service of Cognitively Based Assessment for grades 6 through 8 (ETS Research Memorandum No. RM-09-42). Princeton, NJ: Educational Testing Service.
Graf, E. A., Harris, K., Marquez, E., Fife, J., & Redman, M. (2009). Cognitively based assessment of, for, and as learning (CBAL) in mathematics: A design and first steps toward implementation (ETS Research Memorandum No. RM-09-07). Princeton, NJ: Educational Testing Service.
Herrington, A., & Moran, C. (2012). Writing to a machine is not writing at all. In N. Elliot & L. Perelman (Eds.), Writing assessment in the 21st century: Essays in honor of Edward M. White (pp. 219–232). New York, NY: Hampton Press.
Hinze, S. R., Wiley, J., & Pellegrino, J. W. (2013). The importance of constructive comprehension processes in learning from tests. Journal of Memory and Language, 69, 151–164.
Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187–220). Westport, CT: American Council on Education/Praeger.
Horkay, N., Bennett, R. E., Allen, N., Kaplan, B., & Yan, F. (2006). Does it matter if I take my writing test on computer? An empirical study of mode effects in NAEP. Journal of Technology, Learning, and Assessment, 5(2). Retrieved from http://files.eric.ed.gov/fulltext/EJ843858.pdf
Irvine, S. H., & Kyllonen, P. C. (2010). Item generation for test development. New York, NY: Routledge.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: American Council on Education/Praeger.
Kapur, M. (2010). Productive failure in mathematical problem solving. Instructional Science, 38, 523–550.
Kingsbury, G. G., & Houser, R. L. (1999). Developing computerized adaptive tests for school children. In F. Drasgow & J. B. Olson-Buchanan (Eds.), Innovations in computerized assessment (pp. 93–115). Mahwah, NJ: Erlbaum.
Kingston, N. M. (2008). Comparability of computer- and paper-administered multiple-choice tests for K–12 populations: A synthesis. Applied Measurement in Education, 22, 22–37. doi:10.1080/08957340802558326
Kopriva, R. J. (2009). Assessing the skills and abilities in math and science of ELLs with low English proficiency: A promising new method. AccELLerate!, 2, 7–10. Retrieved from http://www.ncela.us/files/uploads/17/Accellerate_2_1.pdf
Koretz, D., & Hamilton, L. S. (2006). Testing for accountability in K-12. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 531–578). Westport, CT: American Council on Education/Praeger.
Lave, J. (1991). Situating learning in communities of practice. In L. B. Resnick, J. M. Levine, & S. D. Teasley (Eds.), Perspectives on socially shared cognition (pp. 63–82). Washington, DC: American Psychological Association. doi:10.1037/10096-003
Lemann, N. (1999). The big test: The secret history of the American meritocracy. New York, NY: Farrar, Straus & Giroux.
Linn, R. L., & Burton, E. (1994). Performance-based assessment: Implications of task specificity. Educational Measurement: Issues and Practice, 13(1), 5–8.
Liu, L., Rogat, A., & Bertling, M. (2013). A CBAL science model of cognition: Developing a competency model and learning progressions to support assessment development (Research Report No. 13-29). Princeton, NJ: Educational Testing Service.
Luecht, R. M. (2009). Adaptive computer-based tasks under an assessment engineering paradigm. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. Retrieved from http://publicdocs.iacat.org/cat2010/cat09luecht.pdf
Mislevy, R. J., Almond, R. G., & Lukas, J. F. (2003). A brief introduction to evidence-centered design (Research Report No. 03-16). Princeton, NJ: Educational Testing Service.
Mislevy, R. J., Behrens, J. T., Bennett, R. E., Demark, S. F., Frezzo, D. C., Levy, R., . . . Shute, V. J. (2010). On the roles of external knowledge representations in assessment design. Journal of Technology, Learning, and Assessment, 8(2). Retrieved from http://files.eric.ed.gov/fulltext/EJ873671.pdf
Mislevy, R. J., Behrens, J. T., DiCerbo, K. E., & Levy, R. (2012). Design and discovery in educational assessment: Evidence-centered design, psychometrics, and educational data mining. Journal of Educational Data Mining, 4(1). Retrieved from http://www.educationaldatamining.org/JEDM/index.php/JEDM/article/view/22/12
Mislevy, R. J., & Zwick, R. (2012). Scaling, linking, and reporting in a periodic assessment system. Journal of Educational Measurement, 49, 148–166.
National Center for Education Statistics. (2012). The nation’s report card: Science in action: Hands-on and interactive computer tasks from the 2009 science assessment (NCES 2012-468). Washington, DC: Institute of Education Sciences, U.S. Department of Education.
Partnership for Assessment of Readiness for College and Careers. (2010). The Partnership for Assessment of Readiness for College and Careers (PARCC) application for the Race to the Top comprehensive assessment systems competition. Washington, DC: Author. Retrieved from http://www.parcconline.org/sites/parcc/files/PARCC%20Application%20-%20FINAL.pdf
Pellegrino, J. W., Chudowsky, N., & Glaser, R. (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academies Press.
Perelman, L. (2014). When “the state of the art” is counting words. Assessing Writing, 21, 104–111.
Ramineni, C., Trapani, C. S., Williamson, D. M., Davey, T., & Bridgeman, B. (2012). Evaluation of e-rater for the GRE issue and argument prompts (Research Report No. 12-02). Princeton, NJ: Educational Testing Service.
Ravitch, D. (2013). The reign of error: The hoax of the privatization movement and the danger to America’s public schools. New York, NY: Knopf.
Reidenberg, J., Russell, N. C., Kovnot, J., Norton, T. B., Cloutier, R., & Alvarado, D. (2013). Privacy and cloud computing in public schools. New York, NY: Fordham Center on Law and Information Policy. Retrieved from http://ir.lawnet.fordham.edu/clip/2/
Ritter, S., Anderson, J. R., Koedinger, K. R., & Corbett, A. (2007). Cognitive tutor: Applied research in mathematics education. Psychonomic Bulletin & Review, 14, 249–255.
Rohrer, D., & Pashler, H. (2010). Recent research on human learning challenges conventional instructional strategies. Educational Researcher, 39, 406–412.
Rudner, L. M. (2010). Implementing the Graduate Management Admission Test computerized adaptive test. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 151–165). New York, NY: Springer. doi:10.1007/978-0-387-85461-8_8
Schmidt, W. H., & Houang, R. T. (2012). Curricular coherence and the Common Core State Standards for Mathematics. Educational Researcher, 41, 294–308. doi:10.3102/0013189X12464517
Shaffer, D. W. (2006). How computer games help children learn. New York, NY: Palgrave Macmillan.
Shavelson, R. J., Baxter, G. P., & Gao, X. (1993). Sampling variability of performance assessments. Journal of Educational Measurement, 30, 215–232.
Shepard, L. A. (2008). Formative assessment: Caveat emptor. In C. A. Dwyer (Ed.), The future of assessment: Shaping teaching and learning (pp. 279–303). New York, NY: Erlbaum.
Shermis, M. D., & Burstein, J. (Eds.). (2013). Handbook of automated essay evaluation: Current applications and new directions. New York, NY: Routledge.
Shermis, M. D., & Hamner, B. (2013). Contrasting state-of-the-art automated scoring of essays. In M. D. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 313–346). New York, NY: Routledge.
Sigel, I. (1993). The centrality of a distancing model for the development of representation competence. In R. Cocking & K. A. Renninger (Eds.), The development and meaning of psychological distance (pp. 141–158). Mahwah, NJ: Erlbaum.
Song, Y., Deane, P., Graf, E. A., & van Rijn, P. (2013). Using argumentation learning progressions to support teaching and assessments of English language arts. R&D Connections, 22, 1–14. Retrieved from http://www.ets.org/Media/Research/pdf/RD_Connections_22.pdf
Steenbergen-Hu, S., & Cooper, H. (2013). A meta-analysis of the effectiveness of intelligent tutoring systems on K-12 students’ mathematical learning. Journal of Educational Psychology, 105, 980–987. doi:10.1037/a0032447
Steenbergen-Hu, S., & Cooper, H. (2014). A meta-analysis of the effectiveness of intelligent tutoring systems on college students’ academic learning. Journal of Educational Psychology, 106, 331–347. doi:10.1037/a0034752
Stone, E., & Davey, T. (2011). Computer-adaptive testing for students with disabilities: A review of the literature (Research Report No. 11-32). Princeton, NJ: Educational Testing Service.
U.S. Department of Education. (2010). Race to the Top assessment program application for new grants: Comprehensive Assessment Systems (CFDA No. 84.395B). Washington, DC: Author.
U.S. Department of Education, Office of Educational Research and Improvement. (1994). What do student grades mean? Differences across schools (Office of Research Report OR 94-3401). Washington, DC: Office of Research. Retrieved from http://files.eric.ed.gov/fulltext/ED367666.pdf
U.S. Medical Licensing Examination. (2012). 2013 USMLE bulletin. Retrieved from http://www.usmle.org/
VanLehn, K., & van de Sande, B. (2009). Acquiring conceptual expertise from modeling: The case of elementary physics. In K. A. Ericsson (Ed.), Development of professional expertise: Toward measurement of expert performance and design of optimal learning environments (pp. 356–378). Cambridge, England: Cambridge University Press.
Wainer, H. (Ed.). (2000). Computerized adaptive testing: A primer (2nd ed.). Mahwah, NJ: Erlbaum.
Wang, S., Jiao, H., Young, M. J., Brooks, T., & Olson, J. (2007a). Comparability of computer-based and paper-and-pencil testing in K–12 reading assessments: A meta-analysis of testing mode effects. Educational and Psychological Measurement, 68, 5–24. doi:10.1177/0013164407305592
Wang, S., Jiao, H., Young, M. J., Brooks, T. E., & Olson, J. (2007b). A meta-analysis of testing mode effects in Grade K–12 mathematics tests. Educational and Psychological Measurement, 67, 219–238.
Ward, W. C. (1988). The College Board Computerized Placement Tests: An application of computerized adaptive testing. Machine-Mediated Learning, 2, 271–282.
Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6, 473–492.
Wenger, E. (2000). Communities of practice and social learning systems. Organization, 7, 225–246. doi:10.1177/135050840072002
Williamson, D. M. (2013). Probable cause: Developing warrants for automated scoring of essays. In M. D. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 153–180). New York, NY: Routledge.