The Changing Nature of Educational Assessment

Abstract
