Validation of Performance-Based Assessments

Abstract

Using Messick’s (1995, 1996) framework for validity, this article outlines six aspects of construct validation to guide the validation of performance-based assessments: content, substantive, structural, generalizability, external, and consequential. Each aspect is discussed with a focus on studies that could be conducted within a large-scale educational assessment. The article also discusses issues that affect construct validation in that context and outlines recommendations for future study.

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1985). Standards for educational and psychological testing. Washington, DC: Author.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: Author.

Aschbacher, P. R. (1991). Performance assessment: State activity, interest, and concerns. Applied Measurement in Education, 4, 275–288.

Blank, R. K. (1989). Development of a 50-state system of education indicators: Issues of design, implementation, and use. Washington, DC: Council of Chief State School Officers.

Blank, R. K., & Engler, P. (1992). Has science and mathematics education improved since “A nation at risk”? Washington, DC: Council of Chief State School Officers.

Bond, L. (1995). Unintended consequences of performance assessment: Issues of bias and fairness. Educational Measurement: Issues and Practice, 14 (4), 21–24.

Brennan, R. L. (1992). Elements of generalizability theory. Iowa City, IA: American College Testing.

Brennan, R. L. (1995). The conventional wisdom about group mean scores. Journal of Educational Measurement, 32, 385–396.

Brennan, R. L. (1996). Generalizability of performance assessments. In G. W. Phillips (Ed.), Technical issues in large-scale performance assessment (pp. 19–58). Washington, DC: National Center for Education Statistics. (NCES 96-802)

Brennan, R. L. (2000). Performance assessments from the perspective of generalizability theory. Applied Psychological Measurement, 24, 339–353.

Burton, E. (1998). An investigation of the school-level generalizability of performance assessment results. Unpublished doctoral dissertation, University of Colorado, Boulder.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.

Candell, G. L., & Ercikan, K. (1994). On the generalizability of school-level performance assessment scores. International Journal of Educational Research, 21, 267–278.

Chudowsky, N., & Behuniak, P. (1998). Using focus groups to examine the consequential aspect of validity. Educational Measurement: Issues and Practice, 17 (4), 28–38.

Crocker, L. (1997). Assessing content representativeness of performance assessment exercises. Applied Measurement in Education, 10, 83–95.

Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 443–507). Washington, DC: American Council on Education.

Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability of scores and profiles. New York: Wiley.

Cronbach, L. J., Linn, R. L., Brennan, R. L., & Haertel, E. H. (1997). Generalizability analysis for performance assessments of student achievement or school effectiveness. Educational and Psychological Measurement, 57, 373–399.

Cureton, E. E. (1951). Validity. In E. F. Lindquist (Ed.), Educational measurement (1st ed., pp. 621–694). Washington, DC: American Council on Education.

Haertel, E. H., & Linn, R. L. (1996). Comparability. In G. W. Phillips (Ed.), Technical issues in large-scale performance assessment (pp. 59–78). Washington, DC: National Center for Education Statistics. (NCES 96-802)

Harrison, J. M. (1998). A comparison of strategies for estimating internal consistency on tests with missing scores. Unpublished master’s thesis, University of Florida, Gainesville.

Kane, M. T. (1982). A sampling model for validity. Applied Psychological Measurement, 6, 125–160.

Linn, R. L. (1997). Evaluating the validity of assessments: The consequences of use. Educational Measurement: Issues and Practice, 16 (2), 14–16.

Linn, R. L., & Baker, E. L. (1996). Can performance-based student assessments be psychometrically sound? In J. B. Baron & D. P. Wolf (Eds.), Performance-based student assessment: Challenges and possibilities (pp. 84–103). Chicago: University of Chicago Press.

Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20, 15–21.

Linn, R. L., & Herman, J. L. (1997). Standards-led assessment: Technical and policy issues in measuring school and student progress (CSE Technical Report No. 426). Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing.

McLaughlin, M. W., & Shepard, L. A. (1995). Improving education through standards-based reform. A report by the National Academy of Education Panel on Standards-Based Education Reform. Stanford, CA: National Academy of Education.

Mehrens, W. A. (1997). The consequences of consequential validity. Educational Measurement: Issues and Practice, 16 (2), 16–18.

Mehrens, W. A., & Kaminski, J. (1989). Methods for improving standardized test scores: Fruitful, fruitless, or fraudulent? Educational Measurement: Issues and Practice, 8 (1), 14–22.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.). New York: American Council on Education.

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.

Messick, S. (1996). Validity in performance assessments. In G. W. Phillips (Ed.), Technical issues in large-scale performance assessment. Washington, DC: National Center for Education Statistics. (NCES 96-802)

Miller, M. D. (1998). Generalizability of performance-based assessments. Washington, DC: Council of Chief State School Officers.

Miller, M. D. (1999). Teacher uses and perceptions of the impact of statewide performance-based assessments. Washington, DC: Council of Chief State School Officers.

Miller, M. D., & Legg, S. M. (1993). Alternative assessment in a high-stakes environment. Educational Measurement: Issues and Practice, 12 (3), 9–15.

Miller, M. D., & Seraphine, A. E. (1993). Can test scores remain authentic when teaching to the test? Educational Assessment, 1, 119–129.

Perkins, D. N., & Salomon, G. (1989). Are cognitive skills context-bound? Educational Researcher, 18 (1), 16–25.

Popham, W. J. (1997). Consequential validity: Right concern—wrong concept. Educational Measurement: Issues and Practice, 16 (2), 9–13.

Shavelson, R. J., Baxter, G. P., & Gao, X. (1993). Sampling variability of performance assessments. Journal of Educational Measurement, 30, 215–232.

Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.

Shepard, L. A. (1989). Why we need better assessments. Educational Leadership, 46 (7), 4–9.

Shepard, L. A. (1993). Evaluating test validity. Review of Research in Education, 19, 405–450.

Shepard, L. A. (1997). The centrality of test use and consequences for test validity. Educational Measurement: Issues and Practice, 16 (2), 5–8, 13, 24.

Stecher, B. M., Klein, S. P., Solano-Flores, G., McCaffrey, D., Robyn, A., Shavelson, R. J., & Haertel, E. (2000). The effects of content, format and inquiry level on science performance assessment scores. Applied Measurement in Education, 13, 139–160.

Wiggins, G. (1989). Teaching to the (authentic) test. Educational Leadership, 46 (7), 41–47.

Wise, L. L., Hauser, R. M., Mitchell, K. J., & Feuer, M. J. (1998). Evaluation of the voluntary national tests: Phase I. Washington, DC: National Academy Press.

Yen, W. M. (1997). The technical quality of performance assessments: Standard errors of percents of students reaching standard. Educational Measurement: Issues and Practice, 16 (3), 5–15.