
Abstract

Improving teacher evaluation is one of the most pressing, and most contested, areas of educational policy. Value-added measures have received much of the attention in new evaluation systems, but they can be used to evaluate only a fraction of teachers. Classroom observations are almost universally used to assess teachers, yet their statistical properties have received far less empirical scrutiny, particularly in consequential evaluation systems. In this essay, we highlight some conceptual and empirical challenges that are similar across these different measures of teacher quality. Based on a review of empirical research, we argue that much more research is needed on observations as performance measures. We conclude by sketching out an agenda for future research in this area.

Keywords

accountability, classroom research, educational policy, policy analysis, teacher assessment



Building a More Complete Understanding of Teacher Evaluation Using Classroom Observations
