Abstract
Value-added models (VAMs) attempt to estimate the causal effects of teachers and schools on student test scores. We apply Generalizability Theory to show how estimated VA effects depend upon the selection of test items. Standard VAMs estimate causal effects on the items that are included on the test; generalizability demands consideration of how estimates would differ had the test included alternative items. We introduce a model that accurately estimates the magnitude of item-by-teacher/school variance, revealing that standard VAMs can overstate reliability and overestimate differences between units. Using 16 academic outcomes from 8 studies with item-level data, we show that standard VAMs overstate reliability by a median of 0.04 on the 0-to-1 reliability scale (mean = 0.09, SD = 0.10) and yield standard deviations of teacher/school effects that are a median of 3% too large (mean = 12%, SD = 23 percentage points). We discuss how imprecision due to heterogeneous VA effects across items attenuates effect sizes, complicates comparisons across studies, and contributes to temporal instability, though these effects are reduced when the number of items is large. Our results suggest that accurate estimation and interpretation of VAMs may be improved using item-level data, including qualitative data about how items represent the content domain.
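For intuition, a minimal Generalizability Theory sketch of the mechanism described above (illustrative notation for a simplified students-nested-in-teachers-by-items design, not the authors' exact estimator): the generalizability coefficient of a teacher-level score based on \(n_s\) students per teacher and \(n_i\) items is
\[
E\rho^2 \;=\; \frac{\sigma^2_{t}}{\;\sigma^2_{t} \;+\; \dfrac{\sigma^2_{t \times i}}{n_i} \;+\; \dfrac{\sigma^2_{s:t}}{n_s} \;+\; \dfrac{\sigma^2_{\mathrm{res}}}{n_s\, n_i}\;},
\]
where \(\sigma^2_{t}\) is the variance of teacher effects generalizing over items, \(\sigma^2_{t \times i}\) is the item-by-teacher interaction variance, \(\sigma^2_{s:t}\) is student-within-teacher variance, and \(\sigma^2_{\mathrm{res}}\) is residual variance. A model that does not separate the \(\sigma^2_{t \times i}/n_i\) term absorbs it into the estimated teacher variance, which inflates both the apparent reliability and the estimated spread of teacher effects; the inflation shrinks as \(n_i\) grows, consistent with the pattern reported in the abstract.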
