Abstract
Background/Context:
Public monitoring of educational progress and inequality often involves tracking changes in the percentage of “proficient” students across groups and over time. These trends are important signals of state and district provision of educational opportunity. I show how known flaws of this percentage metric, sometimes assumed to be negligible, interacted with COVID-19 pandemic conditions to create a “perfect storm,” masking real declines and growing inequality. I use this to motivate three metrics necessary for fuller documentation of educational progress and inequality when tested populations change.
Purpose/Objective/Research Question/Focus of Study:
I present three metrics for measuring and contextualizing changes in educational achievement over time: the “match rate,” the “fair trend,” and the “equity check.” Just as doctors or pilots rely on multiple instruments to diagnose and navigate, I argue that these three metrics are necessary for a holistic understanding of educational progress when tested populations change. I show how neglecting these metrics leads to misclassification of schools that do and do not need support. These metrics have their foundations in the statistical literature on missing data and causal inference. I adapt them to the context of public reporting of educational test scores for monitoring educational equity.
Research Design:
I use publicly available data from the California Department of Education from 2019 through 2022 to show how poor reporting metrics led state officials to conclude that test score gaps were closing when they were in fact widening. Drawing from statistical theory, I show how these issues generalize to other contexts. I use statistical models to define three metrics that avoid biases and provide necessary context when tested populations change. The first metric is a percentage I call the “match rate.” The second and third metrics, the “fair trend” and the “equity check,” are regression-adjusted trends for changing populations. I explain how education officials can use these metrics to improve diagnosis, like doctors supplementing a patient’s pulse with their temperature, blood pressure, and oxygen saturation.
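The core composition problem described above can be illustrated with a small, entirely hypothetical example. The sketch below uses made-up student records (not the paper's data) and a simplified matched-sample comparison standing in for the regression-adjusted “fair trend”; the actual metrics are defined by statistical models, so this is only a minimal sketch of why a “match rate” and an adjusted trend matter when the tested population changes.

```python
# Hypothetical student scores by id; none of this reflects real data.
prior = {"s1": 310, "s2": 295, "s3": 330, "s4": 305}   # earlier-year scores
current = {"s1": 320, "s2": 300, "s5": 280}            # later-year scores

# Match rate: share of current test takers also tested in the prior year.
matched = [sid for sid in current if sid in prior]
match_rate = len(matched) / len(current)
print(f"match rate: {match_rate:.0%}")

# Naive trend: change in unadjusted means, ignoring who was tested.
naive_trend = (sum(current.values()) / len(current)
               - sum(prior.values()) / len(prior))

# Adjusted trend (in the spirit of a "fair trend"): restrict both years to
# the matched students, so composition change does not drive the result.
adj_trend = (sum(current[s] for s in matched) / len(matched)
             - sum(prior[s] for s in matched) / len(matched))
print(f"naive trend: {naive_trend:+.1f}, matched-sample trend: {adj_trend:+.1f}")
```

Here the unadjusted means fall while the matched students actually improve: the naive trend reflects which students showed up for testing, not changes in achievement, which is exactly the distortion the three metrics are designed to expose.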
Conclusions/Recommendations:
Public reporting using simple metrics like “percent proficient” only yields defensible trend interpretations under conditions that are increasingly narrow and rare: when leaders care only about progress for a single stable population of students. The predictable biases and distortions of percent-proficient metrics necessitate more sophisticated metrics, simply explained, as complements. States and testing agencies should report metrics like the three I propose for transparency in technical documentation and wield them for decision-making, particularly when monitoring equity for changing populations.
