Abstract
This article examines theoretical and empirical issues related to the statistical power of impact estimates for experimental evaluations of education programs. The author considers designs where random assignment is conducted at the school, classroom, or student level, and employs a unified analytic framework using statistical methods from the literature. Focusing on standardized test scores of elementary school students, this article discusses appropriate precision standards and, for each design, the required number of schools to achieve those standards using empirical values of intraclass correlations, regression R² values, and other parameters. Clustering effects vary by design but are typically large. Thus, large school samples are required for education trials, and many evaluations will only have sufficient power to detect precise impacts for relatively large subgroups of sites.
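To make the link between clustering and the required number of schools concrete, the following is a minimal sketch of a minimum detectable effect size calculation for a design that randomizes schools. It uses the standard two-level variance expression found in the power-analysis literature, which is not necessarily the exact formula the article derives. The function names, parameter names, the default multiplier of 2.8 (roughly 80 percent power for a two-tailed test at the 5 percent level, ignoring degrees-of-freedom corrections), and the example inputs are all illustrative assumptions, not values taken from the article.

```python
import math


def mdes(n_schools, students_per_school, icc,
         r2_between=0.0, r2_within=0.0,
         treat_share=0.5, multiplier=2.8):
    """Approximate minimum detectable effect size, in standard deviation
    units, when whole schools are randomly assigned (hypothetical helper).

    icc         : intraclass correlation (share of variance between schools)
    r2_between  : regression R^2 at the school level
    r2_within   : regression R^2 at the student level
    treat_share : fraction of schools assigned to treatment
    multiplier  : assumed factor of about 2.8 for 80% power, 5% two-tailed
    """
    denom = treat_share * (1 - treat_share) * n_schools
    # School-level (clustering) variance component.
    between = icc * (1 - r2_between) / denom
    # Student-level variance component, averaged over students per school.
    within = (1 - icc) * (1 - r2_within) / (denom * students_per_school)
    return multiplier * math.sqrt(between + within)


def required_schools(target, students_per_school, icc, **kwargs):
    """Smallest number of schools whose MDES is at or below the target."""
    n = 2
    while mdes(n, students_per_school, icc, **kwargs) > target:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative inputs only: 60 students per school, ICC of 0.15.
    print(required_schools(0.25, 60, 0.15))
    # Same design with a school-level covariate explaining half the variance.
    print(required_schools(0.25, 60, 0.15, r2_between=0.5))
```

In this sketch the school-level term does not shrink as more students are added per school, which is why the intraclass correlation and the school-level R² dominate the required number of schools.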
