Examining the Internal Validity and Statistical Precision of the Comparative Interrupted Time Series Design by Comparison With a Randomized Experiment

Abstract

Although evaluators often use an interrupted time series (ITS) design to test hypotheses about program effects, there are few empirical tests of the design’s validity. We take a randomized experiment on an educational topic and compare its effects to those from a comparative ITS (CITS) design that uses the same treatment group as the experiment but a nonequivalent comparison group that is assessed at six time points before treatment. We estimate program effects with and without matching of the comparison schools, and we also systematically vary the number of pretest time points in the analysis. CITS designs produce impact estimates that are extremely close to the experimental benchmarks and, as implemented here, do so equally well with and without matching. Adding time points provides an advantage so long as the pretest trend differences in the treatment and comparison groups are correctly modeled. Otherwise, more time points can increase bias.
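The role of pretest trend modeling can be illustrated with a minimal sketch. It is not the authors' estimation model. It assumes a simple deviation-from-baseline-trend estimator on group means, with hypothetical, noise-free numbers, and the names `cits_effect` and `mean_baseline_effect` are invented for this illustration. The sketch compares an estimate that models each group's linear pretest trend with one that uses only pretest means, when the two groups have different pretest slopes.

```python
import numpy as np

def cits_effect(pre_times, treat_pre, comp_pre, post_time, treat_post, comp_post):
    """CITS estimate: difference between the groups' deviations from
    their own projected linear pretest trends (illustrative only)."""
    def deviation(pre_y, post_y):
        # Fit a straight line to the pretest points and project it forward.
        slope, intercept = np.polyfit(pre_times, pre_y, deg=1)
        return post_y - (intercept + slope * post_time)
    return deviation(treat_pre, treat_post) - deviation(comp_pre, comp_post)

def mean_baseline_effect(treat_pre, comp_pre, treat_post, comp_post):
    """Estimate that ignores pretest trends and uses pretest means only."""
    return (treat_post - np.mean(treat_pre)) - (comp_post - np.mean(comp_pre))

# Hypothetical group means: the comparison group has a steeper pretest trend.
true_effect = 3.0
post_time = 0
all_pre_times = np.arange(-6, 0)          # six pretest time points: -6 .. -1
treat_all = 50.0 + 1.0 * all_pre_times
comp_all = 55.0 + 2.0 * all_pre_times
treat_post = 50.0 + true_effect           # trend value at time 0 plus the effect
comp_post = 55.0                          # trend value at time 0, no effect

# Estimate with all six pretest points, trends modeled.
effect_trend = cits_effect(all_pre_times, treat_all, comp_all,
                           post_time, treat_post, comp_post)

# Estimates that ignore trends, using the last k pretest points.
effect_mean_by_k = {
    k: mean_baseline_effect(treat_all[-k:], comp_all[-k:], treat_post, comp_post)
    for k in range(1, 7)
}

print("trend-modeled estimate:", effect_trend)
for k, est in effect_mean_by_k.items():
    print(f"mean-baseline estimate with {k} pretest points:", est)
```

In this constructed case the trend-modeled estimate recovers the built-in effect, while the mean-baseline estimate drifts further from it as more pretest points are included, because the unmodeled slope difference accumulates in the baseline means.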

Keywords

interrupted time series, educational evaluation, within-study comparison, randomized clinical trial
