Abstract
Considerable thought is often put into designing randomized controlled trials (RCTs). From power analyses and complex sampling designs implemented preintervention to nuanced quasi-experimental models used to estimate treatment effects postintervention, RCT design can be quite complicated. Yet when psychological constructs measured using survey scales are the outcome of interest, measurement is often an afterthought, even in RCTs. The purpose of this study is to examine how choices about scoring and calibration of survey item responses affect recovery of true treatment effects. Specifically, simulation and empirical studies are used to compare the performance of sum scores, which are frequently used in RCTs in psychology and education, to that of approaches rooted in item response theory (IRT) that better account for the longitudinal, multigroup nature of the data. The results indicate that selecting an IRT model that matches the nature of the data can substantially reduce both bias in treatment effect estimates and their standard errors.
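To make the sum-score versus IRT contrast concrete, the sketch below simulates a simple two-group design under a 2PL IRT model and computes a standardized treatment effect from raw sum scores and from EAP (expected a posteriori) ability estimates. This is purely illustrative and not the study's actual models: it assumes item parameters are known rather than calibrated, uses a cross-sectional two-group design rather than the longitudinal, multigroup designs examined in the article, and all parameter values (12 items, a true effect of 0.5 SD) are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_responses(theta, a, b, rng):
    """Bernoulli item responses under a 2PL model: P(x=1) = logistic(a*(theta - b))."""
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    return (rng.random(p.shape) < p).astype(int)

def eap_theta(resp, a, b, nodes=61):
    """EAP ability estimates via fixed-grid quadrature over an N(0, 1) prior."""
    q = np.linspace(-4, 4, nodes)
    prior = np.exp(-0.5 * q**2)
    # item response probabilities at each quadrature node: (nodes, items)
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (q[:, None] - b[None, :])))
    # log-likelihood of each person's response pattern at each node: (persons, nodes)
    loglik = resp @ np.log(p).T + (1 - resp) @ np.log(1 - p).T
    post = np.exp(loglik) * prior
    return (post @ q) / post.sum(axis=1)

def std_effect(control, treat):
    """Standardized mean difference with a pooled SD (Cohen's d)."""
    pooled = np.sqrt((control.var(ddof=1) + treat.var(ddof=1)) / 2)
    return (treat.mean() - control.mean()) / pooled

n, true_effect = 2000, 0.5          # illustrative sample size and true effect (SD units)
a = rng.uniform(0.8, 2.0, 12)       # hypothetical discrimination parameters
b = rng.uniform(-1.5, 1.5, 12)      # hypothetical difficulty parameters
theta_c = rng.normal(0.0, 1.0, n)
theta_t = rng.normal(true_effect, 1.0, n)
resp_c = simulate_responses(theta_c, a, b, rng)
resp_t = simulate_responses(theta_t, a, b, rng)

sum_effect = std_effect(resp_c.sum(axis=1), resp_t.sum(axis=1))
irt_effect = std_effect(eap_theta(resp_c, a, b), eap_theta(resp_t, a, b))
print(f"sum-score effect: {sum_effect:.3f}, IRT (EAP) effect: {irt_effect:.3f}")
```

Repeating this over many replications (and letting item parameters be estimated rather than known) is the kind of comparison the simulation study formalizes; neither scoring rule here should be read as the article's recommended model.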