Abstract
Studies showing only marginal generality of test-session behaviors to non-test situations are criticized for addressing the question of validity without first establishing evidence of observational reliability. Internal consistency indices of reliability for observational instruments, while necessary, are insufficient without evidence of interobserver and intraobserver agreement. Data are presented showing that interobserver and intraobserver agreement was inconsistent in both the direction and the level of ratings when 42 test-session behaviors were rated during each of 21 test sessions in which the WPPSI-R served as the standardized assessment instrument. A systematic research program on the utility of this clinical practice is needed, with a primary focus on improving observational reliability. Until data substantiating this clinical practice accumulate, clinicians are urged to exercise caution when drawing inferences from test-session behaviors.