Abstract
When determining how many items to include on a criterion-referenced test, practitioners must resolve various nonstatistical issues before a particular solution can be applied. A fundamental problem is deciding which of three true scores should be used. The first is based on the probability that an examinee is correct on a "typical" test item. The second is the probability of having acquired a typical skill among a domain of skills, and the third is based on latent trait models. Once a particular true score is settled upon, there are several perspectives that might be used to determine test length. The paper reviews and critiques these solutions. Some new results are described that apply when latent structure models are used to estimate an examinee's true score.
