Several techniques increase the precision of subscores by borrowing information from other parts of the test. These techniques have been criticized on validity grounds in several recent publications. In this note, the authors question the argument advanced in those publications and point to both inherent limits of the validity argument and empirical issues worth examining.
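To make the borrowing concrete (a minimal sketch of the Kelley-type augmentation discussed by Haberman, 2008a, and Wainer et al., 2001, with notation chosen here for illustration rather than taken from the note itself): the reported subscore is a regression-based compromise between the observed subscore and the rest of the test,

\[
\hat{\tau}_s = \bar{x}_s + \beta_s\,(x_s - \bar{x}_s) + \beta_x\,(x - \bar{x}),
\]

where \(x_s\) is an examinee's observed subscore, \(x\) is the total score, and the weights \(\beta_s, \beta_x\) are chosen to minimize the mean squared error in predicting the true subscore \(\tau_s\). Setting \(\beta_x = 0\) recovers Kelley's classical regressed estimate based on the subscore alone, \(\hat{\tau}_s = \bar{x}_s + \rho_{ss'}(x_s - \bar{x}_s)\), with \(\rho_{ss'}\) the subscore reliability; the validity criticisms at issue concern precisely the case \(\beta_x \neq 0\), in which material from other parts of the test enters the reported subscore.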
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
de la Torre, J., & Patz, R. J. (2005). Making the most of what we have: A practical application of multidimensional IRT in test scoring. Journal of Educational and Behavioral Statistics, 30, 295-311.
Dwyer, A., Boughton, K. A., Yao, L., Steffen, M., & Lewis, D. (2006). A comparison of subscale score augmentation methods using empirical data. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, CA.
Haberman, S. J. (2008a). When can subscores have value? Journal of Educational and Behavioral Statistics, 33, 204-229.
Haberman, S. J. (2008b). Subscores and validity (ETS Research Report No. RR-08-64). Princeton, NJ: Educational Testing Service.
Haberman, S. J., & Sinharay, S. (2010). Reporting of subscores using multidimensional item response theory. Psychometrika, 75, 209-227.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 18-64). Westport, CT: Praeger.
Luecht, R. M. (2003, April). Applications of multidimensional diagnostic scoring for certification and licensure tests. Paper presented at the meeting of the National Council on Measurement in Education, Chicago, IL.
Lyrén, P. (2009). Reporting subscores from college admission tests. Practical Assessment, Research & Evaluation, 14, 1-10.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). Washington, DC: National Council on Measurement in Education and American Council on Education.
National Research Council. (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academies Press.
Puhan, G., Sinharay, S., Haberman, S. J., & Larkin, K. (2010). Comparison of subscores based on classical test theory. Applied Measurement in Education, 23, 1-20.
Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25-36.
Sinharay, S. (2010). How often do subscores have added value? Results from operational and simulated data. Journal of Educational Measurement, 47, 150-174.
Skorupski, W. P., & Carvajal, J. (2010). A comparison of approaches for improving the reliability of objective level scores. Educational and Psychological Measurement, 70, 357-375.
Stone, C. A., Ye, F., Zhu, X., & Lane, S. (2010). Providing subscale scores for diagnostic information: A case study when the test is essentially unidimensional. Applied Measurement in Education, 23, 63-86.
Wainer, H., Sheehan, K., & Wang, X. (2000). Some paths toward making Praxis scores more useful. Journal of Educational Measurement, 37, 113-140.
Wainer, H., Vevea, J. L., Camacho, F., Reeve, B. B., Rosa, K., Nelson, L., Swygert, K. A., . . . Thissen, D. (2001). Augmented scores—"Borrowing strength" to compute scores based on small numbers of items. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 343-387). Mahwah, NJ: Lawrence Erlbaum.
Yao, L., & Boughton, K. A. (2007). A multidimensional item response modeling approach for improving subscale proficiency estimation and classification. Applied Psychological Measurement, 31, 83-105.
Yen, W. M. (1987, June). A Bayesian/IRT index of objective performance. Paper presented at the annual meeting of the Psychometric Society, Montreal, Quebec, Canada.