
Abstract

Standard setting is the process used to establish cut scores on large-scale assessments, and the Bookmark method is one of the most popular standard setting methods. In the Bookmark method, panelists are asked to envision a response probability (RP) criterion and to move through a booklet of ordered items using that criterion. This study investigates whether the same cut scores can be obtained when the Bookmark method is applied with two different RP values. Analytical formulas and two hypothetical examples from a large-scale state testing program indicate that it is rarely possible to obtain the same cut score estimates with two different RP values, because of the item difficulty gaps that arise when the procedure is applied in practice. Results indicate that if the same group of panelists applied the Bookmark procedure as it is traditionally explained, cut scores should be lower with the second chosen RP value than with the first. This result holds whether the second RP value is higher or lower than the first. The examples also show that differences in cut score estimates across RP values can change the percentage of examinees at or above the cut scores, which may have important practical impacts.
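The role of item difficulty gaps can be illustrated with a minimal sketch. It does not reproduce the study's analytical formulas or its data. It assumes a Rasch model, a set of hypothetical item difficulties, an idealized panelist with a fixed intended standard (`target_theta`), and one common convention in which the cut score is the RP location of the last item before the bookmark. The function names are illustrative only.

```python
import math

def rp_location(b, rp):
    """Ability at which a Rasch item of difficulty b is answered
    correctly with probability rp (assumed model, not from the study)."""
    return b + math.log(rp / (1.0 - rp))

def bookmark_cut(difficulties, rp, target_theta):
    """Cut score for an idealized panelist whose intended standard is
    target_theta. Items are ordered by RP location; the bookmark goes
    after the last item located at or below target_theta, and the cut
    is that item's RP location (an assumed convention)."""
    locations = sorted(rp_location(b, rp) for b in difficulties)
    mastered = [loc for loc in locations if loc <= target_theta]
    return mastered[-1] if mastered else None

# Hypothetical item difficulties with uneven gaps between them.
difficulties = [-1.5, -0.9, -0.2, 0.4, 1.1, 1.8]
target = 0.6  # hypothetical intended standard on the ability scale

cut_rp50 = bookmark_cut(difficulties, 0.50, target)
cut_rp67 = bookmark_cut(difficulties, 0.67, target)
print(cut_rp50, cut_rp67)
```

In this sketch the two RP values give different cut estimates even though the panelist's intended standard is unchanged, because each estimate snaps to the nearest available item location below the target. The two estimates would coincide only if item locations under both RP values happened to line up on the same ability value.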

Keywords

Bookmark procedure, response probability criterion, item difficulty gaps, cut scores, standard setting



The Similarity of Bookmark Cut Scores With Different Response Probability Values
