Sage Journals: Discover world-class research

Abstract

Likert-type rating scales remain the most widely used method for measuring psychoeducational constructs. The present study investigates the long-standing question of the optimal number of response categories. Special emphasis is given to categorical data, which were generated using the Item Response Theory (IRT) graded response model (GRM). Along with the number of categories (from 2 to 6), two scale characteristics were examined: scale length (n = 5, 10, and 20 items) and item discrimination (high, medium, and low). The results show virtually no difference in the psychometric properties of scales using 4, 5, or 6 categories. The largest deterioration, on all six psychometric measures, occurred when the number of response categories was reduced from 3 to 2. Scale length and item discrimination appear to have small moderating effects: changing the number of response categories had a slightly larger impact on the psychometric properties of shorter and/or highly discriminating scales. The study concludes that caution is warranted when a scale has only 2 response categories, but that this limitation may be overcome by manipulating other scale features, namely scale length or item discrimination.
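The kind of simulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the sample size, the discrimination value, the evenly spaced thresholds, the logistic form without a scaling constant, and the use of Cronbach's alpha as the single reliability index are all assumptions made here for illustration, and the function names are hypothetical.

```python
import numpy as np


def simulate_grm(theta, a, b, rng):
    """Draw graded responses (0..K-1) under a logistic graded response model.

    theta: (N,) latent trait values
    a:     (J,) item discriminations
    b:     (J, K-1) thresholds per item, in ascending order
    """
    # Cumulative probabilities P(X >= k | theta), shape (N, J, K-1)
    z = a[None, :, None] * (theta[:, None, None] - b[None, :, :])
    p_star = 1.0 / (1.0 + np.exp(-z))
    # One uniform draw per person-item pair; because p_star decreases in k,
    # the number of thresholds passed equals the response category.
    u = rng.uniform(size=(theta.size, a.size, 1))
    return (u < p_star).sum(axis=2)


def cronbach_alpha(x):
    """Cronbach's alpha for an (N persons, J items) score matrix."""
    j = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return j / (j - 1) * (1.0 - item_var / total_var)


def alpha_by_categories(n_items=10, a_value=1.5, n_persons=2000, seed=0):
    """Alpha for K = 2..6 categories (all default values are assumed)."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_persons)
    a = np.full(n_items, a_value)
    results = {}
    for k in range(2, 7):
        # Assumed thresholds: K-1 evenly spaced points inside (-1.5, 1.5)
        b_row = np.linspace(-1.5, 1.5, k + 1)[1:-1]
        b = np.tile(b_row, (n_items, 1))
        results[k] = cronbach_alpha(simulate_grm(theta, a, b, rng))
    return results


if __name__ == "__main__":
    for k, alpha in alpha_by_categories().items():
        print(f"{k} categories: alpha = {alpha:.3f}")
```

Varying `n_items` and `a_value` gives a rough analogue of the scale-length and item-discrimination conditions, although the specific values the study used are not given in the abstract.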

Keywords

reliability, validity, scale, survey, psychometrics, item discrimination, scale length, IRT, graded response model



In Search of the Optimal Number of Response Categories in a Rating Scale
