The present article emphasizes that measurement issues must be considered explicitly even in studies focused on substantive questions. First, the consequences of paying insufficient attention to score reliability in substantive studies are discussed. Next, reasons to adjust effect size indices for score unreliability are presented. Finally, procedures for adjusting effect sizes for score reliability are briefly reviewed.
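As an illustrative sketch (not taken from the article itself), the best-known procedure of this kind is Spearman's correction for attenuation, which disattenuates an observed correlation by dividing it by the square root of the product of the two measures' score reliabilities:

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Spearman's classic correction for attenuation:
    r_corrected = r_xy / sqrt(r_xx * r_yy),
    where r_xx and r_yy are the score reliabilities of the two measures."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliability coefficients must lie in (0, 1]")
    r_corrected = r_xy / math.sqrt(r_xx * r_yy)
    # Sample-based reliability estimates can drive the corrected value past
    # |1|; capping at +/-1 is a common practical convention.
    return max(-1.0, min(1.0, r_corrected))

# Hypothetical example: observed r = .30, with coefficient alphas of
# .80 and .70 for the two sets of scores.
print(round(correct_for_attenuation(0.30, 0.80, 0.70), 3))  # 0.401
```

Because the divisor is always at most 1, the corrected coefficient is never smaller in magnitude than the observed one, which is why score unreliability is said to attenuate observed effect sizes.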