Abstract
Over-reliance on significance testing has been heavily criticized in psychology. The American Psychological Association therefore recommended supplementing the p value with additional elements such as effect sizes and confidence intervals, and taking statistical power seriously. This article elaborates on the conclusions that can be drawn when these measures accompany the p value. An analysis of more than 30 summary papers (covering over 6,000 articles) reveals that, if anything, only effect sizes are reported in addition to p values (38%). Only every tenth article provides a confidence interval, and statistical power is reported in only 3% of articles. An increase over time in the reporting frequency of these supplements to p values, attributable to stricter guidelines, was found for effect sizes only. Given these practices, research faces a serious problem in the context of dichotomous statistical decision making: because significant results have a higher probability of being published (publication bias), the effect sizes reported in articles may be seriously overestimated.
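The selection mechanism behind this overestimation can be illustrated with a small simulation (a hypothetical sketch, not taken from the article): many two-group studies with a modest true standardized effect are run, and only the statistically significant ones are "published." The mean effect size among the published studies then exceeds the true value. The true effect size, sample size, and number of simulated studies below are arbitrary choices for illustration.

```python
import random, statistics, math

random.seed(0)

TRUE_D = 0.2   # assumed true standardized effect size (illustrative)
N = 30         # per-group sample size (illustrative)
CRIT = 2.002   # two-tailed t critical value, df = 58, alpha = .05

def simulate_study():
    """Run one two-group study; return observed Cohen's d and significance."""
    control = [random.gauss(0, 1) for _ in range(N)]
    treat = [random.gauss(TRUE_D, 1) for _ in range(N)]
    m_diff = statistics.mean(treat) - statistics.mean(control)
    s1, s2 = statistics.stdev(treat), statistics.stdev(control)
    sp = math.sqrt((s1 ** 2 + s2 ** 2) / 2)   # pooled SD (equal n)
    d = m_diff / sp                           # observed Cohen's d
    t = d * math.sqrt(N / 2)                  # two-sample t statistic
    return d, abs(t) > CRIT

results = [simulate_study() for _ in range(20000)]
all_d = [d for d, _ in results]
sig_d = [d for d, sig in results if sig]

# Published (significant-only) studies show an inflated mean effect size.
print(f"mean d, all studies:      {statistics.mean(all_d):.2f}")
print(f"mean d, significant only: {statistics.mean(sig_d):.2f}")
```

With a small true effect and modest samples, power is low, so only studies that happen to observe an unusually large d reach significance; averaging over those alone substantially overstates the true effect.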