Abstract

Despite discussion and institution of new reforms in psychology research, little is known about how much reform psychologists believe is still needed across various research practices and whether instructors are teaching students about replication and reform in their courses. To investigate these questions, we distributed questionnaires assessing perceived need for reform in psychology research and the teaching of replication and reform to instructors of undergraduate and graduate psychology courses across multiple listservs (n = 328). Participants reported discussing topics related to replication and reform briefly in their courses and that moderate changes are still needed in psychology research. Topics were discussed more extensively in advanced vs. introductory courses, and in methods/statistics vs. content courses. Perceived need for reform and number of student researchers supervised/year correlated with teaching these issues, suggesting that those who believe more change is needed in psychology research and are more involved in shaping the next generation of psychology researchers are more likely to discuss replication and reform in their courses. Our questionnaires provide a preliminary tool to be further refined, validated, and applied in future research on knowledge and perceptions of problems in social science research and the impact of teaching these issues in the classroom.

Keywords

Research methodology, science reform, science communication, teaching of psychology, meta-science

Over the past decade, highly publicized cases of research misconduct and failed replications have raised concerns about the integrity of findings reported in the scientific literature (e.g., Nosek et al., 2015). The pressure to publish (e.g., Fanelli, 2010), preference for positive (e.g., Franco, Malhotra, & Simonovits, 2014), perfect-looking findings and novel, compelling narratives (e.g., O’Boyle, Banks, & Gonzalez-Mule, 2017), and incentive systems that reward publishing over truth-telling, accuracy, and dependability (e.g., Nosek, Spies, & Motyl, 2012) threaten the integrity of the scientific enterprise, encouraging behaviours such as p-hacking, selective reporting, publication bias, and HARKing (Kerr, 1998), while discouraging replication, high-powered studies, and the use of diverse paradigms, stimuli, and samples.

Recently, psychology researchers have begun developing more rigorous policies and standards to improve scientific practice. Many journals have adopted new reporting policies, requiring a discussion of power and how sample size was determined, full disclosure statements, and effect sizes and exact p-values for all analyses. Some journals have begun awarding badges for exceptional transparency, including pre-registration and public posting of study data and materials (Kidwell et al., 2016; osf.io/tvyxz). Journals are also increasingly welcoming the submission of replication studies.

Despite discussion and institution of new reforms, some believe the reproducibility “crisis” is overblown (e.g., Stroebe & Strack, 2014), the replicability of findings in the psychological literature is quite high (Gilbert, King, Pettigrew, & Wilson, 2016), questionable research practices are less prevalent than originally estimated (Fiedler & Schwarz, 2016), particular responses and reforms have been unnecessary or disadvantageous (e.g., Stroebe & Strack, 2014), and the reform movement has produced a hostile environment where researchers have been, or fear becoming, personally attacked (e.g., Dweck, 2017). Researchers have also expressed concern about the signal a “crisis” sends to the public and potential funders (e.g., Rutjens, Heine, Sutton, & van Harreveld, 2017). However, outspoken critics and supporters’ views are generally the ones heard, leaving an open question as to what the majority of researchers believe (Sternberg, 2017). A 2015 survey revealed that, overall, social/personality psychologists perceive the replicability of studies in their field to be low, but slightly more replicable than 10 years ago, and that research practices have improved as a result of the replication and reform movement (Motyl et al., 2017). However, little is known about how much reform psychology researchers believe is still needed in psychology research and which practices they believe still need reform.

Moreover, discussion of science reform has largely focused on revising current research practice, rather than on reforming psychology curriculum, education, and training. However, with new policies and guidelines emerging, students entering the field will not only need to learn them but will have the opportunity to refine and improve them in the future. Some efforts have been made to consider the accuracy of research described in textbooks (e.g., Ferguson, Brown, & Torres, 2016), redesign methods courses to teach replication as a core component of research practice (Frank & Saxe, 2012), and develop pedagogical tools to communicate issues of replication and best practices to students (Chopik, Bremner, Defever, & Keller, 2018). In general, however, it is unknown what instructors are teaching students about current issues and debates in psychology research and how instructors’ own perceptions of the replication and reform movement influence their teaching.

Study Overview and Hypotheses

In this study, we developed questionnaires to (a) provide insight into academic psychologists’ perceptions of how much reform is still needed in psychology research and their teaching of replication, interpretation, and transparency; and (b) offer a preliminary tool to be refined, validated, and applied in future research further investigating these issues and their consequences. Using these measures, we tested the following questions and hypotheses:
Q1A. To what extent are psychology instructors teaching topics related to replication and reform in introductory undergraduate, advanced undergraduate, and graduate psychology courses?

No specific prediction was made about the extent to which psychology instructors are teaching topics related to replication and reform at each course level. However, given the complexity and advanced nature of many of the topics assessed, we expected that topics would be discussed more frequently in more advanced courses (i.e., most in graduate courses, followed by advanced undergrad courses, and least in intro undergrad courses).
Q1B. Are certain types of psychology instructors teaching these topics more than others?¹

We investigated whether discussion of issues related to replication and reform varied based on length of time teaching, teaching load and focus, class size, and specialty in psychology.
Q2A. How much change do psychology instructors believe is needed in psychology research?

Q2B. Do certain types of psychology instructors believe more change is needed in psychology research than others?

Although no prediction was made about how much change psychology instructors overall believe is needed in psychology research, we tested several competing hypotheses about whether perceived need for change varied based on length of time teaching, teaching load and focus, and specialty in psychology.
Q3. Are psychology instructors more likely to teach about issues of replication and reform if they believe more change is needed in psychology research?

We predicted that psychology instructors would be more likely to teach about issues of replication and reform if they perceive greater changes to be needed in psychology research practice.

Method

Participants

A total of 402 participants provided informed consent to participate in this study. However, four participants did not proceed past the consent form, and one withdrew consent midway through the survey (and was removed from the data set). Of the remaining 397 participants, 328 completed the survey (although a few did not answer all demographic items).

Of the 314–315 participants who answered each demographic item, 227 identified as female, 86 as male, and one as other; nine as Hispanic or Latino and 305 as Not Hispanic or Latino; two as American Indian or Alaska Native, 14 as Asian, seven as Black or African American, 287 as White, and eight as Other (six participants selected two options for race). The majority of participants indicated teaching at institutions in the United States (n = 291).

Materials and Procedure

We aimed to collect a large sample of psychology instructors by recruiting participants from major psychology societies with member listservs: the Society for Personality and Social Psychology, Society for the Teaching of Psychology, and Society for the Psychological Study of Social Issues (to our knowledge, these are the only psychological societies with member-accessible listservs). In April 2017, we sent an email to the listservs for these societies, inviting all who had taught at least one psychology course at the college level in the past year to participate in a survey on the teaching of undergraduate and graduate psychology. Approximately two weeks after the initial email, we sent a reminder, specifying the closing date for the survey (two weeks following the reminder). As compensation, participants had the opportunity to enter their name in a raffle to win one of three $100 Amazon gift cards.

After providing consent, participants were asked whether they taught a psychology course in the past year at the introductory undergraduate (e.g., 100–200 level), advanced undergraduate (e.g., 300–400 level), and graduate level. For each course level participants indicated teaching, they were asked to select one course they taught in the past year at that level and enter its title.

Next, participants were presented with 28 topics related to the replication and reform movement in psychology and were asked to indicate the extent to which they discussed each topic in the first course they identified (1 = did not discuss, 2 = discussed briefly, 3 = discussed in moderate depth, 4 = discussed extensively). Participants were told to answer these questions about the most recent time they taught the course in the past year. Topics included a range of issues in methodology and research design (e.g., the importance of replication), analysis, interpretation, reporting (e.g., p-hacking), and overall practice (e.g., recent publicized failures to replicate studies), varying in scope, difficulty, complexity, and desirability. Because the items varied in difficulty and scope, participants were told that the topics may be too advanced for or not relevant to the course they identified. Table 1 contains exact wording for all items.
Table 1.
Mean Level of Discussion of Each Topic in Introductory Undergraduate, Advanced Undergraduate, and Graduate Psychology Courses

Topic; Intro. Undergrad.: M (SD), % Discussed; Advanced Undergrad.: M (SD), % Discussed; Graduate: M (SD), % Discussed

Methodology and Research Design

Q3. The importance of replication as a research methodology 2.31 (0.82) 85.7 2.33 (0.90) 80.9 2.42^r (0.97) 81.4

Q19. Use of WEIRD samples (drawn from Western, education, industrialized, rich, democratic societies) 1.95 (0.89) 63.4 2.22 (1.00) 71.5 2.25ⁱ (1.03) 69.5

Q15. Statistical power 1.50 (0.77) 36.2 1.71 (0.90) 48.1 2.02^r (1.04) 57.6

Q14. How researchers determine their sample size 1.50 (0.75) 36.2 1.57 (0.83) 38.7 2.00^r (1.02) 57.6

Q8. Pre-registration (specifying sample size, materials, procedures, hypotheses, and plans for data analysis prior to data collection or analysis) 1.40 (0.69) 29.7 1.52 (0.80) 35.7 1.68^r (0.86) 47.5

Analysis, Interpretation, and Reporting

Q18. Generalizability (whether the results of a study apply in different contexts or populations than those in the original study) 2.61 (0.86) 90.3 2.80 (0.91) 93.2 2.61ⁱ (0.95) 86.4

Q27. Alternative explanations for research findings (a different explanation than offered by the researchers) 2.30 (0.93) 78.8 2.55 (1.00) 81.3 2.59ⁱ (1.04) 83.1

Q28. Conflicting evidence in the literature (“debates” between various researchers) 2.28 (0.92) 77.3 2.53 (0.93) 85.1 2.64ⁱ (1.06) 83.1

Q2. Plagiarism 2.24 (0.93) 78.9 2.37 (0.97) 80.0 2.00 (0.87) 69.5

Q26. The concept of scientific uncertainty 2.03 (0.93) 66.8 2.15 (1.01) 68.9 2.10 (1.03) 62.7

Q17. Overclaiming (i.e., overstating or exaggerating research findings) 1.76 (0.79) 56.6 1.98 (0.98) 60.4 2.22ⁱ (0.98) 74.6

Q16. Effect size 1.63 (0.85) 43.4 1.89 (1.00) 54.5 2.22^r (0.98) 64.4

Q20. Publication bias (i.e., publication decisions based on the direction or significant of the findings; the “file-drawer problem”) 1.62 (0.78) 45.9 1.85 (0.98) 51.9 2.19^r (0.94) 74.6

Q22. Theoretical bias (the influence of theoretical preferences on research design and interpretation) 1.65 (0.75) 49.5 1.89 (0.96) 55.7 2.05ⁱ (1.01) 62.7

Q1. Data fabrication/falsification 1.65 (0.73) 51.3 1.77 (0.84) 54.5 1.80^r (1.91) 52.5

Q11. Selective reporting (e.g., failing to report all conditions of a study, all data exclusions, or all dependent measures relevant to the research hypothesis) 1.46 (0.70) 35.1 1.74 (0.89) 49.4 2.08^r (0.97) 66.1

Q21. Political bias (the influence of political beliefs or values on research design and interpretation) 1.47 (0.66) 38.0 1.57 (0.81) 40.4 1.83ⁱ (0.93) 54.2

Q13. Reporting post hoc explanations for research findings as predicted, a priori hypotheses (i.e., HARKing or “Hypothesizing after the results are known”) 1.28 (0.57) 22.9 1.51 (0.85) 32.3 1.80^r (0.96) 47.5

Q12. P-hacking (performing multiple statistical tests of the same hypothesis and reporting only those that produce the desired result(s)) 1.24 (0.57) 17.9 1.53 (0.91) 31.5 1.83^r (0.97) 49.2

Q9. Open access to study materials 1.33 (0.66) 22.9 1.34 (0.66) 25.1 1.63^r (0.83) 44.1

Q10. Open access to data files 1.26 (0.57) 20.4 1.31 (0.64) 23.4 1.64^r (0.76) 49.2

Overall Practice and Climate

Q4. How commonly (or uncommonly) replication studies are performed in psychology 1.76 (0.77) 57.3 1.86 (0.91) 57.0 2.05^r (0.94) 67.8

Q24. The pressure to publish in academia 1.53 (0.71) 41.6 1.72 (0.90) 48.5 2.02^r (1.03) 59.3

Q5. Recent publicized failures to replicate studies in psychology (e.g., Nosek et al., 2015) 1.45 (0.67) 35.8 1.66 (0.87) 44.3 1.92^r (0.92) 62.7

Q6. Current debates on the problem of replication in psychology (e.g., Gilbert et al., 2016) 1.44 (0.65) 35.5 1.63 (0.86) 43.0 1.98^r (0.92) 62.7

Q25. Authorship and authorship order decisions 1.25 (0.55) 20.4 1.32 (0.66) 23.0 1.54^r (0.84) 35.6

Q23. Political homogeneity in academic psychology 1.24 (0.54) 19.4 1.33 (0.66) 24.7 1.53ⁱ (0.80) 37.3

Q7. The tone of discussions surrounding replication and reform in psychology (e.g., use of civil discourse vs. personal attacks) 1.20 (0.49) 16.8 1.31 (0.66) 23.0 1.73 (0.81) 52.5

Overall 1.65 (0.43) 1.82 (0.58) 2.01 (0.64)

Note. % discussed indicates the percentage of respondents who reported discussing each topic (briefly, in moderate depth, or extensively) in their course. N ranged from 277 to 279 for each introductory undergraduate item; n = 235 for the advanced undergraduate items, and n = 59 for the graduate items. ^r denotes items on the replication subscale. ⁱ denotes items on the interpretation subscale.
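As an illustration of how the two summary columns in Table 1 relate to the 1–4 response scale, the sketch below computes an item mean and the "% discussed" value (the share of ratings of 2 or higher). The ratings are hypothetical and the function name is ours; this is not the authors' analysis code.

```python
from statistics import mean

def summarize_item(responses):
    """Return (mean rating, percent of respondents who discussed the topic).

    A rating of 1 means "did not discuss"; 2, 3, or 4 count as discussed.
    """
    discussed = [r for r in responses if r >= 2]
    return mean(responses), 100 * len(discussed) / len(responses)

# Hypothetical ratings for one topic (not data from the study)
ratings = [1, 2, 2, 3, 1, 4, 1, 2]
item_mean, pct_discussed = summarize_item(ratings)
print(f"M = {item_mean:.2f}, % discussed = {pct_discussed:.1f}")
```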

Participants then completed these same questions for each subsequent course they identified teaching in the past year (for a maximum of three times for those who identified a course at each level).

After the teaching items, participants were presented with a list of 31 issues and were asked to rate how much change they believe is needed in psychology research on these issues (1 = no change is needed, 2 = some small changes are needed, 3 = moderate changes are needed, 4 = significant changes are needed). Most of the issues were drawn from the topics presented in the teaching questions, with wording slightly modified to fit the context of perceived change needed in research practice (e.g., “open access to study materials” from the teaching section was worded as “increasing open access to study materials” in the research section). See Table 2 to view the research items.
Table 2.
Perceived Need for Reform in Psychology Research

Methodology and Research Design M SD

Q4. Increasing the number of replication studies performed 3.10 0.87

Q21. Using more diverse samples from non-WEIRD (i.e., non-Western, education, industrialized, rich, democratic societies) societies 3.06 0.89

Q6. Increasing pre-registration (specifying sample size, materials, procedures, hypotheses, and plans for data analysis prior to data collection or analysis) 2.69 0.95

Q17. Increasing statistical power 2.53 0.88

Analysis, Interpretation, and Reporting

Q13. Increasing transparency in reporting (e.g., stating whether all conditions of a study, all data exclusions, and all dependent measures relevant to the research hypothesis were reported) 3.02 0.84

Q7. Increasing open access to study materials 2.99 0.86

Q14. Reducing p-hacking (performing multiple statistical tests of the same hypothesis and reporting only those that produce the desired result(s)) 2.84 0.91

Q29. Discussing scientific uncertainty when communicating research findings to the public 2.82 0.92

Q15. Explicitly identifying which hypotheses and analyses were a priori vs. post hoc/exploratory 2.76 0.88

Q8. Increasing open access to data files 2.75 0.88

Q30. Discussing alternative explanations for research findings 2.75 0.89

Q18. Reporting effect sizes for all statistical tests 2.70 0.88

Q19. Accurately discussing the strength of effects based on the observed findings 2.66 0.85

Q31. Discussing conflicting evidence in the literature (“debates” between various researchers) 2.64 0.90

Q20. Accurately reporting the generalizability of research findings (whether the results of a study apply in different contexts or populations than those in the original study) 2.61 0.90

Q16. Reporting how sample sizes were determined prior to (or after) beginning data analysis 2.47 0.87

Overall Practice and Climate

Q26. The pressure to publish in academia at the expense of conducting rigorous research and generating valid knowledge 3.35 0.83

Q27. Restructuring the incentive systems in academia to promote rigorous and transparent research 3.35 0.85

Q5. Providing dedicated journal space to publish the results of replication studies 3.21 0.88

Q22. Reducing publication bias (i.e., publication decisions based on the direction or significant of the findings; the “file-drawer problem”) 3.19 0.86

Q3. Increasing the replicability of research findings 3.08 0.84

Q11. Empirically testing the effectiveness of policies and practices implemented to improve the replicability of research findings 2.90 0.90

Q12. Increasing the civility of the tone of discussions surrounding replication and reform 2.73 0.96

Q1. Detecting and preventing data fabrication/falsification 2.50 0.73

Q28. Increasing communication about authorship decisions 2.45 0.95

Q23. Reducing political bias (the influence of political beliefs or values on research design and interpretation) 2.39 0.94

Q2. Detecting and preventing plagiarism 2.38 0.84

Q24. Reducing theoretical bias (the influence of theoretical preferences on research design and interpretation) 2.37 0.94

Q25. Increasing political diversity in academic psychology 2.28 0.94

Q9. Encouraging the use of badges (pre-registration, open data/materials) in publishing 2.21 0.90

Q10. Requiring the use of badges (pre-registration, open data/materials) in publishing 1.80 0.88

Overall 2.73 0.56

Note. N ranged from 322 to 328 for each item.

Lastly, participants were asked a number of questions about their teaching, research, position, and academic background. Participants reported their typical course teaching load/year; the number of years they have been teaching psychology at the college level; the class size for each course they identified above; the focus of their academic position; the percentage of their work time they spend on teaching, research, service, and other; their primary specialty/area of study; their current position; the number of undergraduate research assistants, undergraduate student independent research projects, and graduate student researchers advised per year; location of their college/university; most advanced degree and the year it was obtained. Following the work-related questions, participants completed standard demographic questions, reporting their gender, age, ethnicity, and race.

Results

Because many of the planned analyses involved multiple comparisons or testing multiple hypotheses, for all analyses, we set a more conservative threshold for significance (p < .01).
Q1A. To what extent are psychology instructors teaching topics related to replication and reform in introductory undergraduate, advanced undergraduate, and graduate psychology courses?

Descriptive statistics, factor structure, and internal consistency

Table 1 presents means and standard deviations for each teaching item, along with the percentage of respondents who reported discussing each topic at least briefly in their course. Frequencies for each item are depicted in Figures 1–3.
Figure 1.
Percentage of instructors who did not discuss, discussed briefly, discussed in moderate depth, and discussed extensively each topic in their introductory-level undergraduate psychology course.

Figure 2.
Percentage of instructors who did not discuss, discussed briefly, discussed in moderate depth, and discussed extensively each topic in their advanced-level undergraduate psychology course.

Figure 3.
Percentage of instructors who did not discuss, discussed briefly, discussed in moderate depth, and discussed extensively each topic in their graduate-level psychology course.

Because these topics vary in difficulty and scope, they may be addressed differently in introductory undergraduate, advanced undergraduate, and graduate psychology courses. Therefore, a principal components analysis with a Varimax rotation was performed separately for each course level to assess whether the factor structure differed across the three course levels. For the introductory undergraduate teaching items, this analysis suggested that only one dominant factor was present (eigenvalue = 9.72, accounting for 34.71% of the variance), and thus the introductory undergraduate items were analysed as a composite measure (α = .92). The factor analysis for the advanced undergraduate items also suggested that only one dominant factor was present (eigenvalue = 13.05, accounting for 46.60% of the variance), and thus the advanced undergraduate teaching items were also analysed as a composite measure (α = .95). For the graduate teaching items, the analysis suggested two discrete factors were present (eigenvalues of 13.42 and 3.62, accounting for 47.93% and 12.94% of the variance, respectively). Following Stevens’ (1992) recommendations, items with factor loadings > .40 that did not cross-load were retained on each factor. This cut-off produced 17 items on factor 1 (α = .96) and 8 items on factor 2 (α = .90). One item was excluded because it cross-loaded on both factors, as were two other items that failed to load above the .40 threshold on either factor. Thus, in addition to the composite graduate teaching measure (containing all items; α = .96), we created two subscales by averaging the graduate teaching items on each factor. The first factor included items broadly related to replication, and the second included items broadly related to interpretation. For ease of presentation, we only include results for the replication and interpretation indices when they diverged from the overall pattern on the composite measure.
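The link between the eigenvalues and the percentages of variance reported above can be checked with a short sketch. It assumes the principal components analysis was run on the correlation matrix of the 28 teaching items, so that total variance equals the number of items; that assumption is ours. Because the reported eigenvalues are rounded, the results match the reported percentages only to within rounding. The Cronbach's alpha function uses the standard formula and is shown on hypothetical toy data.

```python
from statistics import variance

def variance_explained(eigenvalue, n_items):
    """Percent of total variance for a component, assuming standardized items."""
    return 100 * eigenvalue / n_items

def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of per-item response lists,
    aligned so that position i in every list is the same respondent."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]      # scale score per respondent
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

N_TEACHING_ITEMS = 28  # number of teaching topics in Table 1

intro_pct = variance_explained(9.72, N_TEACHING_ITEMS)
grad_pcts = [variance_explained(ev, N_TEACHING_ITEMS) for ev in (13.42, 3.62)]
print(round(intro_pct, 2), [round(p, 2) for p in grad_pcts])

# Hypothetical responses: 3 items rated by 5 respondents (not study data)
toy_items = [[1, 2, 3, 4, 2], [2, 2, 4, 4, 1], [1, 3, 3, 4, 2]]
print(round(cronbach_alpha(toy_items), 2))
```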

Level of discussion of issues in different courses

A one-way ANOVA was conducted to test the prediction that topics would be discussed more frequently in advanced courses (i.e., most in graduate courses, followed by advanced undergrad courses, and least in introductory undergrad courses), F(2, 570) = 14.23, p < .001, η² = .05. Overall, topics were more likely to be discussed in advanced undergraduate (M = 1.82, SD = 0.58) and graduate (M = 2.01, SD = 0.64) than introductory undergraduate (M = 1.65, SD = 0.43) courses (M_diff (intro vs. advanced) = −0.17, SE = .05, p = .001, 95% CI = [−0.27, −0.06]; M_diff (intro vs. graduate) = −0.36, SE = .07, p < .001, 95% CI = [−0.54, −0.18]). The difference in level of discussion between advanced undergraduate and graduate courses did not meet the .01 threshold set to control for multiple comparisons (M_diff (advanced vs. grad) = −0.19, SE = .08, p = .03, 95% CI = [−0.37, −0.01]).
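For readers who want to see how an omnibus F and η² follow from group summaries, the sketch below rebuilds a one-way between-groups ANOVA from the composite means and standard deviations above. The group sizes (279, 235, 59) are our assumption, taken from the Table 1 note and consistent with the reported denominator degrees of freedom. Because the inputs are rounded summary statistics, the result should land close to, but not exactly on, the reported F.

```python
def anova_from_summary(groups):
    """One-way ANOVA from (n, mean, sd) tuples.

    Returns (F, df_between, df_within, eta_squared).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-groups sum of squares: weighted squared deviations of group means
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares recovered from each group's sample SD
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_sq

# (n, M, SD) for intro, advanced undergraduate, and graduate courses;
# the n values are assumed from the Table 1 note
course_levels = [(279, 1.65, 0.43), (235, 1.82, 0.58), (59, 2.01, 0.64)]
f_stat, df_b, df_w, eta_sq = anova_from_summary(course_levels)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, eta squared = {eta_sq:.2f}")
```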

In examining the data, we noticed that, for each course level, participants identified different types of courses. We inductively generated categories of the types of courses participants listed. For introductory undergraduate courses, participants selected Introductory/General Psychology, Research Methods/Statistics, or an introductory content course (e.g., Developmental Psychology). For advanced undergraduate courses, participants selected an advanced content course or a Research Methods/Statistics/Other research-based course. For graduate courses, participants listed a content course, Internship, or Research Methods/Statistics/Writing course. We coded the type of course participants selected into these categories to test, as an exploratory question, whether instructors were more likely to teach topics related to replication and reform in Research Methods/Statistics courses than in other courses (see Table 3).²
Table 3.
Level of Discussion of Issues of Replication and Reform in Different Courses

n M SD

Introductory Undergraduate Course

Introductory Psychology 146 1.55 0.37

Introductory Content Course 95 1.64 0.38

Research Methods/Statistics 32 2.13 0.49

Advanced Undergraduate Course

Advanced Content Course 192 1.72 0.52

Research Methods/Statistics/Lab. or Research Course 39 2.21 0.65

Graduate Course

Content Course

Composite 41 1.86 0.65

Replication Subscale 41 1.69 0.61

Interpretation Subscale 41 2.21 0.76

Research Methods/Statistics/Writing

Composite 17 2.43 0.57

Replication Subscale 17 2.56 0.69

Interpretation Subscale 17 2.27 0.76

Note. Participants were instructed to identify one course they taught in the past year at each level. Seven participants listed 2 + introductory undergraduate courses (instead of one), and five participants listed 2+ advanced undergraduate courses. Because (a) only a few participants listed >1 course, (b) those who listed >1 course tended to list similar types of courses (e.g., Social Psychology and Cognitive Psychology; Introductory Psychology and Developmental Psychology), (c) the overall pattern of results did not differ if these participants were included or not, and (d) no exclusion criteria were set in advance, all participants were retained in the main analyses. However, these participants were excluded from the analyses comparing teaching in different types of courses (i.e., for the analyses presented in this table) because they listed courses in more than one category.

In introductory courses, instructors were more likely to discuss issues of replication and reform in Research Methods/Statistics (M = 2.13, SD = 0.49) than in Introductory/General Psychology (M = 1.55, SD = 0.37) or introductory content courses (M = 1.64, SD = 0.38), F(2, 272) = 28.97, p < .001, η² = .18 (M_diff (methods/stats vs. intro psych.) = 0.58, SE = .08, p < .001, 95% CI = [0.40, 0.76]; M_diff (methods/stats vs. intro content) = 0.49, SE = .08, p < .001, 95% CI = [0.30, 0.67]). Similarly, in advanced undergraduate courses, instructors were more likely to discuss issues of replication and reform in Research Methods/Statistics/lab courses (M = 2.21, SD = 0.65) than in content courses (M = 1.72, SD = 0.52), t(229) = 5.04, p < .001, d = 0.84 (M_diff (methods/stats vs. content) = 0.48, SE = 0.10, 95% CI = [0.29, 0.67]). Overall, graduate instructors were more likely to discuss issues of replication and reform in Research Methods/Statistics/Writing courses (M = 2.43, SD = 0.65) than in other courses (M = 1.86, SD = 0.57), t(56) = 3.35, p = .001, d = 0.93 (M_diff (methods/stats vs. other) = 0.73, SE = 0.22, 95% CI = [0.28, 1.18]). However, only issues of replication were discussed more extensively in Methods/Statistics/Writing (M = 2.56, SD = 0.69) than other courses (M = 1.69, SD = 0.61), t(56) = 4.81, p < .001, d = 1.34 (M_diff (methods/stats vs. other) = 1.06, SE = 0.25, 95% CI = [0.56, 1.56]); issues of interpretation were discussed about equally in Methods/Statistics/Writing (M = 2.27, SD = 0.76) and other courses (M = 2.21, SD = 0.76), t(56) = 0.29, p = .77, d = 0.08.
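The t statistics and effect sizes above can be approximated from the group summaries in Table 3. The sketch below assumes an independent-samples t-test with a pooled standard deviation and Cohen's d computed on that same pooled SD; the source does not state which variant was used, so small differences from the reported values are expected. It is applied to the graduate replication subscale (methods/statistics/writing, n = 17, vs. content courses, n = 41).

```python
from math import sqrt

def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent groups."""
    return sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))

def t_and_d(n1, m1, sd1, n2, m2, sd2):
    """Return (t, Cohen's d, df) for an equal-variance two-sample comparison."""
    sp = pooled_sd(n1, sd1, n2, sd2)
    diff = m1 - m2
    t = diff / (sp * sqrt(1 / n1 + 1 / n2))  # difference over its standard error
    d = diff / sp                            # standardized mean difference
    return t, d, n1 + n2 - 2

# Graduate replication subscale, values from Table 3
t_stat, cohen_d, df = t_and_d(17, 2.56, 0.69, 41, 1.69, 0.61)
print(f"t({df}) = {t_stat:.2f}, d = {cohen_d:.2f}")
```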
Q1B. Are certain types of psychology instructors teaching these topics more than others?

Descriptive statistics regarding the composition of instructors in our sample are provided in the Supplementary Materials. We found no differences in teaching of the topics based on instructor and class characteristics (e.g., number of years teaching, academic rank, specialty, teaching load, teaching vs. research focus, and class size), p’s > .08 (see Supplementary Materials). The only exceptions were for exploratory analyses comparing social/personality psychologists to others (for graduate courses), and examining teaching based on number of student researchers supervised per year.

In graduate courses, social/personality psychologists (M = 2.12, SD = 0.75) tended to discuss issues of replication more extensively than non-social/personality psychologists (M = 1.64, SD = 0.64), t(55) = 2.44, p = .02, d = 0.69, M_diff = 0.48, SE = 0.19, 95% CI = [.09, .88], whereas issues of interpretation were discussed about equally by social/personality (M = 2.21, SD = 0.75) and non-social/personality psychologists (M = 2.21, SD = 0.69), t(55) = −0.02, p = .99, d = .00, M_diff = 0.00, SE = 0.20, 95% CI = [−.40, .40].

Number of RAs, r(255) = .17, p = .005, and undergraduate student researchers, r(254) = .36, p < .001, supervised per year correlated with teaching topics in introductory undergraduate courses; number of graduate student researchers did not, r(255) = −.08, p = .21. Number of undergraduate student researchers correlated with teaching topics in advanced undergraduate courses, r(225) = .28, p < .001; number of RAs, r(225) = .06, p = .33, and graduate student researchers, r(226) = .08, p = .23, did not. Number of undergraduate researchers correlated with teaching issues of replication and reform in graduate courses, r(55) = .33, p = .01, and number of graduate researchers marginally did, r(54) = .26, p = .052 (replication: r(54) = .19, p = .15; interpretation: r(54) = .32, p = .02); number of RAs did not, r(55) = .22, p = .11.
Q2A. How much change do psychology instructors believe is needed in psychology research?

Q2B. Do certain types of psychology instructors believe more change is needed in psychology research than others?

As with the teaching items, a principal components analysis with Varimax rotation was performed to examine whether different factors were present in the data. This analysis suggested that only one dominant factor was present (eigenvalue = 12.50, accounting for 40.31% of the variance), and thus we analysed the perceived need for reform items as a composite measure (α = .95). See Table 2 for perceived need for reform item means and standard deviations and Figure 4 for percentages of responses to each question.
Figure 4.
Percentage of participants who believe that no change, some small changes, moderate changes, or significant changes are needed in psychology research on each issue.

We found no differences in perceived need for reform based on instructor and class characteristics (e.g., number of years teaching, academic rank, specialty, teaching load, teaching vs. research focus, and class size), p’s ≥ .13 (see Supplementary Materials). Exploratory analyses revealed a small correlation between number of undergraduate student researchers supervised per year and perceived need for reform, r(322) = .13, p = .02, but no relationship between perceived need for reform and number of RAs, r(323) = .02, p = .70, or graduate students, r(323) = −.07, p = .18.
Q3. Are psychology instructors more likely to teach about replication and reform if they believe more change is needed in psychology research?

Perceived need for reform was not significantly related to teaching issues of replication and reform in introductory undergraduate courses, r(255) = .10, p = .11, but did correlate with teaching these issues in advanced undergraduate, r(228) = .30, p < .001, and graduate courses, r(56) = .44, p = .001 (replication: r(56) = .46, p < .001; interpretation: r(56) = .26, p = .052).³
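
For readers who want to sanity-check correlations reported in the form r(df), the following minimal sketch approximates a two-tailed p-value using Fisher's z transformation. This is a normal approximation chosen so that the example needs only the Python standard library; it is not necessarily the test the authors used, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_p_from_r(r: float, df: int) -> float:
    """Approximate two-tailed p-value for a Pearson r via Fisher's z.

    Uses n = df + 2 and z = atanh(r) * sqrt(n - 3). This is a normal
    approximation, not the exact t-test most statistics software reports.
    """
    n = df + 2
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Correlations between perceived need for reform and teaching, by course level.
for label, r, df in [("introductory", 0.10, 255),
                     ("advanced", 0.30, 228),
                     ("graduate", 0.44, 56)]:
    p = approx_p_from_r(r, df)
    print(f"{label}: r({df}) = {r:.2f}, approx. p = {p:.4f}")
```

Because the published r values are rounded to two decimals and the method is an approximation, recomputed p-values should be expected to match the reported ones only roughly.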

Discussion

This study presents novel measures of the teaching and perceptions of issues of replication and reform, and provides preliminary evidence of their internal consistency and validity. Across course levels, most participants reported discussing the issues of replication and reform raised on our survey briefly (or not at all) in their courses. Items on our measures—assessing a broad range of issues in research practice—were strongly intercorrelated, indicating that discussion of the various topics tended to co-occur. However, different factors emerged for graduate courses, suggesting that, in graduate courses, issues of replication may be discussed together and issues of interpretation may be discussed together. In undergraduate courses, both types of issues may be addressed by instructors seeking to teach critical scientific thinking skills, whereas graduate instructors may vary their focus on each type of issue based on the course content and goals. Indeed, graduate instructors were more likely to teach issues of replication in methods/statistics than content courses, whereas issues of interpretation were discussed to a similar extent in both types of courses. This difference in factor structure for undergraduate and graduate courses should be further explored in future research employing a larger sample of graduate instructors.

Overall, the topics were discussed in more depth in upper-level courses and in methods/statistics as opposed to content courses, demonstrating known-groups validity. Issues of replication were more likely to be discussed in graduate courses by social/personality psychologists than by non-social/personality psychologists, whereas issues of interpretation were discussed equally. Due to the small number of graduate instructors in our sample, this finding remains speculative. Even so, of all of the areas of psychology, social and personality psychology has likely received the most attention in discussions of the replication “crisis,” and thus social/personality psychologists may feel especially compelled to address issues of replication in their courses. Given that the present study oversampled social and personality psychologists, it is possible that issues of replication are discussed even less frequently across all psychology courses.

Although topics were more likely to be discussed in advanced (undergraduate and graduate) than introductory courses, a few items diverged from this trend. Notably, plagiarism was discussed more extensively in undergraduate than graduate courses, and generalizability and scientific uncertainty were discussed about equally in undergraduate and graduate courses. In fact, of all the items on the survey, generalizability was the most discussed topic across the three course levels. Alternative explanations for research findings, conflicting evidence in the literature, and the importance of replication were also commonly discussed, whereas political homogeneity, authorship decisions, the tone of discussions surrounding replication and reform, pre-registration, open data and materials, p-hacking, HARKing, and political bias were discussed least.

Across course levels, there were no differences in level of discussion of issues related to replication and reform based on number of years teaching, academic rank, teaching load, teaching vs. research focus, class size, or specialty. Given the relatively small sample collected (which oversampled social/personality psychologists and instructors in the United States), we refrain from drawing strong conclusions regarding whether these factors correlate with teaching issues of replication and reform in the larger population of academic psychologists. In particular, the small number of participants at each rank and in each specialty limited our ability to make comparisons on these variables.

Interestingly, however, one individual difference factor did correlate with teaching issues of replication and reform: number of student researchers supervised per year. Number of undergraduate student researchers supervised per year correlated with teaching issues of replication and reform in introductory undergraduate, advanced undergraduate, and graduate courses, and number of graduate student researchers supervised per year marginally correlated with teaching these topics in graduate courses. These findings suggest that those involved in shaping the next generation of psychology researchers may be most likely to teach about current issues and debates in psychology research.

Perceived Need for Reform in Psychology Research

In recent research, social/personality psychologists perceived the replicability of findings in their discipline to be low but slightly improving as a result of the replication and reform movement (Motyl et al., 2017). The results of our survey extend these findings, suggesting that psychologists believe moderate changes are still needed in psychology research on many issues. Indeed, our perceived need for reform items were strongly intercorrelated, suggesting that psychologists currently hold similar perceptions of the need for further reform across a range of research practices. Participants reported that the most significant changes are needed in reducing the pressure to publish in academia and in restructuring the incentive systems in academia to promote rigorous and transparent research.

Providing preliminary evidence of the convergent and discriminant validity of our measures, perceived need for reform moderately correlated with teaching issues of replication and reform in advanced undergraduate and graduate courses, but not in introductory undergraduate courses. These results suggest that those who believe more reform is needed in psychology research are more likely to advocate for changes by teaching about these issues in upper-level courses, but that other factors also influence psychologists’ decisions about whether to teach these issues. Indeed, course content may depend heavily on the amount of material that needs to be covered in each course, major requirements and departmental decisions about course curricula, and attempts to standardize courses across instructors.

Limitations, Implications, and Future Directions

The present study offers novel measures of the teaching and perceptions of issues of replication and reform to be further refined and applied in future research (e.g., in studies containing more representative samples, tracking teaching and perceived need for reform over time and across disciplines, comparing researcher attitudes to student and public opinion, examining the impact/consequences of teaching and perceptions of these issues, etc.). Although this study provides preliminary evidence of the internal consistency and validity of these measures, these questionnaires should be further validated and possibly reduced in length (due to their high reliability; DeVellis, 2003) before they are considered established instruments or used on a large scale.
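
To illustrate why high reliability leaves room to shorten a questionnaire, the sketch below applies the Spearman-Brown prophecy formula, which predicts reliability after a scale's length is multiplied by a factor k. This is an illustration only, not an analysis reported in the study: it treats the α = .95 of the 31-item perceived need for reform composite as the starting reliability, assumes the retained items behave like the removed ones, and uses a hypothetical function name.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after changing scale length by length_factor.

    length_factor = new number of items / original number of items.
    Assumes the items are roughly parallel (interchangeable).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Assumption: start from alpha = .95 on 31 items and try shorter versions.
for n_items in (31, 15, 10):
    predicted = spearman_brown(0.95, n_items / 31)
    print(f"{n_items} items: predicted reliability = {predicted:.2f}")
```

Under these assumptions, a version with roughly half the items would still be predicted to have reliability near .90, which is the kind of reasoning behind the suggestion that the questionnaires could be reduced in length.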

Furthermore, the current findings may not generalize to the larger population of academic psychologists. Although the present sample was similar in size to other surveys of academic psychologists advertised through society listservs (e.g., Inbar & Lammers, 2012), this study may have been subject to selection bias in who chose to participate (e.g., those in more teaching-focused positions, as the recruitment email described the study as a survey on the teaching of undergraduate and graduate psychology). Moreover, because we only had access to certain listservs, the present study did not representatively sample all psychologists.

Many participants in our sample primarily teach undergraduates, and including a discussion of these topics in every course would not be feasible or desired. It is unclear from the present findings whether most psychology majors get exposure to these issues at some point in their psychology education, as participants only reported about one course they had taught in the past year at each course level. The present findings indicate that instructors are more likely to incorporate a discussion of these issues in more advanced, research-focused courses, perhaps to allow students to first develop a foundation in psychology (and given their relevance to research design and practice). In addition, graduate students may discuss these issues quite extensively with their peers, professors, and advisers in contexts other than the classroom (e.g., in informal discussions, in collaborating on research projects, or at talks, seminars, and conferences). Nonetheless, it may be useful for future research to examine the impact of exposing students to issues of replicability even if they do not pursue advanced research training (e.g., helping them become critical consumers of science in their daily lives).

Finally, our research can help establish a baseline for how field-wide challenges are being addressed in the classroom. We anticipate that this line of research will continue, documenting how psychology instructors’ teaching practices change (and do not change) over time. We also believe that these materials could be adapted to track how other fields experiencing a crisis of confidence (such as cancer research) adapt their teaching techniques.

Conclusion

Psychology has been at the centre of discussions of problems in research practice and at the forefront of developing solutions. To date, however, discussions have largely focused on reforming research practice (but see Chopik et al., 2018; Frank & Saxe, 2012; Funder et al., 2014), and perceptions of how much reform is still needed on a range of research practices have been largely unknown. The results of our survey suggest that psychology professors (a) may only briefly discuss issues of replication and reform in their courses but give more attention to these issues in methods/statistics and upper-level courses, (b) are more likely to discuss these issues if they believe more reform is needed in psychology research and supervise student researchers, and (c) overall still believe more changes are needed in many psychology research practices. Additional research is needed to further understand knowledge and perceptions of problems in scientific research (e.g., across disciplines and over time) and the impact of teaching these issues in the classroom.

Supplemental Material

Supplemental Material for Perceived Need for Reform in Field-Wide Methods and the Teaching of Replication, Interpretation, and Transparency by Stephanie M. Anglin and John E. Edlund in Psychology Learning & Teaching

	Intro. Undergrad.		Advanced Undergrad.		Graduate
Methodology and Research Design	M (SD)	% Discussed	M (SD)	% Discussed	M (SD)	% Discussed
Q3. The importance of replication as a research methodology	2.31 (0.82)	85.7	2.33 (0.90)	80.9	2.42ʳ (0.97)	81.4
Q19. Use of WEIRD samples (drawn from Western, educated, industrialized, rich, democratic societies)	1.95 (0.89)	63.4	2.22 (1.00)	71.5	2.25ⁱ (1.03)	69.5
Q15. Statistical power	1.50 (0.77)	36.2	1.71 (0.90)	48.1	2.02ʳ (1.04)	57.6
Q14. How researchers determine their sample size	1.50 (0.75)	36.2	1.57 (0.83)	38.7	2.00ʳ (1.02)	57.6
Q8. Pre-registration (specifying sample size, materials, procedures, hypotheses, and plans for data analysis prior to data collection or analysis)	1.40 (0.69)	29.7	1.52 (0.80)	35.7	1.68ʳ (0.86)	47.5
Analysis, Interpretation, and Reporting
Q18. Generalizability (whether the results of a study apply in different contexts or populations than those in the original study)	2.61 (0.86)	90.3	2.80 (0.91)	93.2	2.61ⁱ (0.95)	86.4
Q27. Alternative explanations for research findings (a different explanation than offered by the researchers)	2.30 (0.93)	78.8	2.55 (1.00)	81.3	2.59ⁱ (1.04)	83.1
Q28. Conflicting evidence in the literature (“debates” between various researchers)	2.28 (0.92)	77.3	2.53 (0.93)	85.1	2.64ⁱ (1.06)	83.1
Q2. Plagiarism	2.24 (0.93)	78.9	2.37 (0.97)	80.0	2.00 (0.87)	69.5
Q26. The concept of scientific uncertainty	2.03 (0.93)	66.8	2.15 (1.01)	68.9	2.10 (1.03)	62.7
Q17. Overclaiming (i.e., overstating or exaggerating research findings)	1.76 (0.79)	56.6	1.98 (0.98)	60.4	2.22ⁱ (0.98)	74.6
Q16. Effect size	1.63 (0.85)	43.4	1.89 (1.00)	54.5	2.22ʳ (0.98)	64.4
Q20. Publication bias (i.e., publication decisions based on the direction or significance of the findings; the “file-drawer problem”)	1.62 (0.78)	45.9	1.85 (0.98)	51.9	2.19ʳ (0.94)	74.6
Q22. Theoretical bias (the influence of theoretical preferences on research design and interpretation)	1.65 (0.75)	49.5	1.89 (0.96)	55.7	2.05ⁱ (1.01)	62.7
Q1. Data fabrication/falsification	1.65 (0.73)	51.3	1.77 (0.84)	54.5	1.80ʳ (1.91)	52.5
Q11. Selective reporting (e.g., failing to report all conditions of a study, all data exclusions, or all dependent measures relevant to the research hypothesis)	1.46 (0.70)	35.1	1.74 (0.89)	49.4	2.08ʳ (0.97)	66.1
Q21. Political bias (the influence of political beliefs or values on research design and interpretation)	1.47 (0.66)	38.0	1.57 (0.81)	40.4	1.83ⁱ (0.93)	54.2
Q13. Reporting post hoc explanations for research findings as predicted, a priori hypotheses (i.e., HARKing or “Hypothesizing after the results are known”)	1.28 (0.57)	22.9	1.51 (0.85)	32.3	1.80ʳ (0.96)	47.5
Q12. P-hacking (performing multiple statistical tests of the same hypothesis and reporting only those that produce the desired result(s))	1.24 (0.57)	17.9	1.53 (0.91)	31.5	1.83ʳ (0.97)	49.2
Q9. Open access to study materials	1.33 (0.66)	22.9	1.34 (0.66)	25.1	1.63ʳ (0.83)	44.1
Q10. Open access to data files	1.26 (0.57)	20.4	1.31 (0.64)	23.4	1.64ʳ (0.76)	49.2
Overall Practice and Climate
Q4. How commonly (or uncommonly) replication studies are performed in psychology	1.76 (0.77)	57.3	1.86 (0.91)	57.0	2.05ʳ (0.94)	67.8
Q24. The pressure to publish in academia	1.53 (0.71)	41.6	1.72 (0.90)	48.5	2.02ʳ (1.03)	59.3
Q5. Recent publicized failures to replicate studies in psychology (e.g., Nosek et al., 2015)	1.45 (0.67)	35.8	1.66 (0.87)	44.3	1.92ʳ (0.92)	62.7
Q6. Current debates on the problem of replication in psychology (e.g., Gilbert et al., 2016)	1.44 (0.65)	35.5	1.63 (0.86)	43.0	1.98ʳ (0.92)	62.7
Q25. Authorship and authorship order decisions	1.25 (0.55)	20.4	1.32 (0.66)	23.0	1.54ʳ (0.84)	35.6
Q23. Political homogeneity in academic psychology	1.24 (0.54)	19.4	1.33 (0.66)	24.7	1.53ⁱ (0.80)	37.3
Q7. The tone of discussions surrounding replication and reform in psychology (e.g., use of civil discourse vs. personal attacks)	1.20 (0.49)	16.8	1.31 (0.66)	23.0	1.73 (0.81)	52.5
Overall	1.65 (0.43)		1.82 (0.58)		2.01 (0.64)

Methodology and Research Design	M	SD
Q4. Increasing the number of replication studies performed	3.10	0.87
Q21. Using more diverse samples from non-WEIRD (i.e., non-Western, educated, industrialized, rich, democratic) societies	3.06	0.89
Q6. Increasing pre-registration (specifying sample size, materials, procedures, hypotheses, and plans for data analysis prior to data collection or analysis)	2.69	0.95
Q17. Increasing statistical power	2.53	0.88
Analysis, Interpretation, and Reporting
Q13. Increasing transparency in reporting (e.g., stating whether all conditions of a study, all data exclusions, and all dependent measures relevant to the research hypothesis were reported)	3.02	0.84
Q7. Increasing open access to study materials	2.99	0.86
Q14. Reducing p-hacking (performing multiple statistical tests of the same hypothesis and reporting only those that produce the desired result(s))	2.84	0.91
Q29. Discussing scientific uncertainty when communicating research findings to the public	2.82	0.92
Q15. Explicitly identifying which hypotheses and analyses were a priori vs. post hoc/exploratory	2.76	0.88
Q8. Increasing open access to data files	2.75	0.88
Q30. Discussing alternative explanations for research findings	2.75	0.89
Q18. Reporting effect sizes for all statistical tests	2.70	0.88
Q19. Accurately discussing the strength of effects based on the observed findings	2.66	0.85
Q31. Discussing conflicting evidence in the literature (“debates” between various researchers)	2.64	0.90
Q20. Accurately reporting the generalizability of research findings (whether the results of a study apply in different contexts or populations than those in the original study)	2.61	0.90
Q16. Reporting how sample sizes were determined prior to (or after) beginning data analysis	2.47	0.87
Overall Practice and Climate
Q26. The pressure to publish in academia at the expense of conducting rigorous research and generating valid knowledge	3.35	0.83
Q27. Restructuring the incentive systems in academia to promote rigorous and transparent research	3.35	0.85
Q5. Providing dedicated journal space to publish the results of replication studies	3.21	0.88
Q22. Reducing publication bias (i.e., publication decisions based on the direction or significance of the findings; the “file-drawer problem”)	3.19	0.86
Q3. Increasing the replicability of research findings	3.08	0.84
Q11. Empirically testing the effectiveness of policies and practices implemented to improve the replicability of research findings	2.90	0.90
Q12. Increasing the civility of the tone of discussions surrounding replication and reform	2.73	0.96
Q1. Detecting and preventing data fabrication/falsification	2.50	0.73
Q28. Increasing communication about authorship decisions	2.45	0.95
Q23. Reducing political bias (the influence of political beliefs or values on research design and interpretation)	2.39	0.94
Q2. Detecting and preventing plagiarism	2.38	0.84
Q24. Reducing theoretical bias (the influence of theoretical preferences on research design and interpretation)	2.37	0.94
Q25. Increasing political diversity in academic psychology	2.28	0.94
Q9. Encouraging the use of badges (pre-registration, open data/materials) in publishing	2.21	0.90
Q10. Requiring the use of badges (pre-registration, open data/materials) in publishing	1.80	0.88
Overall	2.73	0.56

	n	M	SD
Introductory Undergraduate Course
Introductory Psychology	146	1.55	0.37
Introductory Content Course	95	1.64	0.38
Research Methods/Statistics	32	2.13	0.49
Advanced Undergraduate Course
Advanced Content Course	192	1.72	0.52
Research Methods/Statistics/Lab. or Research Course	39	2.21	0.65
Graduate Course
Content Course
Composite	41	1.86	0.65
Replication Subscale	41	1.69	0.61
Interpretation Subscale	41	2.21	0.76
Research Methods/Statistics/Writing
Composite	17	2.43	0.57
Replication Subscale	17	2.56	0.69
Interpretation Subscale	17	2.27	0.76

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Supplemental Material

Supplemental material for this article is available online.

Notes

Author biographies

Stephanie M. Anglin is an assistant professor of psychology at Hobart and William Smith Colleges. Her research and teaching interests include scientific reasoning and communication, beliefs and attitudes, research methods, and scientific best practices.

John E. Edlund is associate professor of psychology at the Rochester Institute of Technology and serves as the research director of Psi Chi: The International Honor Society in Psychology. He has won numerous awards related to teaching and is passionate about the dissemination of psychological knowledge to the world. His research interests span numerous content domains ranging from research methods research, to evolutionary psychology, to social psychology.

References

Chopik W. J., Bremner R. H., Defever A. M., Keller V. N. (2018) How (and whether) to teach undergraduates about the replication crisis in psychological science. Teaching of Psychology 45: 158–163.

DeVellis R. F. (2003) Scale Development: Theory and Applications (2nd edn.). Thousand Oaks, CA: Sage.

Dweck C. S. (2017) Is psychology headed in the right direction? Yes, no, and maybe. Perspectives on Psychological Science 12: 656–659.

Fanelli (2010) Do pressures to publish increase scientists’ bias? An empirical support from US states data. PLoS One 5: e10271.

Ferguson C. J., Brown J. M., Torres A. V. (2016) Education or indoctrination? The accuracy of introductory psychology textbooks in covering controversial topics and urban legends about psychology. Current Psychology 37: 1–9.

Fiedler, Schwarz (2016) Questionable research practices revisited. Social Psychological and Personality Science 7: 45–52.

Franco, Malhotra, Simonovits (2014) Publication bias in the social sciences: Unlocking the file drawer. Science 345: 1502–1505.

Frank M. C., Saxe (2012) Teaching replication. Perspectives on Psychological Science 7: 600–604.

Funder D. C., Levine J. M., Mackie D. M., Morf C. C., Sansone, Vazire, West S. G. (2014) Improving the dependability of research in personality and social psychology: Recommendations for research and educational practice. Personality and Social Psychology Review 18: 3–12.

Gilbert D. T., King, Pettigrew, Wilson T. D. (2016) Comment on “Estimating the reproducibility of psychological science.” Science 351: 1037-a.

Inbar, Lammers (2012) Political diversity in social and personality psychology. Perspectives on Psychological Science 7: 496–503.

Kerr N. L. (1998) HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review 2: 196–217.

Kidwell M. C., Lazarević L. B., Baranski, Hardwicke T. E., Piechowski, Falkenberg, Nosek B. A. (2016) Badges to acknowledge open practices: A simple, low-cost, effective method for increasing transparency. PLoS Biology 14: e1002456.

Motyl, Demos A. P., Carsel T. S., Hanson B. E., Melton Z. J., Mueller A. B., Yantis (2017) The state of social and personality science: Rotten to the core, not so bad, getting better, or getting worse? Journal of Personality and Social Psychology 113: 34.

Nosek B. A., Aarts A. A., Anderson J. E., Anderson C. J., Attridge P. R., Attwood A., … Zuni K. (2015) Estimating the reproducibility of psychological science. Science 349. doi:10.1126/science.aac4716.

Nosek B. A., Spies J. R., Motyl M. (2012) Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science 7: 615–631.

O’Boyle E. H. Jr., Banks G. C., Gonzalez-Mule (2017) The chrysalis effect: How ugly initial results metamorphosize into beautiful articles. Journal of Management 43: 376–399.

Rutjens B. T., Heine S. J., Sutton R., van Harreveld F. (2017) Attitudes towards science. Advances in Experimental Social Psychology 57: 2.

Sternberg R. J. (2017) Mountain climbing in the dark: Introduction to the special symposium on the future direction of psychological science. Perspectives on Psychological Science 1: 649–651.

Stevens J. P. (1992) Applied Multivariate Statistics for the Social Sciences (2nd edn.). Hillsdale, NJ: Erlbaum.

Stroebe, Strack (2014) The alleged crisis and the illusion of exact replication. Perspectives on Psychological Science 9: 59–71.


