Abstract
Despite discussion and institution of new reforms in psychology research, little is known about how much reform psychologists believe is still needed across various research practices and whether instructors are teaching students about replication and reform in their courses. To investigate these questions, we distributed questionnaires assessing perceived need for reform in psychology research and the teaching of replication and reform to instructors of undergraduate and graduate psychology courses across multiple listservs (n = 328). Participants reported discussing topics related to replication and reform briefly in their courses and that moderate changes are still needed in psychology research. Topics were discussed more extensively in advanced vs. introductory courses, and in methods/statistics vs. content courses. Perceived need for reform and number of student researchers supervised/year correlated with teaching these issues, suggesting that those who believe more change is needed in psychology research and are more involved in shaping the next generation of psychology researchers are more likely to discuss replication and reform in their courses. Our questionnaires provide a preliminary tool to be further refined, validated, and applied in future research on knowledge and perceptions of problems in social science research and the impact of teaching these issues in the classroom.
Over the past decade, highly publicized cases of research misconduct and failed replications have raised concerns about the integrity of findings reported in the scientific literature (e.g., Nosek et al., 2015). The pressure to publish (e.g., Fanelli, 2010), preference for positive (e.g., Franco, Malhotra, & Simonovits, 2014), perfect-looking findings and novel, compelling narratives (e.g., O’Boyle, Banks, & Gonzalez-Mule, 2017), and incentive systems that reward publishing over truth-telling, accuracy, and dependability (e.g., Nosek, Spies, & Motyl, 2012) threaten the integrity of the scientific enterprise, encouraging behaviours such as p-hacking, selective reporting, publication bias, and HARKing (Kerr, 1998), while discouraging replication, high-powered studies, and the use of diverse paradigms, stimuli, and samples.
Recently, psychology researchers have begun developing more rigorous policies and standards to improve scientific practice. Many journals have adopted new reporting policies, requiring a discussion of power and how sample size was determined, full disclosure statements, and effect sizes and exact p-values for all analyses. Some journals have begun awarding badges for exceptional transparency, including pre-registration and public posting of study data and materials (Kidwell et al., 2016; osf.io/tvyxz). Journals are also increasingly welcoming the submission of replication studies.
Despite discussion and institution of new reforms, some believe the reproducibility “crisis” is overblown (e.g., Stroebe & Strack, 2014), the replicability of findings in the psychological literature is quite high (Gilbert, King, Pettigrew, & Wilson, 2016), questionable research practices are less prevalent than originally estimated (Fiedler & Schwarz, 2016), particular responses and reforms have been unnecessary or disadvantageous (e.g., Stroebe & Strack, 2014), and the reform movement has produced a hostile environment where researchers have been, or fear becoming, personally attacked (e.g., Dweck, 2017). Researchers have also expressed concern about the signal a “crisis” sends to the public and potential funders (e.g., Rutjens, Heine, Sutton, & van Harreveld, 2017). However, outspoken critics and supporters’ views are generally the ones heard, leaving an open question as to what the majority of researchers believe (Sternberg, 2017). A 2015 survey revealed that, overall, social/personality psychologists perceive the replicability of studies in their field to be low, but slightly more replicable than 10 years ago, and that research practices have improved as a result of the replication and reform movement (Motyl et al., 2017). However, little is known about how much reform psychology researchers believe is still needed in psychology research and which practices they believe still need reform.
Moreover, discussion of science reform has largely focused on revising current research practice, rather than on reforming psychology curriculum, education, and training. However, with new policies and guidelines emerging, students entering the field will not only need to learn them but will have the opportunity to refine and improve them in the future. Some efforts have been made to consider the accuracy of research described in textbooks (e.g., Ferguson, Brown, & Torres, 2016), redesign methods courses to teach replication as a core component of research practice (Frank & Saxe, 2012), and develop pedagogical tools to communicate issues of replication and best practices to students (Chopik, Bremner, Defever, & Keller, 2018). In general, however, it is unknown what instructors are teaching students about current issues and debates in psychology research and how instructors’ own perceptions of the replication and reform movement influence their teaching.
Study Overview and Hypotheses
In this study, we developed questionnaires to (a) provide insight into academic psychologists’ perceptions of how much reform is still needed in psychology research and their teaching of replication, interpretation, and transparency; and (b) offer a preliminary tool to be refined, validated, and applied in future research further investigating these issues and their consequences. Using these measures, we tested the following questions and hypotheses:
No specific prediction was made about the extent to which psychology instructors are teaching topics related to replication and reform at each course level. However, given the complexity and advanced nature of many of the topics assessed, we expected that topics would be discussed more frequently in more advanced courses (i.e., most in graduate courses, followed by advanced undergrad courses, and least in intro undergrad courses).
We investigated whether discussion of issues related to replication and reform varied based on length of time teaching, teaching load and focus, class size, and specialty in psychology.
Although no prediction was made about how much change psychology instructors overall believe is needed in psychology research, we tested several competing hypotheses about whether perceived need for change varied based on length of time teaching, teaching load and focus, and specialty in psychology.
We predicted that psychology instructors would be more likely to teach about issues of replication and reform if they perceive greater changes to be needed in psychology research practice.
Method
Participants
A total of 402 participants provided informed consent to participate in this study. However, four participants did not proceed past the consent form, and one withdrew consent midway through the survey (and was removed from the data set). Of the remaining 397 participants, 328 completed the survey (although a few did not answer all demographic items).
Of the 314–315 participants who answered each demographic item, 227 identified as female, 86 as male, and one as other; nine as Hispanic or Latino and 305 as Not Hispanic or Latino; two as American Indian or Alaska Native, 14 as Asian, seven as Black or African American, 287 as White, and eight as Other (six participants selected two options for race). The majority of participants indicated teaching at institutions in the United States (n = 291).
Materials and Procedure
We aimed to collect a large sample of psychology instructors by recruiting participants from major psychology societies with member listservs: the Society for Personality and Social Psychology, Society for the Teaching of Psychology, and Society for the Psychological Study of Social Issues (to our knowledge, these are the only psychological societies with member-accessible listservs). In April 2017, we sent an email to the listservs for these societies, inviting all who had taught at least one psychology course in the past year at the college level to participate in a survey on the teaching of undergraduate and graduate psychology. Approximately two weeks after the initial email, we sent a reminder, specifying the closing date for the survey (two weeks following the reminder). As compensation, participants had the opportunity to enter their name in a raffle to win one of three $100 Amazon gift cards.
After providing consent, participants were asked whether they taught a psychology course in the past year at the introductory undergraduate (e.g., 100–200 level), advanced undergraduate (e.g., 300–400 level), and graduate level. For each course level participants indicated teaching, they were asked to select one course they taught in the past year at that level and enter its title.
Table 1. Mean Level of Discussion of Each Topic in Introductory Undergraduate, Advanced Undergraduate, and Graduate Psychology Courses
Note. % discussed indicates the percentage of respondents who reported discussing each topic (briefly, in moderate depth, or extensively) in their course. N ranged from 277 to 279 for each introductory undergraduate item; n = 235 for the advanced undergraduate items, and n = 59 for the graduate items. r denotes items on the replication subscale. i denotes items on the interpretation subscale.
Participants then completed these same questions for each subsequent course they identified teaching in the past year (for a maximum of three times for those who identified a course at each level).
Table 2. Perceived Need for Reform in Psychology Research
Note. N ranged from 322 to 328 for each item.
Lastly, participants were asked a number of questions about their teaching, research, position, and academic background. Participants reported their typical course teaching load/year; the number of years they have been teaching psychology at the college level; the class size for each course they identified above; the focus of their academic position; the percentage of their work time they spend on teaching, research, service, and other; their primary specialty/area of study; their current position; the number of undergraduate research assistants, undergraduate student independent research projects, and graduate student researchers advised per year; location of their college/university; most advanced degree and the year it was obtained. Following the work-related questions, participants completed standard demographic questions, reporting their gender, age, ethnicity, and race.
Results
Because many of the planned analyses involved multiple comparisons or testing multiple hypotheses, for all analyses, we set a more conservative threshold for significance (p < .01).
Descriptive statistics, factor structure, and internal consistency
Table 1 presents means and standard deviations for each teaching item, along with the percentage of respondents who reported discussing each topic at least briefly in their course. Frequencies for each item are depicted in Figures 1–3.
Figure 1. Percentage of instructors who did not discuss, discussed briefly, discussed in moderate depth, and discussed extensively each topic in their introductory-level undergraduate psychology course.
Figure 2. Percentage of instructors who did not discuss, discussed briefly, discussed in moderate depth, and discussed extensively each topic in their advanced-level undergraduate psychology course.
Figure 3. Percentage of instructors who did not discuss, discussed briefly, discussed in moderate depth, and discussed extensively each topic in their graduate-level psychology course.
Because these topics vary in difficulty and scope, they may be addressed differently in introductory undergraduate, advanced undergraduate, and graduate psychology courses. Therefore, a principal components analysis with a Varimax rotation was performed separately for each course level to assess whether the factor structure differed across the three course levels. For the introductory undergraduate teaching items, this analysis suggested that only one dominant factor was present (eigenvalue = 9.72, accounting for 34.71% of the variance), and thus the introductory undergraduate items were analysed as a composite measure (α = .92). The factor analysis for the advanced undergraduate items also suggested that only one dominant factor was present (eigenvalue = 13.05, accounting for 46.60% of the variance), and thus the advanced undergraduate teaching items were also analysed as a composite measure (α = .95). For the graduate teaching items, the analysis suggested two discrete factors were present (eigenvalues of 13.42 and 3.62, accounting for 47.93% and 12.94% of the variance, respectively). Following Stevens’ (1992) recommendations, items with factor loadings > .40 that did not cross-load were retained on each factor. This cut-off produced 17 items on factor 1 (α = .96) and 8 items on factor 2 (α = .90). One item was excluded because it cross-loaded on both factors, as were two other items that failed to load above the .40 threshold on either factor. Thus, in addition to the composite graduate teaching measure (containing all items; α = .96), we created two subscales by averaging the graduate teaching items on each factor. The first factor included items broadly related to replication, and the second included items broadly related to interpretation. For ease of presentation, we only include results for the replication and interpretation indices when they diverged from the overall pattern on the composite measure.
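The reported percentages of variance and reliability coefficients can be illustrated with a minimal sketch. It assumes the principal components analysis was run on the correlation matrix of 28 teaching items (17 + 8 retained items, one cross-loading item, and two low-loading items in the graduate analysis); that count and the toy scores below are inferences for illustration, not details stated in the text. The function names are hypothetical.

```python
from statistics import variance


def variance_explained(eigenvalue, n_items):
    """Share of total variance for one component of a correlation-matrix PCA.

    With standardized items, total variance equals the number of items.
    """
    return eigenvalue / n_items


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of per-item score lists,
    each ordered by respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


N_ITEMS = 28  # assumption: inferred from 17 + 8 + 1 + 2 graduate items

for label, eigenvalue in [("introductory", 9.72), ("graduate, factor 1", 13.42)]:
    share = 100 * variance_explained(eigenvalue, N_ITEMS)
    print(f"{label}: {share:.2f}% of variance")

# Hypothetical ratings (1-4 scale) from five respondents on three items.
toy_items = [
    [1, 2, 3, 4, 2],
    [2, 2, 3, 4, 1],
    [1, 3, 3, 4, 2],
]
print(f"toy alpha: {cronbach_alpha(toy_items):.2f}")
```

Under this assumption the eigenvalues of 9.72 and 13.42 correspond to 34.71% and 47.93% of the variance, matching the values reported above.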
Level of discussion of issues in different courses
A one-way ANOVA was conducted to test the prediction that topics would be discussed more frequently in advanced courses (i.e., most in graduate courses, followed by advanced undergrad courses, and least in introductory undergrad courses), F(2, 570) = 14.23, p < .001, η2 = .05. Overall, topics were more likely to be discussed in advanced undergraduate (M = 1.82, SD = 0.58) and graduate (M = 2.01, SD = 0.64) than introductory undergraduate (M = 1.65, SD = 0.43) courses (Mdiff (intro vs. advanced) = −0.17, SE = .05, p = .001, 95% CI = [−0.27, −0.06]; Mdiff (intro vs. graduate) = −0.36, SE = .07, p < .001, 95% CI = [−0.54, −0.18]). The difference in level of discussion between advanced undergraduate and graduate courses did not meet the .01 threshold set to control for multiple comparisons (Mdiff (advanced vs. grad) = −0.19, SE = .08, p = .03, 95% CI = [−0.37, −0.01]).
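For readers who want to check the effect size, η2 for a one-way ANOVA can be recovered from the F statistic and its degrees of freedom. The sketch below is illustrative only; the function name is hypothetical and the inputs are the values reported above.

```python
def eta_squared(f_value, df_between, df_within):
    """Eta squared for a one-way ANOVA, recovered from F and its dfs.

    eta^2 = SS_between / SS_total = (F * df_b) / (F * df_b + df_w)
    """
    scaled_f = f_value * df_between
    return scaled_f / (scaled_f + df_within)


# Values reported for the comparison across course levels: F(2, 570) = 14.23
print(round(eta_squared(14.23, 2, 570), 2))  # -> 0.05
```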
Level of Discussion of Issues of Replication and Reform in Different Courses
Note. Participants were instructed to identify one course they taught in the past year at each level. Seven participants listed 2+ introductory undergraduate courses (instead of one), and five participants listed 2+ advanced undergraduate courses. Because (a) only a few participants listed >1 course, (b) those who listed >1 course tended to list similar types of courses (e.g., Social Psychology and Cognitive Psychology; Introductory Psychology and Developmental Psychology), (c) the overall pattern of results did not differ if these participants were included or not, and (d) no exclusion criteria were set in advance, all participants were retained in the main analyses. However, these participants were excluded from the analyses comparing teaching in different types of courses (i.e., for the analyses presented in this table) because they listed courses in more than one category.
In introductory courses, instructors were more likely to discuss issues of replication and reform in Research Methods/Statistics (M = 2.13, SD = 0.49) than in Introductory/General Psychology (M = 1.55, SD = 0.37) or introductory content courses (M = 1.64, SD = 0.38), F(2, 272) = 28.97, p < .001, η2 = .18 (Mdiff (methods/stats vs. intro psych.) = 0.58, SE = .08, p < .001, 95% CI = [0.40, 0.76]; Mdiff (methods/stats vs. intro content) = 0.49, SE = .08, p < .001, 95% CI = [0.30, 0.67]). Similarly, in advanced undergraduate courses, instructors were more likely to discuss issues of replication and reform in Research Methods/Statistics/lab courses (M = 2.21, SD = 0.65) than in content courses (M = 1.72, SD = 0.52), t(229) = 5.04, p < .001, d = 0.84 (Mdiff(methods/stats vs. content) = 0.48, SE = 0.10, 95% CI = [0.29, 0.67]). Overall, graduate instructors were more likely to discuss issues of replication and reform in Research Methods/Statistics/Writing courses (M = 2.43, SD = 0.65) than in other courses (M = 1.86, SD = 0.57), t(56) = 3.35, p = .001, d = 0.93 (Mdiff(methods/stats vs. other) = 0.73, SE = 0.22, 95% CI = [0.28, 1.18]). However, only issues of replication were discussed more extensively in Methods/Statistics/Writing (M = 2.56, SD = 0.69) than other courses (M = 1.69, SD = 0.61), t(56) = 4.81, p < .001, d = 1.34 (Mdiff(methods/stats vs. other) = 1.06, SE = 0.25, 95% CI = [0.56, 1.56]); issues of interpretation were discussed about equally in Methods/Statistics/Writing (M = 2.27, SD = 0.76) and other courses (M = 2.21, SD = 0.76), t(56) = 0.29, p = .77, d = 0.08.
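The standardized effect sizes for the graduate comparisons can be approximated from the reported means and standard deviations. The sketch below assumes Cohen's d with an equal-weight pooled standard deviation, because the group sizes are not reported here; this is an approximation and not necessarily the computation the authors used. The function name is hypothetical.

```python
from math import sqrt


def cohens_d_equal_n(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using a pooled SD that weights both groups equally.

    Assumption: group sizes are treated as equal because they are not
    available; unequal groups would shift the pooled SD slightly.
    """
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Graduate courses, Methods/Statistics/Writing vs. other (values from the text)
comparisons = {
    "composite": (2.43, 0.65, 1.86, 0.57),
    "replication": (2.56, 0.69, 1.69, 0.61),
    "interpretation": (2.27, 0.76, 2.21, 0.76),
}
for label, values in comparisons.items():
    print(f"{label}: d = {cohens_d_equal_n(*values):.2f}")
```

With these inputs the sketch gives d values of 0.93, 1.34, and 0.08, in line with the reported effect sizes for the graduate comparisons.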
Descriptive statistics regarding the composition of instructors in our sample are provided in the Supplementary Materials. We found no differences in teaching of the topics based on instructor and class characteristics (e.g., number of years teaching, academic rank, specialty, teaching load, teaching vs. research focus, and class size), p’s > .08 (see Supplementary Materials). The only exceptions were for exploratory analyses comparing social/personality psychologists to others (for graduate courses), and examining teaching based on number of student researchers supervised per year.
In graduate courses, social/personality psychologists (M = 2.12, SD = 0.75) tended to discuss issues of replication more extensively than non-social/personality psychologists (M = 1.64, SD = 0.64), t(55) = 2.44, p = .02, d = 0.69, Mdiff = 0.48, SE = 0.19, 95% CI = [.09, .88], whereas issues of interpretation were discussed about equally by social/personality (M = 2.21, SD = 0.75) and non-social/personality psychologists (M = 2.21, SD = 0.69), t(55) = −0.02, p = .99, d = .00, Mdiff = 0.00, SE = 0.20, 95% CI = [−.40, .40].
Number of RAs, r(255) = .17, p = .005, and undergraduate student researchers, r(254) = .36, p < .001, supervised per year correlated with teaching topics in introductory undergraduate courses; number of graduate student researchers did not, r(255) = −.08, p = .21. Number of undergraduate student researchers correlated with teaching topics in advanced undergraduate courses, r(225) = .28, p < .001; number of RAs, r(225) = .06, p = .33, and graduate student researchers, r(226) = .08, p = .23, did not. Number of undergraduate researchers correlated with teaching issues of replication and reform in graduate courses, r(55) = .33, p = .01, and number of graduate researchers marginally did, r(54) = .26, p = .052 (replication: r(54) = .19, p = .15; interpretation: r(54) = .32, p = .02); number of RAs did not, r(55) = .22, p = .11.
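As an illustration of how these correlations relate to the p < .01 threshold, the sketch below approximates a two-tailed p-value from r and its degrees of freedom using the Fisher z transformation. This is a large-sample normal approximation chosen to keep the example dependency-free; it is not the t-based test that statistical software would typically report, so its p-values will differ slightly from those in the text. The function name and the ALPHA constant are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist

ALPHA = 0.01  # the conservative threshold adopted for all analyses


def approx_p_for_r(r, df):
    """Approximate two-tailed p-value for a Pearson r via Fisher's z.

    df is the value in r(df), so the sample size is df + 2.
    """
    n = df + 2
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Introductory undergraduate courses (values from the text)
reported = {
    "RAs": (0.17, 255),
    "undergraduate researchers": (0.36, 254),
    "graduate researchers": (-0.08, 255),
}
for label, (r, df) in reported.items():
    p = approx_p_for_r(r, df)
    verdict = "meets" if p < ALPHA else "does not meet"
    print(f"{label}: r({df}) = {r:.2f}, approx. p = {p:.3f}, {verdict} p < {ALPHA}")
```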
As with the teaching items, a principal components analysis with Varimax rotation was performed to examine whether different factors were present in the data. This analysis suggested that only one dominant factor was present (eigenvalue = 12.50, accounting for 40.31% of the variance), and thus we analysed the perceived need for reform items as a composite measure (α = .95). See Table 2 for perceived need for reform item means and standard deviations and Figure 4 for percentages of responses to each question.
Figure 4. Percentage of participants who believe that no change, some small changes, moderate changes, or significant changes are needed in psychology research on each issue.
We found no differences in perceived need for reform based on instructor and class characteristics (e.g., number of years teaching, academic rank, specialty, teaching load, teaching vs. research focus, and class size), p’s ≥ .13 (see Supplementary Materials). Exploratory analyses revealed a small correlation between number of undergraduate student researchers supervised per year and perceived need for reform, r(322) = .13, p = .02, but no relationship between perceived need for reform and number of RAs, r(323) = .02, p = .70, or graduate students, r(323) = −.07, p = .18.
Perceived need for reform was not significantly related to teaching issues of replication and reform in introductory undergraduate courses, r(255) = .10, p = .11, but did correlate with teaching these issues in advanced undergraduate, r(228) = .30, p < .001, and graduate courses, r(56) = .44, p = .001 (replication: r(56) = .46, p < .001; interpretation: r(56) = .26, p = .052).
Discussion
This study presents novel measures of the teaching and perceptions of issues of replication and reform, and provides preliminary evidence of their internal consistency and validity. Across course levels, most participants reported discussing the issues of replication and reform raised on our survey briefly (or not at all) in their courses. Items on our measures—assessing a broad range of issues in research practice—were strongly intercorrelated, indicating that discussion of the various topics tended to co-occur. However, different factors emerged for graduate courses, suggesting that, in graduate courses, issues of replication may be discussed together and issues of interpretation may be discussed together. In undergraduate courses, both types of issues may be addressed by instructors seeking to teach critical scientific thinking skills, whereas graduate instructors may vary their focus on each type of issue based on the course content and goals. Indeed, graduate instructors were more likely to teach issues of replication in methods/statistics than content courses, whereas issues of interpretation were discussed to a similar extent in both types of courses. This difference in factor structure for undergraduate and graduate courses should be further explored in future research employing a larger sample of graduate instructors.
Overall, the topics were discussed in more depth in upper-level courses and in methods/statistics as opposed to content courses, demonstrating known-groups validity. Issues of replication were more likely to be discussed in graduate courses by social/personality psychologists than by non-social/personality psychologists, whereas issues of interpretation were discussed equally. Due to the small number of graduate instructors in our sample, this finding remains speculative. Even so, of all of the areas of psychology, social and personality psychology has likely received the most attention in discussions of the replication “crisis,” and thus social/personality psychologists may feel especially compelled to address issues of replication in their courses. Given that the present study oversampled social and personality psychologists, it is possible that issues of replication are discussed even less frequently across all psychology courses.
Although topics were more likely to be discussed in advanced (undergraduate and graduate) than introductory courses, a few items diverged from this trend. Notably, plagiarism was discussed more extensively in undergraduate than graduate courses, and generalizability and scientific uncertainty were discussed about equally in undergraduate and graduate courses. In fact, of all the items on the survey, generalizability was the most discussed topic across the three course levels. Alternative explanations for research findings, conflicting evidence in the literature, and the importance of replication were also commonly discussed, whereas political homogeneity, authorship decisions, the tone of discussions surrounding replication and reform, pre-registration, open data and materials, p-hacking, HARKing, and political bias were discussed least.
Across course levels, there were no differences in level of discussion of issues related to replication and reform based on number of years teaching, academic rank, teaching load, teaching vs. research focus, class size, or specialty. Given the relatively small sample collected (which oversampled social/personality psychologists and instructors in the United States), we refrain from drawing strong conclusions regarding whether these factors correlate with teaching issues of replication and reform in the larger population of academic psychologists. In particular, the small number of participants at each rank and in each specialty limited our ability to make comparisons on these variables.
Interestingly, however, one individual difference factor did correlate with teaching of issues of replication and reform: number of student researchers supervised per year. Number of undergraduate student researchers supervised per year correlated with teaching issues of replication and reform in introductory undergraduate, advanced undergraduate, and graduate courses, and number of graduate student researchers supervised per year marginally correlated with teaching these topics in graduate courses. These findings suggest that those involved in shaping the next generation of psychology researchers may be most likely to teach about current issues and debates in psychology research.
Perceived Need for Reform in Psychology Research
In recent research, social/personality psychologists perceived the replicability of findings in their discipline to be low but slightly improving as a result of the replication and reform movement (Motyl et al., 2017). The results of our survey extend these findings, suggesting that psychologists believe moderate changes are still needed in psychology research on many issues. Indeed, our perceived need for reform items were strongly intercorrelated, suggesting that, currently, psychologists hold similar perceptions of the need for further reform on a range of research practices. Participants reported the most significant changes are needed in reducing the pressure to publish in academia and in restructuring the incentive systems in academia to promote rigorous and transparent research.
Providing preliminary evidence of the convergent and discriminant validity of our measures, perceived need for reform moderately correlated with teaching issues of replication and reform in advanced undergraduate and graduate (but not introductory undergraduate) courses. These results suggest that those who believe more reform is needed in psychology research are more likely to advocate for changes by teaching about these issues in upper-level courses, but other factors also influence psychologists’ decisions about whether to teach these issues. Indeed, course content may be heavily dependent on the amount of material that needs to be covered in each course, major requirements and departmental decisions about course curricula, and attempts to standardize courses across instructors.
Limitations, Implications, and Future Directions
The present study offers novel measures of the teaching and perceptions of issues of replication and reform to be further refined and applied in future research (e.g., in studies containing more representative samples, tracking teaching and perceived need for reform over time and across disciplines, comparing researcher attitudes to student and public opinion, examining the impact/consequences of teaching and perceptions of these issues, etc.). Although this study provides preliminary evidence of the internal consistency and validity of these measures, these questionnaires should be further validated and possibly reduced in length (due to their high reliability; DeVellis, 2003) before they are considered established instruments or used on a large scale.
Furthermore, the current findings may not generalize across the larger population of academic psychologists. Although the present sample was similar in size to other surveys of academic psychologists advertised through society listservs (e.g., Inbar & Lammers, 2012), this study might have contained a selection bias in those who were willing to participate (e.g., those in more teaching-focused positions, as the recruitment email described the study as a survey on the teaching of undergraduate and graduate psychology). Certainly, because we only had access to certain listservs, the present study did not representatively sample all psychologists.
Many participants in our sample primarily teach undergraduates, and including a discussion of these topics in every course would not be feasible or desired. It is unclear from the present findings whether most psychology majors get exposure to these issues at some point in their psychology education, as participants only reported about one course they had taught in the past year at each course level. The present findings indicate that instructors are more likely to incorporate a discussion of these issues in more advanced, research-focused courses, perhaps to allow students to first develop a foundation in psychology (and given their relevance to research design and practice). In addition, graduate students may discuss these issues quite extensively with their peers, professors, and advisers in contexts other than the classroom (e.g., in informal discussions, in collaborating on research projects, or at talks, seminars, and conferences). Nonetheless, it may be useful for future research to examine the impact of exposing students to issues of replicability even if they do not pursue advanced research training (e.g., helping them become critical consumers of science in their daily lives).
Finally, our research can serve as a further step in establishing a baseline for how field-wide challenges are being addressed in the classroom. We anticipate that this research will continue, documenting how psychology instructors’ teaching practices change (and do not change) over time. We also believe that these materials could be adapted to track how other fields experiencing a crisis of confidence (such as cancer research) adapt their teaching techniques.
Conclusion
Psychology has been at the centre of discussions of problems in research practice and at the forefront of developing solutions. To date, however, discussions have largely focused on reforming research practice (but see Chopik et al., 2018; Frank & Saxe, 2012; Funder et al., 2014), and perceptions of how much reform is still needed on a range of research practices have been largely unknown. The results of our survey suggest that psychology professors (a) may only briefly discuss issues of replication and reform in their courses but give more attention to these issues in methods/statistics and upper-level courses, (b) are more likely to discuss these issues if they believe more reform is needed in psychology research and supervise student researchers, and (c) overall still believe more changes are needed in many psychology research practices. Additional research is needed to further understand knowledge and perceptions of problems in scientific research (e.g., across disciplines and over time) and the impact of teaching these issues in the classroom.
Supplemental Material
Supplemental Material for Perceived Need for Reform in Field-Wide Methods and the Teaching of Replication, Interpretation, and Transparency by Stephanie M. Anglin and John E. Edlund in Psychology Learning & Teaching
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
