Abstract
Concerns about the credibility of social science have increased calls for more transparency, but many academic incentives undermine individual efforts. Institutional changes are needed to substantially transform conventional practices. The authors present a series of specific actions that can be taken by different actors in social science knowledge production: journals, reviewers, professional organizations, teachers, universities and departments, funding sources, and data providers.
Social science faces a pivotal period regarding its prevailing research practices. What has become clear is that the discretion of usual research practices conceals opportunities for mistakes and bias that result in findings being far more fragile than previously understood. Part of the solution is promoting transparency: the credibility of findings and the contributions of research are enhanced by making as much information available for inspection by others as possible.
Transparency poses a collective action problem. Even when the benefits of increasing transparency to scientific communities are readily recognized, individual researchers have pervasive incentives not to participate and may even perceive that providing more information about their projects poses risks. As such, exhorting researchers to be more transparent only goes so far. Institutional solutions are needed instead. Here we present concrete actions that different institutional actors involved in social science knowledge production can take to promote transparency in research practice.
Journals
Journals can promote transparency most unequivocally by mandating transparent practices as a condition of publication. Journals of the American Economic Association (2016), for example, follow this guideline:
Authors of accepted papers that contain empirical work, simulations, or experimental work must provide to the Review, prior to publication, the data, programs, and other details of the computations sufficient to permit replication. These will be posted on the AEA website.
Many journals have long stipulated that authors are obliged to provide materials if requested after an article is published. Stipulations on their own have been shown repeatedly to yield poor compliance (e.g., Dewald, Thursby, and Anderson 1986; Kim and Adler 2015; Wicherts 2006). Once an article is published, researchers have little incentive to respond to requests for materials, especially because they could be used to cast doubt upon one’s original findings, and researchers’ ability to provide them also erodes as memory of how projects were organized recedes. Mandating that materials be provided as part of publication ensures availability and encourages more assiduous organization and documentation of materials. Several high-quality, independent repositories for depositing materials exist, among them openICPSR, Dataverse, and the Open Science Framework. Using independent repositories, instead of personal Web sites, is the best practice for ensuring long-term availability and integrity of materials.
Some journals have taken the additional step of having an independent party actually affirm that data and code provided by authors can reproduce the findings presented in an article. (We are talking here about simply making sure that results can be reproduced, not whether the analyses contain mistakes or other problems.) The American Journal of Political Science began collaborating with the Odum Institute for this purpose in 2015.
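To make the idea of verification concrete, the following is a minimal sketch of the kind of check a verifier might run after rerunning an author's code: compare each estimate reported in the manuscript with the rerun value, allowing for rounding. It is an illustration only, not a description of the procedure used by any journal or by the Odum Institute. The function name, the coefficient names, and the tolerance (half a unit in the third decimal place, assuming estimates are reported to three decimals) are all hypothetical.

```python
import math


def check_reproduction(reported, reproduced, abs_tol=0.0005):
    """Return the names of reported estimates that the rerun fails to match.

    An estimate counts as a mismatch if it is missing from the rerun or
    differs from the reported value by more than abs_tol (an assumed
    allowance for rounding to three decimals).
    """
    mismatches = []
    for name, value in reported.items():
        rerun = reproduced.get(name)
        if rerun is None or not math.isclose(
            value, rerun, rel_tol=0.0, abs_tol=abs_tol
        ):
            mismatches.append(name)
    return mismatches


# Hypothetical values: what the article reports vs. what rerunning the code yields.
reported = {"beta_age": 0.213, "beta_income": -1.050}
reproduced = {"beta_age": 0.2131, "beta_income": -1.048}

mismatches = check_reproduction(reported, reproduced)
```

In this made-up example, only beta_income would be flagged, because it differs from the reported value by more than rounding can explain. As in the text, such a check speaks only to whether results can be reproduced, not to whether the analysis itself is sound.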
Just knowing that this verification process is part of publication strongly encourages authors to clearly organize and document their analyses. Of course, this adds cost to the production of journal articles, although that cost might decline sharply as authors become better trained in documenting data and generating reproducible code. Despite the example of the American Journal of Political Science, we have heard the idea of verifying results dismissed as infeasible given the supposed razor-thin economics of journal publishing. The brutal fact is that many “flagship” journals in the social sciences are profit centers for professional associations, and what priority those organizations assign to expenses that would increase the quality of published articles is a question of political will.
Not all social science data can be shared, because of their proprietary or confidential nature, or other reasons. Journal policies can allow authors to request exemptions. About a third of articles published by the American Economic Review receive exemptions to their data-sharing requirements, which we know because the journal publishes the figure along with its other annual reporting statistics (e.g., Goldberg 2015). Although researchers can reasonably disagree about what authors ought to be expected to share, what cannot be defended is ambiguity about whether and what materials are available. The easiest way to remove ambiguity is to institutionalize conventions about how disclosure will be done (Freese 2007). An appealing policy originating in political science is that the first footnote of a paper will provide information about data availability.
If a journal is not prepared to mandate transparent practices, it could implement badges as an intermediate solution. The Open Science Framework badges (see https://osf.io/tvyxz/) acknowledge open practices, including open data and open materials (e.g., stimuli from experiments). For example, Psychological Science displays these badges at the top of an article and in the online table of contents. Badges may sound hokey, but indications so far from psychology are that they do indeed serve as a nudge that increases the prevalence of desired practices.
In a similar vein, statements identifying the specific contributions of listed authors would increase transparency in credit for the various activities of research. For example, the Public Library of Science family of journals includes a section on author contributions (including conceptualization, data curation, funding, methodology, writing, etc.) at the end of every article. Consistently identifying contributors’ roles can also help reveal patterns of subtle inequality in the academy (e.g., Macaluso et al. 2016).
Journals can also ask authors to complete checklists of reporting guidelines about data and analyses that authors are expected to disclose. Checklists can help clarify expectations about what information about data sources the editors want to be reported, like asking authors to report information about the organization that fielded a study or a standardized measure of response rates. But checklists can also be used to make explicit journal expectations about reported analyses. For example, Psychological Science has instituted a checklist on which authors are asked to indicate whether and why any observations are excluded from analysis and whether variables beyond those reported were analyzed. Transparency in reporting may help reduce false positives resulting from uncorrected multiple comparisons (“p-hacking”) and selective publication bias (the “file drawer problem”). Even when there might not be anything stopping authors from lying, the reporting guidelines checklist eliminates the possibility of information being omitted because authors and editors had different understandings about prevailing reporting norms.
Journal editors have also expressed worry that raising standards for what authors are expected to do and disclose will simply lead authors to submit their articles somewhere else. This concern becomes more compelling as one moves down the prestige hierarchy of journals and more closely substitutable options start to proliferate. For this reason, top disciplinary and field journals are best positioned to take the lead in improving standards. Given that many articles published in other journals often start as submissions to high-profile outlets, increased expectations for top journals would likely have positive spillover effects in their influence on the practices of articles that ultimately appear elsewhere. There have also been efforts to get groups of journals to sign on to guidelines to be implemented together, such as the Transparency and Openness Guidelines, spearheaded by the Center for Open Science (Nosek et al. 2015) or the Data Access and Research Transparency guidelines proposed by political scientists (Lupia and Elman 2014).
Increasing access to primary materials from articles they publish is the most obvious way journals can promote transparent practice, but it is not the only way. Online supplements provide opportunities for authors to provide details and additional analyses, and journals can encourage their use and systematic preservation. That journals continue to allow authors to indicate that some results are “available upon request” is difficult to understand in an online age.
Open-science advocates have also regularly called for journals to publish more replication studies, which may be helped by having concrete policies that announce a journal’s openness to such work. Although replication and transparency are separate issues, transparent practices make it easier to conduct good replication studies by increasing the ability of subsequent researchers to follow what was done in the original study. The idea of “preaccepting” replication studies on the basis of their design—before data have been collected and results are known—also encourages transparency by reducing incentives to present only results that fit a particular interpretation. Journals have some obligation to provide a path to publication for replication work involving studies that the journals previously published. Key to faith in science is the idea that it is self-correcting. To justify that faith, mechanisms need to encourage replication and ensure the complete record is open to credibility assessment (Freese 2007).
Beyond this, social science journals can also encourage transparency by publishing types of materials different than the classic research article. As one idea, journals could allow short publications on noteworthy data sets that are newly available, helping transform data into the currency of publication (Lin and Strasser 2014).
Reviewers
Reviewers are asked to provide the expert assessments of research quality. If they regard transparent practices as pertinent to the quality of a manuscript they are reviewing, they can voice that opinion, even if it is ultimately the editor’s discretion whether to press the matter. More systematically, in experimental psychology, the Center for Open Science endorsed a standard statement that interested reviewers could include in all of their reviews, including the text:
I request that the authors add a statement to the paper confirming whether, for all experiments, they have reported all measures, conditions, data exclusions, and how they determined their sample sizes. The authors should, of course, add any additional text to ensure the statement is accurate. (Nosek et al. 2013)
Of course, pasting the same sentences into every review is a way of agitating for a general change in a journal’s policy. The Peer Reviewers’ Openness initiative, also mostly centered in psychology, is an even more dramatic gesture to this end (Morey et al. 2015). Signatories declared that after January 1, 2017, they would no longer comprehensively review articles until they either meet specific minimum standards for open data and materials or explain why not. Requiring awareness of transparency considerations within the review process is a more genuinely “grassroots” form of scientific activism than perhaps anything else we discuss.
Reviewers can also ask journals for the same checklists of reporting guidelines we discuss above. In this way, they can create demand for such guidelines to exist. By following a set of criteria for what results should be reported in a published article, reviewers can create consistency in published reports and reduce the likelihood of the statistical problems discussed above.
Professional Organizations
Disciplinary and other academic professional organizations oversee policies at many journals and so have a key role to play in adopting the practices just described. One example is the data availability policy cited earlier, which is shared across all the journals of the American Economic Association. Professional organizations can also play a central role in developing norms with understanding and buy-in from different research constituencies. A recurring concern raised about open-science guidelines, especially when mandates are involved, is that they do not recognize the special exigencies posed by different types of research. Professional organizations have expertise in bringing together and coordinating shared actions among members. We see five particular concrete contributions that professional organizations can make.
First, professional organizations can clarify ethical expectations for researchers about materials sharing and replication. Although we are skeptical of the capacity for ethical goading to change behavior without broader changes to incentives, ethical guidelines clarify organizational positions about what researchers ought to do and provide reference for teaching, setting policy, or resolving disputes. Before political science’s Data Access and Research Transparency guidelines became the force for shaping journal policies that they are today, they were included in the American Political Science Association’s guide to professional ethics (Lupia and Elman 2014).
Second, professional organizations can establish working groups to create reporting guidelines for journal articles published in their fields. Working groups allow researchers who share an area or method to develop standards that best fit its particularities, as opposed to researchers feeling like standards developed with a different type of research in mind are being imposed on them. For example, in political science, the Experimental Research section of the American Political Science Association convened a subcommittee that offered standards for reporting of different types of experiments (Gerber et al. 2014). Such reporting standards make it clear to both authors and reviewers what needs to be included in an article. As discussed earlier, not only does this make criteria for acceptable research more transparent, it also increases quality and reproducibility by ensuring that vital details are included in the manuscript.
Third, professional organizations can provide value to their members by helping establish norms for data storage and citation. Researchers report that their willingness to share data is driven by perceived effort, career incentives, and normative pressure (Kim and Adler 2015; Tenopir et al. 2011). Accepted standards for data formatting, disciplinary conventions about where data should be deposited, and even sample documentation illustrating code for cleaning and preparing data can help encourage data sharing. Established platforms for data archiving, such as ICPSR, Harvard’s Dataverse, and the Center for Open Science, could be institutional partners in such a project.
Fourth, professional organizations should also establish robust data citation standards. The same sort of exacting standardization that style guides provide for referencing books and articles by others should also be applied to data sources. The data citation format recommended by the American Sociological Review provides a good example (see also Digital Preservation Alliance for the Social Sciences 2016). Data citation encourages data sharing by providing a mechanism for people to receive intellectual credit for doing so.
Finally, professional organizations can help create and support the infrastructure to encourage other transparency practices. For example, they can create awards to recognize the professional contribution to the research community by those who create or maintain data sets. In the American Sociological Association, such awards could be analogous to section or association-wide paper awards. Another contribution would be to support the new online social science archives databases, developed in collaboration with and hosted by the Center for Open Science. These free, online preprint repositories—the National Bureau of Economic Research and the just developed SocArXiv and PsyArXiv for sociology and psychology, among others—allow researchers to post their papers before they have been published and even before submission to a journal in an easily searchable, central location, increasing the speed of dissemination of social science research.
In addition, outreach by professional organizations can help convey standards and impart the skills needed to implement them. Training is the focus of the next section, but here we wish to underscore how professional organizations can offer workshops and webinars to help more researchers have access to information about best practices to use in conducting their own research and teaching others.
Teachers and Mentors
Institutionalizing transparency involves not only values and policies but also skills. Indeed, one significant barrier to individual researchers sharing their code may be code shyness: even researchers who have been assiduous in documenting their code and double-checking results may believe their code is a mess. And they may well be right, given that many social scientists receive little guidance on how to write code well. Institutionalizing transparency will require a significant shift in how social scientists are trained, so that researchers are confident in sharing their materials and those materials are organized in a way that maximizes the extent to which others can learn from them.
We can briefly highlight several key topics we think need to be part of routine training. First, good coding practices include topics such as using abstraction and automation. Graduate students should know how to automate programs for operations they plan to perform many times, rather than repeating tasks in ways known to be error prone. Code should be self-documenting as much as possible, so it is easy to maintain and internally consistent. Second, strong data and file management skills leave files in a readily shareable format that is not just easier for others to follow but also much easier for researchers themselves to refresh their understandings of projects that gestate over long periods. Third, version control platforms, such as Git and GitHub, are increasingly indispensable in collaborative environments and provide a robust alternative to the proliferating files and desperately intricate file-naming schemes that besiege many complicated projects.
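As a small illustration of the first point, the sketch below shows abstraction and automation in Python: one documented function is applied to every variable in a loop, rather than the same block of code being copied and edited for each variable. The variable names and values are hypothetical, and the choice of standardization as the repeated task is ours, made only for the sake of the example.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores for a list of numbers, using the population SD."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        raise ValueError("cannot standardize a variable with no variation")
    return [(v - center) / spread for v in values]


# Hypothetical survey data: variable name -> list of responses.
survey = {
    "income": [30.0, 45.0, 60.0],
    "education_years": [12.0, 16.0, 20.0],
}

# One loop replaces a copy-pasted block per variable, so a later fix to
# standardize() applies to every variable at once.
standardized = {name: standardize(column) for name, column in survey.items()}
```

Because the logic lives in one named function, a reader of the shared code can verify it once, and adding a new variable to the analysis requires no new code.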
Graduate school provides the most structured context for training. There is no substitute for learning while doing, putting these principles to use. Replicating an existing paper may be a valuable apprentice exercise that can be used to help develop basic principles of good practice. One benefit of journal articles making code available with publication is that it grows the base of examples from which new researchers can learn (though these, like any examples of research, will need to be used with a discerning eye toward quality).
For teachers and mentors involved in training the next generation of graduate students, imparting the skills of transparent research practice may involve learning these skills themselves. Of course, the need to keep learning techniques is nothing new for researchers on the cutting edge. Better still is that the resources available to learn contemporary practices are now abundant, including many wise papers about how to integrate better practices into social science work flow (e.g., Christensen 2016; Gentzkow and Shapiro 2014; Healy 2016; Long 2009; Wilson et al. 2014). In addition, Web sites such as Software Carpentry (https://software-carpentry.org/lessons/), the Berkeley Initiative for Transparency in the Social Sciences (http://www.bitss.org/resource-tag/education/), and the Center for Open Science (https://cos.io/stats_consulting/) also provide great introductory videos and resources on many topics.
Universities and Departments
Universities often present researchers with incentives that flatly contradict ideal practices for producing robust research. For example, many social science departments endorse a humanities-style model that fetishizes sole-authored research, often on the grounds that individual scholarly contributions can be judged only when researchers work alone. Working in isolation invites mistakes; isolated work with details that need never be made available to others invites corruption and self-deception. Social sciences should encourage collaborative work, not penalize it. (Even as we recommend this, however, we urge departments to be reflective about potential biases that disproportionately reward some practitioners of collaborative research more than others. Evidence from economics, for example, suggests that coauthoring by men is credited more than by women [Sarsons 2015]. Women are also less likely to be in the prestige positions of sole-author, first, and last author [West et al. 2013].)
Universities also often place a particular premium on the quantity of research produced. Beyond the predictable negative implications for quality, an emphasis on quantity provides even less incentive for researchers to do the extra work involved in carefully documenting their research and making materials available to others. Universities should reward transparent behaviors by recognizing efforts to share code and researcher-generated data as part of the hiring, tenure, and promotion process. If the trade-off researchers face is between writing reproducible code or documenting and sharing their data with the wider research community (activities that take a substantial time commitment if done well) and writing another paper, it is no wonder that under the current system of rewards, data and code sharing fall by the wayside.
Universities also often value headline-grabbing results in ways that encourage researchers to slant their findings toward more provocative conclusions. Indeed, although one regularly hears researchers bemoan how “the media” distort and sensationalize new studies, evidence indicates that most of the distortion is introduced not by science journalists but instead by the press releases issued by universities and professional organizations (Sumner et al. 2014).
University institutional review boards (IRBs) need to recognize openness as an ethical principle to be weighed among other priorities in research protocols. Obviously, participant protections are important, but confidentiality can also provide a rationalization for researchers to avoid accountability. Consent forms regularly—and sometimes at the IRB’s behest—preclude sharing data with others, and IRBs have even pressed researchers to agree to destroy data once projects have been completed. The scientific virtues of openness clash with a legalistic mind-set against disclosure that pervades many universities.
Most of this section has been about reducing the extent to which universities hinder transparency, but ideally they would go further and actively promote it. Departments can begin encouraging these practices from an early career stage by establishing transparency norms among their graduate students. We discussed specific ideas in the section on “Teachers and Mentors,” but implementation would be better if integrated over the whole of students’ methods training. For that matter, transparent practices (such as sharing data and code and submitting preanalysis plans) could be required as a part of graduation or thesis requirements, with the same possibilities for exemption as journal articles.
Another key element departments and universities can contribute is administrative support. Two of the main barriers that researchers report in sharing their data and code are the time and money involved in preparing these materials for public release (Tenopir et al. 2011). Universities and departments can help overcome these significant concerns by developing staff trained to help faculty members and other researchers prepare data documentation and code for open sharing. Such an undertaking could require significant institutional resources. However, in helping researchers make their work more transparent, universities would be furthering their mission in several ways: (1) by providing additional research materials that can be used for further studies, research universities will increase the value their faculty members add to research communities; (2) by helping make their faculty members’ research reproducible and transparent, universities further their mission in support of increasing public knowledge; and (3) by helping ensure high-quality, public data releases, universities increase their own visibility to potential students, funders, and other interested parties.
Funding and Data Sources
Funding sources have already played an important role in increasing data transparency. Data-sharing plans are required for large grants from many agencies. The National Institutes of Health (2015) has recently made “rigor and reproducibility” an explicit part of its review process. We hope these expectations will be expanded more broadly. The Transparency and Openness Guidelines mentioned earlier offer suggested policy language for funding sources as well as for journals (Nosek et al. 2015).
Of course, many large data collection projects in social science are already funded chiefly for the secondary analyses that others will do with them. Often these data sources will ask users to register in order to download the data, including agreeing to conditions of use such as properly citing the data and providing bibliographic information for publications using the data. To these could be added expectations about sharing code as a condition of using a given data set. When an author makes code available for the data set used to publish an article, it augments the resource base of that data set.
Data providers may also be able to contribute to transparency by making codebooks available in advance of data releases when possible. There has been much recent enthusiasm for “preregistration” in experimental research, in which researchers specify plans in advance of collecting data, to demonstrate that hypotheses presented as a priori were not developed to fit the data post hoc (e.g., Nuzzo 2015). For secondary data analysis, researchers usually have no way of similarly demonstrating that they formulated the hypotheses before looking at the data. Advance availability of codebooks would provide the possibility of documenting hypotheses before the data were available for analyses.
Conclusion
Transparency is the cornerstone for many of the other reforms and innovations identified as integral to restoring the credibility of social science. Sharing data and code helps future replications. Full reports of completed analyses may help with crises of statistical power and bolster the credibility of meta-analyses of the published literature. Prepublishing codebooks could help with preregistration of analysis plans.
Although individuals can adopt practices on their own, we believe changing prevailing practices will require institutional actions. Of course, institutional actions still depend on individuals within the respective institutions to enact them. Individuals in such positions must act to create the changes necessary to increase transparency. The rest of us can do our part by making them aware of the value and interest in doing so.
Acknowledgements
We thank Garret Christensen, Ted Miguel, and David Peterson for helpful conversations.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by a National Science Foundation Graduate Research Fellowship (DGE-1147470) to Molly M. King.
