Abstract
This paper is based on a presentation delivered as part of the NISO Plus 2022 panel discussion titled “Open Science: catch phrase, or a better way of doing research?” that focused on the workflows of Open Science and opportunities for collaboration by stakeholders including publishers, repository infrastructure providers, and the wider research community.
While the aims and outputs of Open Science are well-defined, this paper explores the workflows that are necessary to support the production of “open scientific knowledge”, as defined by UNESCO. Producing research outputs as open scientific knowledge is an activity that is undertaken alongside traditional research practices and must be planned for from the beginning of the research process.
This paper explores the challenges and opportunities associated with Open Science workflows, focusing on an innovative new automated publishing pipeline on the
Introduction
The publication of the UNESCO Recommendation on Open Science inspired a panel discussion at the NISO Plus 2022 conference, “Open Science: catchphrase, or a better way of doing research?” During the panel the speakers, who included representatives from the American Geophysical Union, the Shanghai Information Center for Life Science (CAS), the repository Dryad, and academic publisher F1000, considered not only the beneficial outputs of Open Science practices, but also the methodologies and workflows involved in achieving Open Science.
As an academic publisher, F1000 provides research publishing solutions and services to organisations including the European Commission, Wellcome, and the Bill & Melinda Gates Foundation, as well as directly to researchers through the F1000 Research publishing platform. F1000 publishing platforms are fully Open Access, support open peer review and the open publication of new versions of articles, and have strong Open Data policies that require that authors share all of the research data underlying their articles.
This paper considers the challenges associated with Open Science workflows and the publication of open research outputs and describes a project on the
The challenges of Open Science
The UNESCO Recommendation on Open Science [1] states that Open Science “[…] aims to make multilingual scientific knowledge openly-available, accessible, and reusable for everyone, to increase scientific collaborations and sharing of information for the benefits of science and society, and to open the processes of scientific knowledge creation, evaluation, and communication to societal actors beyond the traditional scientific community”. The knowledge produced by Open Science practices may take the form of Open Access publications, research data, metadata, educational resources, software, source code, and hardware; it also refers to the possibility of opening research methodologies and evaluation processes.
The production of open scientific knowledge requires investment of time and resources on the part of the researcher, as well as an awareness and understanding of the activities involved. Additional motivators or credit may also be necessary to encourage the adoption of such practices by researchers who have not previously worked in “Open” ways. Hagger notes that the move towards Open Science practice requires a complete shift in mindset on the part of the researcher, towards an assumption that every part of a research project will be subject to examination by others [2].
Not only is it necessary for researchers to develop an Open Science mindset, but it is important that they do so early in the research lifecycle if they intend to share their research outputs openly. A study by Gownaris et al. identified that researchers’ awareness of the practices associated with Open Science is particularly low in relation to early phases of research (e.g., the study design stage). This lack of awareness can cause “path dependencies” that prevent the open sharing of outputs in later stages of the research [3]. For example, such path dependencies can affect how researchers share additional outputs when publishing a research article. Many academic publishers, including F1000, enforce strict Open Data sharing policies for their authors [4]. If researchers working with human participants do not consider Open Science practices at the planning stage, they may fail to gain participants’ permission to share the research data openly. This can cause problems when the researcher attempts to publish a research article, as they cannot comply with the journal’s Open Data policy.
Stakeholders’ Open Science policies are increasing, however, emerging from funding agencies [5,6], institutions [7], and academic publishers [8]. These policies tend to focus on two elements of Open Science: Open Access publishing and Open Data sharing. There is evidence that such policy mandates are effective in changing researcher behaviour; for example, BioMed Central and the Public Library of Science (PLoS) journals demonstrated increased data sharing when strong research data policies were introduced in 2015 and 2014 respectively [9]. While mandates increase, the motivators or rewards for researchers who practise Open Science are not always clear. The ON-MERRIT project has identified discrepancies between researchers’ attitudes towards Open Science practices and the extent to which those practices are rewarded through institutions’ policies on promotion, review and tenure. For example, 70.4% of surveyed researchers believed that sharing research data openly should be considered an important or very important factor in promotion decisions, but only 26.6% of institutional policies designate it as such [10]. In parallel, 65% of surveyed researchers have stated that they have never received credit (for example in the form of a citation) for sharing their data openly [11]. Initiatives such as the UK Reproducibility Network (UKRN)’s five-year programme of work with its consortium of institutional members [12] and the Montreal Neurological Institute (MNI)’s announcement of its full commitment to Open Science demonstrate the ways that institutions can address the new challenges that Open Science presents [13].
Nevertheless, many challenges associated with Open Science remain: to produce open outputs like articles, research data, software, hardware, or education materials, it is necessary to plan for “openness” from the beginning of the research project. To consider a new “open” way of working, a change of mindset may be needed, as researchers have not traditionally shared all project outputs in this way. While funder and publisher policies are increasingly mandating some facets of Open Science, the tangible benefits for researchers are not yet clear.
Automating publishing workflows: The Tree of Life project at Wellcome Open Research
While stakeholders continue to balance mandates and motivators, efforts are underway to improve the workflows of Open Science and to reduce friction in its processes. For example, academic journals with Open Data policies often require authors to deposit their research data openly before submitting a manuscript. Authors may not be aware of this requirement until they are midway through their submission to a journal, and must then interrupt the submission while they deposit their research data appropriately elsewhere. A 2022 pilot on
An innovative project on the publishing platform
In this new publishing workflow, the Sanger Institute sequences each genome, and the sequence is deposited into the ENA (European Nucleotide Archive). Using information from the sequencing equipment and contextual information written by Sanger-affiliated researchers, an XML file for the Genome Note is then compiled. During the sequencing process, the Sanger Institute also collates information and metrics about the quality of each genome assembly. The metrics include Base pair QV, Scaffold N50/NG50, and BUSCO completeness, which are added to the Genome Note XML, and published alongside the article as an automated benchmarking report to support peer review. Some metrics are also represented as figures which are made available in the body of the article. A package containing the XML file (including the Genome Note article text, and the automated benchmarking report) plus the figures is sent directly to the
As the Sanger Institute intends to publish up to seventy thousand Genome Notes as part of the Tree of Life project, this workflow allows publication volumes that would not be possible in a traditional publishing workflow. Because the automated benchmarking reduces the manual effort that would be required to generate and review these sequences via the traditional publishing process, lower Article Processing Charges (APCs) can also be charged for each publication.
This automated publishing workflow is unique among publishers and could represent a new publishing approach in which information flows directly from lab equipment to the publishing platform or journal. The automated benchmarking report can also be used as a support mechanism to reduce the burden on peer reviewers, as peer reviewing can be time consuming and labour intensive [20]. As an additional benefit, the sequence data is shared both in an appropriate data repository and through the published Genome Note, which is indexed and citable, allowing the author to gain maximum credit for their outputs. There is further potential to apply elements of this workflow to other data types which are published at scale or require rapid dissemination, for example health data gathered during the Covid-19 pandemic.
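To make the packaging step of the workflow described above more concrete, the following is a minimal illustrative sketch in Python. It is not the Sanger Institute's or the publishing platform's implementation: the element names (`genome-note`, `benchmarking-report`, `metric`), the function names, and all sample values are hypothetical and chosen only for illustration. The sketch computes one of the assembly metrics named above, Scaffold N50, using its standard definition (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly), and writes it into a small XML document.

```python
import xml.etree.ElementTree as ET


def scaffold_n50(lengths):
    """Return the N50 of a list of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def build_genome_note_xml(species, accession, metrics):
    """Assemble a minimal Genome Note XML string.

    The schema here is hypothetical; a real pipeline would follow the
    schema agreed between the sequencing centre and the publisher.
    """
    root = ET.Element("genome-note")
    ET.SubElement(root, "species").text = species
    ET.SubElement(root, "accession").text = accession
    report = ET.SubElement(root, "benchmarking-report")
    for name, value in metrics.items():
        metric = ET.SubElement(report, "metric", {"name": name})
        metric.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up scaffold lengths, for illustration only.
    lengths = [80, 70, 50, 40, 30, 20]
    metrics = {"scaffold-n50": scaffold_n50(lengths)}
    # Placeholder species name and accession, not real records.
    print(build_genome_note_xml("Example species", "EXAMPLE-ACCESSION", metrics))
```

In a pipeline of this kind, the design choice that matters is that the metrics are generated and embedded by the same process that produces the article file, so the benchmarking report available to reviewers cannot drift from the data that was deposited.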
Conclusion
Open Science practices can pose a challenge to researchers, as they may involve new ways of working with unclear reward structures. As stakeholders in Open Science, it is beneficial for publishers to consider accompanying Open Science policy mandates with new approaches to facilitate easier publication of open research outputs. Technical solutions such as integrated data deposition or automated publishing pipelines can help to reduce friction in Open Science publishing, reducing the burden on the researcher. As the workflows of Open Science become easier and more standardised, they will balance with policy mandates, rewards, and incentives to create a way of doing research that is more accessible, more straightforward and more beneficial than traditional, closed methods.
Acknowledgements
The author would like to thank her co-panellists at NISO, Shelley Stall (AGU), Jennifer Gibson (Dryad) and Yongjuan Zhang (CAS); and her colleagues at F1000 and the Sanger Institute who have developed the automated publishing workflow for the Tree of Life gateway on
