Abstract
This paper summarizes the 17th Academic Publishing in Europe (APE) Conference: The Future of the Permanent Record, held online from 11 to 13 January 2022 and organized by the Berlin Institute of Scholarly Publishing (BISP), a not-for-profit organization dedicated to bringing publishers, researchers, funders and policymakers together. The conference consisted of keynote speeches, presentations, and panel discussions on a variety of scholarly communication topics. Main themes were the value of the Version of Record and how to preserve quality and trust in the scholarly record. Collaboration across all stakeholders and high-quality metadata are key, especially when looking at a future in which a “record of versions” could exist, connecting all research outputs. Other conference topics involved the threat of paper mills, research integrity and the importance of digital preservation. To ensure trustworthy and high-quality publications, knowledge exchange and cooperation are crucial. Effective partnerships and a multi-perspective approach are also key to fostering inclusion, diversity and equity in scholarly communication. Several speakers stressed the important role publishers play in moving towards an inclusive and trusted Open Science ecosystem, in which data sharing is accelerated and quality assurance and control are implemented across the entire research cycle. To achieve this, new ways of working and new business models need to be developed, and entrepreneurship and innovation encouraged. The 17th APE ended with a session in which five startups showcased their innovative products to advance science.
DAY ONE: Tuesday, 11 January 2022
Dr. Sutton added that the changing nature of research, with new methods, technologies, and Open Science (OS) practices, has consequences for the VoR. The linear publishing workflow is being broken up and undergoing a structural shift. According to Dr. Sutton, this raises several questions: Is the VoR a drag on the system? What does versioning mean for the permanent record? Do these early-sharing outputs need to meet the same standards? Can we bring all outputs together? Is that the new VoR? Who will be held accountable?
Dr. Sutton said it is important to recognize which actors are responsible for validating information. Trust can be eroded, and a growing number of stakeholders question publishers’ motives. As publishing moves from a single printed version to multiple digital versions of the VoR, a universal way to stamp items, so that trust can be placed and accountability and responsibility assigned, needs to be found. Dr. Sutton concluded that there is no simple way to do this and that input from different stakeholders and multiple perspectives is needed.
The two keynotes were introduced by
In the first keynote:
She continued by saying that, partly as a consequence of funder mandates, publishing is gradually shifting to an Open Access (OA) system, mostly financed by Article Processing Charges (APCs). There are other approaches, e.g. transformative agreements. Alternative approaches include funder-supported journals and publishing platforms (e.g. eLife), models supported by consortia of libraries and/or funders (e.g. PLOS Community Action Publishing) and Subscribe to Open initiatives.
Prof. Leptin concluded her speech with some thoughts on what (else) funders could do to prevent inequalities: ensure that research grants come with necessary OA funding for high-quality journals and books, develop mechanisms to support OA after the end of the grant, provide adequate training, focus assessment criteria on the intrinsic quality of the proposed project, and consider supporting societies/institutions that publish their own journals/books.
In the second keynote:
He added that metadata quality needs to be improved: “It’s the product’s front door”. High-quality metadata is key to discovery and online interoperability. There are changes in the ecosystem that support this transition, but funding for these infrastructures is a problem, and funding for their maintenance is an even bigger one.
Carpenter showed how formats have changed over the past 30 years: from print-focused in 1992 to primarily, almost exclusively, digital distribution in 2022, and from just two identifiers, ISBN and ISSN, to multiple identifiers and a focus on data management. Carpenter said that the “article of the future” is much more than a single object; it is a network of inter-related content forms. In order to support the new “Record of Science”, the ecosystem needs to evolve; who is responsible for the scholarly record of the future and the necessary metadata creation? Researchers? Librarians? Publishers? How do we build a notification system to connect the disparate pieces?
Carpenter said that technology changes far faster than the cultural changes that are required to drive widespread adoption of a new ecosystem. People will only change the systems they use if they are motivated to do so, either because it offers them a clear benefit or because they are forced to. When people get meaningful recognition for data sharing, they will share. He also stressed that although technology drives change, it does not determine the direction or eventual destination.
In his talk:
He introduced several Crossref services related to research integrity. One of them is Crossmark [2]: it shows the current status of content; whether it has been updated, corrected, or retracted. Another service is Event Data [3]: this functionality creates links between related outputs to provide context and commentary on research, e.g., citations on Wikipedia, mentions on social media, blog posts, etc. Rittman added that since its implementation, the number of events has grown, but there is work to do, e.g., in updating the infrastructure.
Currently, Crossref is prototyping a relationships API: you can submit a query for an item (e.g., a DOI) and get back its relationships to other items. Through the relationships API, it is possible to show the research nexus: all the outputs that fall within the scholarly sphere and the relationships that exist between them. Rittman concluded by saying that Crossref cannot solve all problems related to research integrity. Metadata plays a key role, and complete metadata from members [4] is needed, as well as feedback on roadmap items [5].
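To illustrate the kind of metadata these services expose, the sketch below parses a record shaped like the public Crossref REST API “works” message (GET https://api.crossref.org/works/{doi}), which can carry Crossmark-style update information and relationship assertions. The sample payload and DOIs are invented for illustration; only the general field layout follows the API schema.

```python
# Sample record (invented data; field names follow the shape of the
# Crossref REST API works message: "update-to" for Crossmark updates,
# "relation" for relationship assertions).
SAMPLE_MESSAGE = {
    "DOI": "10.5555/example.doi",
    "title": ["An Example Article"],
    # Crossmark-style update information: this record was corrected.
    "update-to": [{"DOI": "10.5555/example.doi", "type": "correction"}],
    # Relationship assertions linking this item to other outputs.
    "relation": {
        "has-preprint": [{"id": "10.5555/preprint.1", "id-type": "doi"}],
    },
}

def summarize_relationships(message):
    """Flatten a works message into (relation-type, target-id) pairs."""
    pairs = []
    for rel_type, targets in message.get("relation", {}).items():
        for target in targets:
            pairs.append((rel_type, target["id"]))
    return pairs

def is_updated(message):
    """True if the record carries Crossmark-style update metadata."""
    return bool(message.get("update-to"))

print(summarize_relationships(SAMPLE_MESSAGE))  # [('has-preprint', '10.5555/preprint.1')]
print(is_updated(SAMPLE_MESSAGE))               # True
```

Walking such relationship assertions across many records is, in essence, what building the “research nexus” amounts to.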
The session:
Prof. Behl said that papers produced by paper mills contain no real data, only fake and fabricated data, and need to be recognized upon submission or retracted if they have already been published. The unusual increase in submissions due to paper mill attacks has created awareness in the community, and initiatives like Retraction Watch [6] have been established.
Prof. Behl added that, although costly, quality control measures need to be consistently enforced. In November 2021, the Journal of Cellular Biochemistry published a supplemental issue consisting of retractions only and has implemented several additional measures, such as a new editorial board, new author guidelines, and new workflows, to maintain scientific quality.
Prof. Behl showed some examples of how the new workflow enables automatic checks on plagiarism and image integrity. He ended his presentation by saying that scientists have to exemplify scientific integrity and teach conscientious work with data. He added that cooperation and knowledge exchange on new forms of scientific fraud are key to restoring, protecting and preserving trust in science.
In his speech:
Prof. Sabel said Restorative Neurology and Neuroscience (RNN) carried out an analysis to find out if RNN was a victim of paper mills. They identified 23 (15.3%) “probable” Fake Papers (FPs). Next, they screened a random mix of 3500 publications from 35 basic and clinical neuroscience journals and identified 376 “suspects” (10.7%). Extrapolating this number to PubMed publications, they estimated 450,000 medical FPs and 1.4 million FPs across all 14 million science/technology publications.
Prof. Sabel said that all stakeholders should be concerned given the scale of this problem. Everyone must act, even if these actions lead to (temporary) revenue loss. Employers should not put pressure on scientists. Publishers should support journal editors in filtering out FPs, provide reviewers with incentives, and help develop tools to eradicate FPs from the permanent scientific record. Prof. Sabel stressed that publishers are responsible for helping to clean up the mess. FPs impact science and society, create risks, and damage public trust in science. He concluded: it is time to address the problem.
JASPER (Journals Preserved) [8] is a project to preserve Open Access journals. It was launched in 2020 and spans all disciplines and all parts of the world. Dr. Wise added that disappearing journals are not exclusively an OA issue.
She said that with book preservation, the challenge is even greater. There are no standards for tracking: e.g., how many ISBNs are there? How many of those are for eBooks? How many of those are preserved? These are questions that need to be answered. Dr. Wise announced that Springer Nature has expanded its partnership with CLOCKSS: all SN titles published since 1815 have been added to the archive.
Dr. Wise concluded her presentation with the challenges around the core infrastructure that need to be solved: discovery tools, unique identifiers and knowledge graphs are needed to ensure the long-term availability of the record of scholarship. She added that publishers have a crucial role to play.
In his presentation
Kersjes said that SN has had numerous paper mill cases over the years, across various fields. He shared some examples of manipulation from different fields: image, authorship, and peer review manipulation. In the mathematical field, this has led to nearly 40 retractions. More recently, in the geosciences, 44 articles filled with gibberish have been retracted.
Kersjes said that paper mills are getting more sophisticated, so the goal is to spot trends and patterns. New types of paper mills get flagged, and additional automated checks, e.g., plagiarism checks, have been added to quality control. Projects around image integrity have also been initiated.
Kersjes added that it is difficult to predict where it will end. He concluded that SN is fully committed to protect the literature but added that more automation may not always be the answer; investing in specialists and streamlining workflows are important too. This is labor-intensive but will lead to more effective improvements to correct the literature on a large scale.
STM wants to take collaboration to the next level with the Research Integrity Collaboration Hub [9]. Trusted algorithms can access content from participating publishers to scan for research integrity issues during the editorial workflow. Two things are critical: (1) publishers maintain control and (2) the infrastructure supports a broad variety of use cases, e.g., image alteration, paper mills, plagiarism, simultaneous submissions, etc. Dr. Koers added that developing this system collectively has many benefits: more agile, lower barriers for smaller and mid-sized publishers, benefits from network effects, and cost reduction.
According to Dr. Koers, STM Solutions is ideally positioned to develop this system because it is governed and trusted by members, dovetails with work that is already being done, and can help foster collaboration with other stakeholders in the ecosystem. In developing the system, they follow an agile, use-case-driven approach. Seven publishers are currently participating in the first use case: “simultaneous submissions”. Dr. Koers invited other publishers to join.
Dr. Crotty said the journey to date has been marked by waves of consolidation and during this journey it has become evident that bigger publishers are better able to adapt to changes in the ecosystem. The growth of big publishers is driven by several factors: uncertainty e.g., due to the pandemic, transformative agreements, technology/reporting burden (complex landscape, expensive without inhouse expertise e.g., on metadata), but scale is the most essential component.
He added that there is a current shift of publishers becoming workflow providers: the big publishers are building large portfolios of services encompassing all aspects of the research workflow, whereby most investment seems to go to new development. Dr. Crotty quoted Kurt Vonnegut: “Another flaw in the human character is that everybody wants to build, and nobody wants to do maintenance.” According to Dr. Crotty, that is a big problem in scholarly communication too.
He said that the landscape is still dominated by subscription packages and even though change is happening, it is happening at a different pace, geographically and across disciplines. With the structure of the APC Gold OA model and Transformative Agreements, costs of publication are no longer spread among a large number of readers. Dr. Crotty said that institutes with low research output can expect to see a cost-saving, but that research-intensive organizations will see an increase.
Dr. Crotty saw an increased emphasis on publishing in quantity. For OA in the long term, this would mean that low-volume flagship journals cost too much. There is a changing attitude from publishers toward these flagship titles, with an emphasis on lowering acceptance standards and publishing more articles over time. According to Dr. Crotty, this means that in the long-term future there is a drive toward low-cost, high-volume bulk publishing. But there is still demand from authors for highly prestigious journals, and Dr. Crotty expected a few of these to stay in-house, while other, less prestigious journals will be run by societies.
Dr. Crotty said that this is the path we are currently on: publishing businesses have adapted to OA accordingly. But he asked: is this acceptable for the long term? We should not assume this is the path we should stay on, or we will get stuck in a system where quantity is the key to success. Are there other options? Dr. Crotty concluded: at the end of the day, we know where we want to go; what is the best way to get there?
DAY TWO: Wednesday, 12 January 2022
Prof. Hinchliffe said that all core functions of scholarly publishing (registration, certification, dissemination, preservation) are historically tied to the VoR. She continued with the formal definition of the VoR: A fixed version of a journal article that has been made available by any organization that acts as a publisher by formally and exclusively declaring the article “published”. Hinchliffe stressed that it is key that this declaration is made by the publisher, not the author, funder or institution.
Prof. Hinchliffe explained some potential confusions around the VoR: it is not necessarily the printed version, the most recent version, or the gold OA article. Publishers also post preprint PDFs, and the VoR is not always peer reviewed. Different article versions exist, leading to a “record of versions”, and all of these versions are eligible for a DOI.
Hinchliffe said that, for the majority of researchers, the article VoR is considered the most authoritative and credible source. They prefer not to use preprints for teaching purposes for example. The VoR is also a funder and rewards structure priority. The VoR is currently often the locus of compliance, e.g., with regard to OA requirements. Prof. Hinchliffe said because the VoR is central to OA publishing models, central to upstream and downstream innovation, it is so important to discuss why the VoR is worth it for the research community.
The subsequent panel discussion started with the question of whether moving away from the VoR to various versions would create a sustainable business model for publishers. Prof. Hinchliffe saw the VoR as the one primary version that generates revenue. Dr. Pulverer stated that the VoR could still be the financial focal point, and publishers could start charging for services, e.g., PR and OS activities. Dr. O’Connor felt that OA payments are not for the product or VoR: they are for the publishing services. She added that PLOS takes a different approach with its Community Action Publishing model and that it is important that publishers think about business models differently.
The discussion continued around the role the VoR still has for career assessments. Both Dr. Pulverer and Prof. Dirnagl agreed that the academic community is conservative and stuck in an ecosystem where too much value is still placed on the VoR. Ways to valorize other outputs should be found and funders and institutes should play a more proactive role in this.
Finally, more issues around trust were discussed. Prof. Behl said that Peer Review is key for quality control and trust. Prof. Ulrich disagreed: he has given up trusting articles simply because they have been peer reviewed. He said that research is too complex to rely on just two reviewers. There is no label for trust; scientists have to make their own judgements. Prof. Sabel said that he does see improvement in the PR process and that it should be more embedded in the academic system, e.g., by giving reviewers more credit. Prof. Hinchliffe was of the opinion that quality control should take place across the whole ecosystem. The industry should think realistically about resources, not just about money but also about researchers’ time. Dr. O’Connor agreed and said that trust has nothing to do with the format. It is about transparency and being open about the processes. The output of the PR process does not give an absolute truth.
The session
Dr. Nugent said that the joint commitment discusses the areas in which publishers can collaborate to move forward in understanding and reflecting the diversity of the research community. In order to ensure action is taken, working groups have been formed and workshops have been organized, e.g., on author name changes and tackling harmful (historical) content. These workshops resulted in minimum standards, among them: work to define and communicate specific responsibilities and to publicly report on progress on inclusion and diversity in scholarly publications at least once a year. Dr. Nugent said that so far, best practices have been shared, hundreds of researchers have been helped to make name changes, and many publishers have paved the way for diversity data collection.
Roberts said that companies should look at how progress is measured in other areas and apply the same rigor to improving diversity. What gets measured gets done; it is the same as any other business problem. Because of a disconnect between what employees want and what actions are taken, companies should carefully consider their strategy, e.g., who should collect the data? What kind of data should be collected? There should be a mix of quantitative and qualitative data, and transparency is important. Roberts concluded: with data you can analyze, make predictions, set targets, and benchmark across the industry. Only this way can measurable improvements regarding diversity be made in publishing.
Schemm said that partnerships such as the Research4Life initiative have been fantastic for supporting inclusion. But with the growth of OA, the scholarly landscape is changing and growing more complex. The focus on equitable participation in the research ecosystem should be increased and a more inclusive research ecosystem built with the Global South (GS). Publishers should acknowledge that researchers in developing countries are moving from being consumers of knowledge to producers of research. Schemm added that to foster this development, publishers must examine their own role and take measures to develop equitable OA policies, create editorial boards with geographic diversity, recruit reviewers from low- and middle-income countries and publish robust research from the South as well as the North. Schemm said that to do this, publishers need to work together and create best practices, such as the APC waiver guidelines. Schemm concluded: nobody can do it alone.
The panel discussion:
Levine-Clark said the current ecosystem is about rewarding published papers; the underlying data should be rewarded in the same way. Researchers’ willingness to share data is influenced by the college department, but also by publishers, funders and other factors such as discipline, services, tools and repositories being used. Levine-Clark added that it is also important to look at the differences and similarities between universities when looking at data management. To effectively manage data, differences in research activity, full-time faculty, number of librarians and professional staff, research expenditures, number of articles published per year, and disciplines should be taken into account.
Anderson explained what can be done to accelerate data sharing. Challenges that need to be addressed are about leadership, recognition, policy, data types, preservation, infrastructure, disciplines, sponsors, concerns about data security and scientific openness. He said that, when looking at its historic role, the library would seem like the obvious place for data management. However, historically, content has been published in limited formats. Data may seem similar, but functions like compliance monitor, content analysis, and misuse monitoring, require a redirection of resources within the library.
Russell added that Research Data Management is supported by an increasing number of tools (locally developed and external) and initiatives, e.g., DataCite, Crossref, CHORUS. She explained how CHORUS [11] brings together funders, publishers and institutions to share knowledge, develop solutions, advance innovation and support collective efforts. CHORUS delivers dashboards, reports and APIs, monitors for OA on publisher sites, helps stakeholders to improve metadata, participates in OS pilots, connects datasets to content, and is interoperable with other solutions such as publishing platforms.
The three panelists showed how institutions can make use of CHORUS to manage data more effectively and showed the dashboards for their universities. Levine-Clark showed how the University of Denver could do better in identifying datasets. Russell said that for the University of Florida dashboard, although the scale is different, the pattern is similar.
The subsequent panel discussion started with collaboration and partnerships on data initiatives. Russell said that at the University of Florida, colleges and libraries have specific roles, e.g., on maintaining information on funders, implementing DM plans, curation assistance to researchers, enhancing metadata and linking datasets. She added that, because of the scale, automated solutions are needed, e.g., to generate notifications when a deadline to make a data deposit is approaching.
Levine-Clark said they use the CHORUS dashboard to understand author behavior and to see what can be done to automate the processes, as they are very labor intensive. Russell added that one of the challenges is that authors are not using their university email addresses and therefore are not matched with their departments. The use of ORCID iDs should enable automated identification.
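As a rough sketch of why ORCID iDs help here, the example below matches an author to an institution from a record shaped like the ORCID public API v3.0 employments endpoint (GET https://pub.orcid.org/v3.0/{orcid-id}/employments), independent of whichever email address the author used at submission. The payload is invented for illustration; only the general field layout follows the ORCID schema.

```python
# Sample employments record (invented data; nesting mirrors the shape of
# the ORCID public API v3.0 employments response).
SAMPLE_EMPLOYMENTS = {
    "affiliation-group": [
        {"summaries": [
            {"employment-summary": {
                "organization": {"name": "University of Florida"}}}
        ]}
    ]
}

def organizations(payload):
    """Extract all organization names from an employments payload."""
    names = []
    for group in payload.get("affiliation-group", []):
        for summary in group.get("summaries", []):
            org = summary.get("employment-summary", {}).get("organization", {})
            if org.get("name"):
                names.append(org["name"])
    return names

def matches_institution(payload, institution):
    """Match an author to an institution regardless of the email they used."""
    return institution in organizations(payload)

print(matches_institution(SAMPLE_EMPLOYMENTS, "University of Florida"))  # True
```

In practice, institution names would still need normalization (e.g., via an organization identifier such as ROR) before exact matching like this is reliable.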
A Menti poll showed that four publishers in the audience are tracking authors’ compliance with OA requirements, while five responded that they are not. Anderson said to look at publishers’ incentives to track authors’ compliance. Levine-Clark added that there is a need to build an infrastructure connecting funders, articles and data; if there were an incentive for publishers to track compliance, this might increase. Campbell added that in SN’s submission system it is possible to enter the FunderID, but most authors do not use this option.
Campbell asked how the different libraries are addressing the challenge of understanding multiple funders’ requirements. Russell said that at the University of Florida, they document the requirements of various funders, assist authors with compliance, use tools to identify publications, and encourage the use of ORCID iDs. Anderson said that at Brigham Young University, the library has not been involved in the relationship between funding and research. Levine-Clark stated that at the University of Denver, more needs to be done, in general, to help understand publication and data linking. Standards and infrastructure to connect different data types should be implemented to make processes easier. Campbell concluded: for Data Management to become part of the culture and the standard, we all need to work together.
Dr. O’Connor said that instead of talking about an article, we should talk about a linked digital knowledge stack, and to enable this multiplicity of outputs, metadata quality should be improved. The decision on what is valuable shifts more from the creator towards the user. Selectivity needs to be redefined, and the focus should be on the research question.
What does that mean for the VoR? Dr. O’Connor thought that articles will still have value in the future, but they should not be valued above other outputs. She pointed out that many current publishing systems and business models are built on the VoR. This hampers research advancement. In collaboration with libraries, consortia and funders, publishers should move beyond the article and APCs and develop new business models. Selectivity does not have to mean that costs are enormous. Dr. O’Connor said that in order to move to a system that is truly open, the change has to be radical.
DAY THREE: Thursday, 13 January 2022
The panel discussion
Worlock kicked off the session by asking the panelists about their drive to innovate and invent. Madisch said that he and ResearchGate’s cofounding team shared the same drive to reinvent, not disrupt, the status quo: to invent ways, in addition to the old ways, to move science forward by using technology. Benchekroun added that they saw the problem of online conferences, and with Morressier they had the drive to change and improve them.
When asked about how to avoid being sidetracked from achieving the main goal, Madisch’s advice was to do what you want to do, and not to think too much of what you need to do. He added that even though at ResearchGate the scientist is in the center, they are also thinking about the future, e.g., how can RG help scientists to become more productive? Madisch said that new challenges and problems do come up while working out the main problem. Benchekroun added that it is almost inevitable to solve different problems, when tackling the main problem in this area. To bring offline conferences online, for example, so many steps are involved; you need a holistic approach.
The discussion continued on the risk of founding a start-up. Madisch said that he does not think in risks but in opportunities; that is how you become an entrepreneur. He added that he underestimated how hard monetization is, and the transition from an entrepreneur-led organization to a more professionally organized enterprise. You must think about how you want to monetize without sacrificing your product. Change culture, change leadership, and ask yourself if you are still the right CEO. Benchekroun added that a lot of startups forget to keep innovating themselves, even as the market develops further. An innovation mindset is important, as well as keeping the conversation going with scientists, society, and publishers.
Worlock asked about the route for growth. Is it organic, or through collaboration, or acquisition? Benchekroun replied that within the scholarly communication community, there are a lot of opportunities to grow. There is a clear need in scholarly communication to collaborate but also to grow from within. He said that not enough is being done. If there were more start-ups within the sector, the speed of innovation could be brought up to the speed of change. Worlock said there might not be more start-ups because growth rates are not as fast as in other markets. Madisch said that to be ready for growth, senior leaders and non-scientists should be brought in for fresh ideas. He said the problem with scholarly communication is twofold. First, growth is not big and interesting enough for investors. Second: pseudo-innovation. Transforming offline to online is not real innovation. New ways of working need to be created, and it is crucial to get closer to the creation of research.
The session
With Cassyni, it is possible to engage directly and globally with the community and increase awareness and impact of a journal’s research. For existing seminars, it is possible to save time, reduce complexity and improve the experience. Preston added that there are lots of additional features, such as DOIs for the seminars, and different use cases: journal seminars, university department and research group seminars. Cassyni’s vision is to build a rich, diverse and inclusive ecosystem. Since its public launch in September 2021, Cassyni has been picked up by the community, showing that the product fits well into the research ecosystem.
Clients use the Nested Knowledge platform to organize, visualize and analyze research while compressing timelines and improving outputs. Pharmaceutical and device companies utilize the NK reviews for FDA interactions, CER literature reviews, and other purposes.
Kallmes concluded his talk by stating there is a need for a mindset change, towards adopting the living knowledge paradigm. With the NK software, living reviews are possible, people will have access to the most up-to-date knowledge, and it will enable publishers to move forward to a more dynamic and systematic paradigm.
CiteAb is expanding its data collection, with a focus on quality. They have moved from being a search engine to a data provider. Data around the products has value for reagent suppliers: it informs their strategy and helps drive sales.
Chalmers said they are just scratching the surface of what can be achieved. There is more data to collect, and researchers and suppliers that can be helped. CiteAb is also exploring partnerships with publishers. Potential benefits are to access novel content and revenue, collect data from publications and then return with added value.
Pilloxa offers different solutions; one of them is the patient-facing mobile app program. Clients, like university hospitals, can tailor the app to support their patients during treatment. This can, for example, include sending information, such as journal articles, at the right time, or providing patients with information on medication and motivation tools, supported by add-on smart hardware. A university hospital can, for example, learn how to better equip and motivate patients to stick to their treatment plan.
Mazzotta said that Pilloxa aims to make complex issues simple and fast, whereby the patient’s needs are put first. By bringing information together, this will accelerate learning and knowledge creation.
Aipatents enables a content-based search: matching is done on all content, not on keywords. It uses a unique Natural Language Processing approach for analyzing long text documents. It gives comprehensive ranked results, sorted by relevance, and connects different types of documents: patents, standards, literature, product information, etc.
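As a generic illustration of what content-based matching means, the sketch below scores whole documents against a query by cosine similarity of word counts, rather than requiring exact keyword hits. This is a deliberately simple bag-of-words stand-in, not Aipatents’ proprietary NLP approach; the example documents are invented.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank(query, documents):
    """Return documents sorted by relevance to the query, best first."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(d)), d) for d in documents]
    return [d for score, d in sorted(scored, reverse=True)]

docs = [
    "battery cell chemistry for electric vehicles",
    "patent claims covering lithium battery electrodes",
    "survey of online conference platforms",
]
print(rank("lithium battery patent", docs)[0])
# -> patent claims covering lithium battery electrodes
```

Production systems replace the word-count vectors with learned document embeddings, which is what allows long texts such as patents and articles to be matched on meaning rather than vocabulary overlap.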
AiP has worked with leading integrators in the patent space and enables diverse applications. A PubMed search query showed that AiP is able to find highly relevant articles. It can not only identify relevant articles but also find articles that contradict them with competing claims. Belinson concluded: big data is here to stay, and being able to connect different types of data is the future.
All panelists replied positively to the question of whether they would be willing to collaborate with publishers. Belinson said it is about adding more value and insights for researchers and finding the right home for the technology. Mazzotta agreed: by working with publishers, we can help spread and build knowledge, improve health literacy, and make sure information is accessible at the right time. Preston said that Cassyni is already working with publishers in organizing seminars; it is an opportunity to build relationships with the community. Kallmes said that journals could adopt the living review paradigm and move forward with enabling living documents. Chalmers said that collaboration would be in both the publisher’s and the scientist’s interest; data can be used to help scientists get their papers read.
Finally, the audience was asked to vote on the different dotcoms by answering three questions: which of these dotcoms would you like to collaborate with, in which dotcom would you invest your money, and which will be the most successful dotcom in three years? Smit declared Nested-Knowledge.com the winner.
Auf Wiedersehen! Goodbye!
Eric Merkel-Sobotta closed APE 2022. He thanked APE’s founder Arnoud de Kemp, the program committee, and all speakers and sponsors, and extended a special thanks to Marta Dossi (Director of Operations, BISP), without whom the APE Conference would not have taken place.
Please note: APE 2023 will take place 10–11 January, 2023.
