Abstract
This paper explores the dynamics of algorithmic governance, decision support systems and human involvement in the context of plagiarism screening in academic publishing. While automated plagiarism screening is widespread in editorial work, critical investigations of these decision support systems remain scarce. Focusing on the issue of human autonomy and discretion in algorithmic governance, the paper investigates the complexities of the human-in-the-loop within these screening tools. Revisiting Wanda Orlikowski's conceptual metaphor of ‘scaffolding’, the study empirically analyses interactions between editors and plagiarism screening software. It traces how these tools act as scaffolds, defining plagiarism as a manageable problem while allowing editors considerable flexibility in decision-making. The software, which is both non-deterministic and powerful, transforms issues into potential decisions, shaping the decision space for human editors. Based on this investigation of screening software as scaffolding, the paper argues that the question of human involvement in systems for automated decision-making is somewhat beside the point, and that analytical attention should shift towards understanding how algorithmic systems configure decision spaces by establishing issues as decidable problems. The implications of this shift are discussed, emphasizing the need to advance our understanding of the power dynamics inherent in algorithm–human interactions within automated decision-making systems.
Copywritten – So don’t copy me
Y’all do it
Sloppily
And y’all can’t come
Close to me
(Melissa ‘Missy’ Elliott & Timothy Z. Mosley)
Introduction
With the current ‘Great Plagiarism Debate’ (e.g. Schorn, 2024), plagiarism and its detection have (once again) become focal topics in academic and political discussions. In this debate, some vocally advocate for automated and Artificial Intelligence-based detection methods as the tools that will finally allow us to identify all instances of plagiarism, beyond a doubt, everywhere (e.g. Ackman, 2024). In academic publishing, digital tools to detect text overlap – often called plagiarism scanners – are already a standard instrument. Used by a majority of publishers and journals (e.g. Michaud, 2023), they represent means of ‘algorithmic governance of academic writing at an unprecedented scale’ (Introna, 2016: 15) that screen a sizable portion of the published academic literature each year. Contrary to the current AI enthusiasm, however, such automated screening tools have often been criticized by academics (e.g. Weber-Wulff, 2019), who worry about how they might diminish human autonomy and control when ‘[w]e’re letting bots make autonomous decisions to reject scientific papers’ (Bonnefon, 2019) and how this might impact the quality and fairness of decisions.
Human involvement in automated decision-making, often discussed as human-in-the-loop, is also at the centre of wider academic debates on algorithmic governance (e.g. Cabitza et al., 2023; Jones, 2017; Katzenbach and Ulbricht, 2019; Wagner, 2019). Theoretical debates often emphasize the need for human involvement to protect the fairness, accountability and legitimacy of decision-making (Hildebrandt, 2018), while empirical accounts of human–machine relations in decision support systems in domains such as the criminal justice system (Morin-Martel, 2023), the welfare state (Saxena et al., 2022) or clinical settings (Kempt et al., 2023) reveal the complexity and messiness of actual human involvement. This involvement is often fraught with uncertainty (Kempt et al., 2023) and can result in highly variable levels of meaningful human discretion (see also Wagner, 2019). The precise role and effects of humans in the loop in algorithmic governance are thus far from clear. To fully grasp the realities of human involvement in algorithmic systems, it is crucial to develop a framework that can capture both the limiting effects of algorithms on human discretion and the sustained latitude for human autonomy within algorithmic systems.
This contribution sets out to develop such a conceptual framework by revisiting and extending Wanda Orlikowski's (2006) conceptual metaphor of ‘scaffolding’. Using the deployment of plagiarism screening software in editorial work as an empirical case, it will argue that the role of decision support tools is best described as the scaffolding of a decision space. By tracing the interactions between editors and the screening software, the analysis will show how, in scaffolding decision spaces for editorial decisions, the software establishes and enforces plagiarism as a decidable problem with specific, fixed parameters, while at the same time leaving editors with considerable leeway for individual decision-making. As such, the software is at once non-deterministic, enabling human discretion rather than restricting it, yet still highly powerful in setting issues that editors must then decide upon. Its power effects lie less in forcing humans to make specific decisions than in turning issues into potential decisions and humans into decision-makers. This, it will be argued, raises the question of how the particular decision space for humans is configured by software. Finally, the contribution will discuss the implications of such a shift from concerns with human involvement and discretion to a conceptualization of algorithms as scaffolding decision spaces for critical analyses of algorithmic governance.
Theory: Algorithmic governance, human in the loop and scaffolding
In recent years, algorithmic governance has become the object of a large and vibrant literature (e.g. Gritsenko and Wood, 2020; Katzenbach and Ulbricht, 2019; Musiani, 2013; Peeters and Schuilenburg, 2023; Waldman and Martin, 2022). It can refer both to the governance of algorithms by regulatory frameworks, legislation and other comparable institutions, and to governance by algorithms (Musiani, 2013; Ulbricht and Yeung, 2022), that is, to the large-scale power effects algorithms can have on individuals as well as social collectives via surveillance and social ordering (Issar and Aneesh, 2022; Katzenbach and Ulbricht, 2019). This paper focuses on the latter, investigating the governance practices that unfold with and through algorithmically powered software tools.
In both academic and political debates about algorithmic governance, its effect on human autonomy, agency and discretion has become one of the central concerns (Katzenbach and Ulbricht, 2019), with a specific focus on the (potential) restriction of human agency and discretion in algorithmic systems (e.g. Gritsenko and Wood, 2020; Katzenbach and Ulbricht, 2019; del Valle and Lara, 2023; Wagner, 2019). In this context, critical discussions emphasize the adverse effects that the absence of such a human-in-the-loop can have in terms of fairness, accountability and legitimacy (Binns, 2022; Hildebrandt, 2018; Waldman and Martin, 2022; see also Jones, 2017). Furthermore, concerns have also been raised about systems that do feature human involvement, but in which algorithms dominate over human discretion and expertise (Cabitza et al., 2023), effectively restricting human discretion to rubber-stamping automated decisions (Leese, 2019; Wagner, 2019).
However, as many scholars emphasize, algorithmic governance is not solely and autonomously enacted by algorithms, but rather achieved by complex socio-material assemblages involving algorithms and other technological infrastructures, legal frameworks, political and cultural contexts and, importantly, human decision-makers (e.g. Introna, 2016; Shestakofsky and Kelkar, 2020; Ziewitz, 2016). Existing empirical research calls attention to the complex and intricate socio-technical relations that are involved in algorithmic systems such as content moderation on social media platforms (Gillespie, 2022; Lai et al., 2022; Rieder and Skop, 2021) or search engine evaluation (Meisner et al., 2022). Moreover, even when subject to seemingly controlling systems for algorithmic governance, humans can still engage in a range of different strategies such as buffering (Christin, 2017), meaningful disagreement (Kempt et al., 2023) or appropriation (Bruni et al., 2022) that protect their agency and discretion at least to some degree.
To understand the effects on human agency and discretion, as well as the larger power effects of algorithmic governance, it is thus paramount to develop a theoretical framework of socio-technical relationships that is faithful both to the limiting and coercive effects that algorithms can have on human decision-making, and to the realities of sustained human agency and discretion in automated decision-making. The present contribution suggests revisiting Orlikowski's (2006) conceptual metaphor of scaffolding. Drawing on Clark's (1998) idea of scaffolding as an augmentation that enables humans to achieve things they would otherwise not be able to, Orlikowski (2006: 462) understands scaffolds to ‘include physical objects, linguistic systems, technological artefacts, spatial contexts and institutional rules – all of which structure human activity by supporting and guiding it, while at the same time configuring and disciplining it.’ Scaffolds thus enable and configure human agency, while also being enacted by it: ‘these scaffolds don’t exist outside of knowledgeable human practice; they are “performed” by human agency’ (Orlikowski, 2006: 462). Importantly, even if understood as an augmentation, scaffolding nonetheless affects and shapes human action, including by imposing boundaries on human action and discretion (Galliers, 2006).
Focusing on this metaphor, we can understand the role of technology in (semi-)automated decision-making as the scaffolding of a decision space. There are two aspects of such a revisited metaphor of scaffolding that make it particularly productive for analysing socio-technical interactions in decision support systems. First, scaffolds are built with very particular uses in mind and have rather specific boundaries (Galliers, 2006). Algorithmic tools in decision support systems tend to be highly specific: they are fitted quite tightly into pre-structured decision procedures and are tailored to achieve quite narrow aims. This distinguishes them from other software systems that are more open to situated interpretation, whose functions are active accomplishments of groups of users (Vertesi, 2019): while software such as Microsoft PowerPoint can be used to plan a Mars mission at the National Aeronautics and Space Administration (NASA) (Vertesi, 2019: 373 ff.), as well as to plan a picnic or to show a heartwarming slideshow at an anniversary, algorithmic tools in decision support systems have far less fluidity (de Laet and Mol, 2000): they typically do not adapt well to different contexts and do not enable a large variety of uses. By scaffolding a clearly delimited space for decision-making, algorithmic tools condition and shape what human actions and decisions are possible by defining the types of questions and problems about which decisions can be made, and by enabling some ways of making decisions while not enabling others. At the same time, within this delimited space for decision-making, the technology is still generative and non-deterministic – scaffolding does not determine exactly what a building will look like. Scaffolding allows humans to move around its perimeter freely, to perform a plethora of different, and unforeseen, activities, to bring in additional tools, and to construct almost endless variations of buildings in the process. In scaffolding a decision space, algorithmic tools provide a basis for human discretion.
Second, the whole point of scaffolding is to enable building something durable that will stand on its own, making the scaffold obsolete in the process. As such, scaffolds are technologies with a distinct temporality, whose effects prevail long after their use has ended. Capturing this long-term after-use perspective is what differentiates the metaphor of scaffolding from concepts such as affordances (David and Pinch, 2006; Gibson, 1977) that focus on the effects of technologies while they are being used: while the affordance of a chair allows humans to sit or to rest their feet on it, it does not enable them to assume these positions permanently, after it has been removed, and neither does it aim to do that. Algorithmic tools in decision support systems function as scaffolds by enabling decisions that are stable and durable even after the use of the tool and the decision process have concluded. The effects of decision support tools thus extend beyond their period of use. Using the metaphor of scaffolding then draws our attention to this distinct temporality and to the ways the effects of algorithmic tools are laid out during use but may only become fully manifest afterwards.
Given this focus on the scaffolding of decision spaces in investigating plagiarism screening in editorial offices of academic journals, the goal of the following analysis is twofold: first, it will use the concept of scaffolding (Orlikowski, 2006) to analyse human–machine relations in decision support systems. It will trace how this scaffolding plays out in the interactions between the editors and the screening software along a temporally structured workflow. Second, the analysis will then refine this conceptualization of scaffolding in decision support systems by investigating the larger power effects of scaffolding decision spaces. It will show how the power effect of screening software lies in establishing plagiarism as an object of editorial agency and decision-making, thus generating editorial discretion while at the same time intervening powerfully in editorial work and academic publishing.
Data and methods
The case: Similarity Check
The software Similarity Check, developed in a partnership between the software company Turnitin and the publisher initiative Crossref, is available to all Crossref members for a fee. The tool is very widely used (see also Hesselmann, 2023): Crossref reports that 2,419,612 documents were screened via Similarity Check between January and July of 2021. Although Similarity Check is widely available, its implementation can vary between publishers and even between journals from the same publisher. Similarity Check screens a submitted document against a reference corpus of texts that comprises (a) a full-text corpus of all Crossref-indexed literature, (b) Crossref posted content, which refers to documents such as preprints, reports or working papers that are registered with Crossref but not formally published, (c) a corpus of texts, e.g. web pages, crawled from the internet and (d) a corpus of full-text publications from sources other than Crossref.
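To make the screening principle concrete, the following is a minimal, purely illustrative sketch of comparing a submission against a reference corpus and reporting the share of overlapping text as a percentage score. The actual Turnitin matching algorithm is proprietary and not disclosed; the n-gram (shingle) approach, the function names and the shingle size of eight words used here are assumptions for illustration only, not a description of the vendor's implementation.

```python
# Illustrative sketch only: approximates the general idea of screening a
# submission against a reference corpus and reporting percentage overlap.
# It is NOT the proprietary Similarity Check / iThenticate algorithm.

def ngrams(text, n=8):
    """Return the set of word n-grams (shingles) contained in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity_score(submission, reference_corpus, n=8):
    """Percentage of the submission's n-grams that also occur in any corpus text."""
    sub_shingles = ngrams(submission, n)
    if not sub_shingles:
        return 0.0
    corpus_shingles = set()
    for source_text in reference_corpus:           # e.g. indexed full texts,
        corpus_shingles |= ngrams(source_text, n)  # posted content, crawled pages
    overlapping = sub_shingles & corpus_shingles
    return 100 * len(overlapping) / len(sub_shingles)
```

The point of the sketch is merely that the output of such a procedure is a single aggregate percentage, which is what editors encounter as the Similarity Score in the first stage described below.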
Data collection for this study was conducted between August 2021 and July 2022, prior to the release of ChatGPT (based on GPT-3.5) on November 30, 2022. The availability of generative AI undoubtedly has had, and will continue to have, a radical impact on academic publishing in general, and on the issue of plagiarism in particular. With tools like ChatGPT, it seems unlikely that verbatim plagiarism will continue to be an issue, since it is now easily possible to have an AI rewrite texts or even generate entire publications, without the detection risk that comes with copying other publications word for word. Studying plagiarism detection now may therefore seem a bit anachronistic. However, the aforementioned public debates highlight that plagiarism, at least for now, continues to raise concerns. More importantly, this case study of plagiarism lays crucial groundwork for understanding automated screening tools in the handling of research integrity. In particular, Similarity Check can be seen as a blueprint for the future detection of auto-generated content in editorial offices: in November 2023, Turnitin launched iThenticate 2.0, which now includes a feature for the detection of AI-generated writing. As such, there is a direct connection from plagiarism screening pre-ChatGPT to screening for auto-generated texts, making this case study a valuable building block for understanding socio-technical interactions in decision support systems for research integrity. In the version analysed here, Similarity Check does not contain AI writing detection and can in itself be considered a rule-based expert system, not an AI system, although more AI-based features such as paraphrase detection are currently being developed.
Data and methods
Researching software tools for plagiarism screening in editorial peer review presents a dual challenge: first, software as a phenomenon is not only complex but also highly opaque (Christin, 2020; Seaver, 2017). In the present case, SimilarityCheck is a proprietary, commercial product for which data such as code or insights into the development process are not publicly available. Second, data on the inner workings of peer review are also notoriously difficult to obtain (Hirschauer, 2010) due to confidentiality concerns, especially since this study addresses integrity issues that are typically seen as particularly sensitive. Data access was therefore quite challenging. This investigation uses a mixed-methods approach, combining different types of data.
The study primarily draws on nine virtual and one on-site demonstration session in which editorial staff walked the researcher through their usual workflow with the software tool (Herrmann et al., 2007; Polson et al., 1992), explaining their decision-making process as they went along. Participants were selected to reflect a broad spectrum of disciplines, editorial positions and publishers to allow for a wide exploration of the field. While this sample limits generalizability, it still provides valuable exploratory insights into an important but understudied area of academic publishing. Overall, the sample consisted of three members of an in-house publishing integrity team (all Life Sciences), three academic editors (one Engineering, two Social Sciences and Humanities), three professional editors (one Natural Sciences, two Life Sciences) and one managing editor (Life Sciences) from altogether five different publishers, including one Open Access and four traditional publishers. The software is offered to editorial staff by most of the major publishers, with some journals screening all submitted manuscripts automatically and others requiring editors to trigger the screening process manually. However, whether editors actively engage with and use the screening results remains an open question. To reflect different levels of engagement, the sample consisted of nine participants who were regular users of the software, albeit to different degrees, and one participant who did not actively engage with the software but worked at a journal that automatically screened all submitted manuscripts. The sessions took about 67 min on average, ranging from 19 min, for the editor not engaging with the software, to 93 min. The virtual sessions included screencasts of the software's interface and were recorded via Zoom. The on-site session took place in the editor's office at a German university and was audio recorded. All audio recordings were transcribed and analysed using a content analysis procedure (Mayring, 2010). The content analysis proceeded along the main themes of the sessions, that is, respondents’ conceptualization and evaluation of plagiarism, their work with the editorial management system, their use of different features of the plagiarism screening software, the typical steps following the screening, the role of the publisher, as well as respondents’ views on the necessity and appropriateness of automated screening.
Additionally, screencasts and the software interfaces were analysed using visual semiotics (Aiello, 2019). To gain additional insights into the features as well as the development process of the software, key informant conversations were held with an iThenticate product manager as well as two CrossRef representatives. Furthermore, the policies, guidelines and user handbooks for the software issued by the publishers in the sample were collected to better understand the normative background for the use of the software, even though the interviews with editors showed that most of them hardly took notice of those policies. These data were then all combined to reconstruct editors’ actual workflows with the software as well as their interpretations of various interface features.
Results
Three stages of workflows
All journals in the sample have the screening software integrated into their editorial management systems (EMS). Different EMS allow for different levels of integration and automation, and they structure the interaction between editorial staff and the software by providing different visual interfaces. However, much of the structure of the workflow remains similar across EMS: in general, software use follows a temporal workflow with three different interactional stages that unfold over time (see also Orlikowski, 2006: 464): in the first stage, the software screens the manuscript and presents editors with a ‘Similarity Score’ showing the overall percentage of overlap with other sources. Clicking on this score leads to the second stage, where editors access a detailed ‘Similarity Report’ displaying a page-by-page breakdown of text overlap, including links to identified sources and various options. The third stage involves actions beyond the report, such as contacting authors, discussing results and deciding to proceed with peer review or to reject the manuscript. Each of the three stages involves specific modes of interaction between the human users and the software, and each differs in terms of the roles and discretion they scaffold for the editorial staff. As editors progress through the stages, they transition from heavy reliance on the software to more autonomous decision-making, demonstrating the evolving scaffolding dynamics over time (Orlikowski, 2006): while in the beginning, scaffolds are essential for supporting a structure or building (here: an assessment of the overlap in a paper), as the structure becomes more stable and self-supporting, they are needed less and less and finally can be dismantled while leaving the structure intact.
Stage 1: Selection
Every manuscript that's submitted goes through an iThenticate check. So, whenever we’re handling any paper, we see, you know, the aggregate similarity score. And, and so, quite often we didn’t… You know, if the similarity score didn’t look too high, we wouldn’t take… You know, do any further analysis. We were just happy that it, you know, there was a reasonable similarity score. Sometimes if the similarity score looked, you know, a little bit, you know, towards maybe the 40, 50%, we were like, hmm, that seems a, that seems a bit high. And then we opened up the report […] (Professional Editor Life Sciences I)
In this first stage, human–machine relations are mainly structured around the quantitative Similarity Score, a number that indicates the percentage of overlap between the screened submission and the corpus of texts against which it was screened. In two of the three EMS represented in the sample, all manuscripts are screened automatically upon submission. When an editor logs onto the editorial management platform, new submissions will have already been screened and their Similarity Scores are displayed in a dashboard along with the manuscript's other metadata. The way these scores are displayed in the dashboard is also typically quite suggestive, featuring different forms of color coding to indicate whether the score falls below or above a certain threshold. In one system, the number automatically changes color: dark grey if the score is below the threshold, bright red if it exceeds the threshold. With red as a color that conventionally signals danger or a violation, the editors are strongly encouraged to ‘do something’, that is, to click on the respective link that opens a more detailed screening report and thus initiates a more thorough investigation. Another system employs a color-coding scheme resembling traffic lights (green, yellow and red) to indicate whether a score is within the acceptable range (green), should be treated with caution (yellow) or seems so high that the manuscript review should be halted until further investigation (red). In a third system, the screening process is not triggered automatically for each submission but needs to be initiated by editors manually. Once screening is completed, the editor is informed by an email alert about the results, which again highlights whether the score has surpassed a certain threshold:

I’ve set it up so that I’m alerted to say when the report is ready. And it tells me the results of the report. X percentage similarity. And then I’ve set up another system as a flag to say when it's exceeded 30%. So generally anything below the 30% threshold is generally okay. Anything that's over 30% needs looking at. (Managing Editor Life Sciences)
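The stage-one logic described above can be condensed into a very small decision rule. The sketch below is hypothetical: the threshold values, labels and the traffic-light mapping are assumptions chosen for illustration, since the observed systems use journal-specific thresholds (one editor cited 30%) rather than any fixed vendor configuration.

```python
# Hypothetical sketch of the stage-one decision rule: a single Similarity Score
# is compared against journal-specific thresholds and the dashboard merely
# colour-codes the result. Values are illustrative assumptions, not vendor defaults.

def flag_submission(score, caution=30.0, alert=50.0):
    """Map a Similarity Score (percent overlap) onto a traffic-light style flag."""
    if score >= alert:
        return "red"     # halt and open the Similarity Report for investigation
    if score >= caution:
        return "yellow"  # treat with caution; editor decides whether to open report
    return "green"       # proceed; typically no further action

# The flag is a suggestion, not a decision: editors can override it in every system.
for score in (12.4, 34.0, 57.8):
    print(score, flag_submission(score))
```

The brevity of this rule is precisely the point made below: in the first stage the software leaves editors with little more than a threshold comparison to work with.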
In this stage, editors’ only information is a combination of the overall Similarity Score and the interface's indication of whether it exceeds a certain threshold. The interfaces strongly guide editors towards a decision rule of no action for scores below the threshold and further investigation for scores above it (with one interface adding a middle category in between). This limited discretion results in an editorial decision-making process that approaches quasi-automation, as decision rules remain highly similar whether enforced by humans or algorithms. This is also the workflow suggested by the majority of publisher policies. In this first stage with its focus on a single quantitative score and the limited options for action it provides, human involvement in fact seems to be rather ceremonial, as the use of a single quantitative figure representing screening results narrows the interpretative possibilities (Introna, 2016: 18, building on Miller, 2004). The very sparse information for decision making provides little room for editorial staff to develop alternative interpretations, employ alternative ways of reasoning or to arrive at alternative options for further action. Most of the discretion, then, seems to be taken away by the screening tool.
However, as limited as this stage is in terms of human discretion, we can still see various ways in which editors exploit and perhaps expand the limited discretion they do have: technically, editors in all systems can override the systems’ (suggested) decisions. In two out of the three systems, editors are technically free to ignore the screening altogether, as well as to ignore the suggested path of action: they can open any report for investigation, or they can ignore any score and still move the submission forward. Empirically, many editors report using more flexible ranges rather than fixed thresholds, and almost all editors report using different thresholds than those suggested in the interface. Interestingly, those thresholds inscribed in the interface do not correspond to the thresholds given in some of the publisher policies, but rather seem to be specific to each journal. Editors’ thresholds also seem to be rather variable: a few times, editors’ accounts of what thresholds they usually employed changed throughout the interviews, seemingly without the interviewees even noticing. One editor also reported more or less ignoring the scores and investigating all manuscripts in more detail, thus effectively skipping the first stage and moving right into the following stage. These systems then neither automatically enforce text overlap thresholds, nor do they seem to be very successful at coaxing editors into following their specific thresholds. Even in the third system, editors also always have an override option, enabling them to clear manuscripts that were halted as well as to halt manuscripts that were automatically processed.
In this stage, the decision support system scaffolds a specific decision space, allowing editors to set their own thresholds, use flexible ranges or even assess each Similarity Score individually. The screening software does not dictate decisions or force editors’ hands. In fact, whether a submission gets cleared or selected for further investigation will probably vary greatly from time to time and show a great deal of idiosyncrasy, even though it passes through the same software. However, this does not mean that the software is inconsequential or that human editors are entirely autonomous in making their decisions. On the contrary, even when editors apply their own thresholds, they remain bound to interpreting a single quantitative score. Approaching plagiarism as a question of quantities – percentage of overlap – and thinking with numbers is thus something the editors cannot avoid. The software establishes a very limited decision space, and in doing so defines plagiarism as a quantitative concept. Alternative ways of thinking about plagiarism, but also other ways of reaching decisions, are excluded from this space. The software's effect, then, is less in predetermining a particular decision outcome, but much more in establishing the underlying parameters for decision-making.
Stage 2: In-depth analysis
The second stage is only initiated if editorial staff decides to investigate a case in more detail and manually opens the Similarity Report, thus strongly relying on human enactment. The Similarity Report has a much more interactive and dynamic format than the display of the previous Similarity Score in the dashboard, and it supports much more varied use strategies. If the first stage centres around sparse visual communication via a single, seemingly unequivocal, number, the second stage opens an interface with an almost overwhelming plethora of visual features, colours, symbols, numbers and interactive elements. In general, the report resembles a regular document viewer: the left side displays the original manuscript, highlighting overlapping text sections in various colours, while the right side features a menu listing sources with matching color-coding, indicating word overlap and percentage values. In contrast to the first stage with its alarming red number for high Similarity Scores, the bright color-coding in the report is much more benign, even cheerful. Other than matching text segments to sources, the colours themselves seem to have no further meaning, and are rather aimed at encouraging software use by making the interface more fun or visually appealing (see also de Laet and Mol, 2000: 228). There is also an interesting multiplication of numbers: the Similarity Score is still displayed somewhat prominently in the upper right-hand corner, but it is joined by multiple other numbers: each individual source is listed with a color-coded number that identifies it and at the same time denotes its rank, from largest to smallest overlap. Additionally, the interface also shows the percentage of overlap for each source, as well as the absolute number of overlapping words for each source. Almost everything in this interface is clickable, giving users the option to toggle between different views, activate different pop-up windows or be taken to external websites via further links. Overall, the interface implies a high level of interactivity, with its bright colours and clickable elements that almost introduce a certain playfulness. Through the multiplication of numbers displayed for each source, as well as the different layers of interactive buttons, expanding lists, pop-up boxes, hover-over text and links that lead outside of the tool to web sources and/or back to features within the tool, plagiarism is now presented as a highly complex, layered and also somewhat unstable concept. In stark contrast to the unequivocal and visually limited display of the Similarity Score in the first stage, the Similarity Report thus affords editors numerous opportunities to experiment with the interface and to come to their own conclusions, which also manifests in the way editors use the Similarity Report. Most editors describe how they get an initial overview by scrolling through the report:

So yes, here we can also scroll through quickly. So this is what you typically do. You scroll through quickly, you look, does anything pop up tremendously? You can see here, in the Results and Discussion [section of the manuscript] this is, a few, yes, smaller phrases somehow. I mean, that is also obvious, authors have their standard phrases that they like to re-use occasionally. All of that is of course not an issue at this point. (Professional Editor Astrophysics)

I’m really wanting to see page-by-page, where I’m seeing these blocks of [highlighted] text. And the type of text that it is. Is it a method section? Is it a complex technological section that can’t really be reworded? Or is it something that can? (Integrity Team II)
Here, it becomes obvious that the software was not developed from within or for a particular discipline. Originally designed to be used in education, and subsequently marketed in scholarly publishing, the software incorporates various conventions for academic writing in a mix-and-match fashion. In its default settings, the software's flags are not representative of any coherent disciplinary writing standards. To make up for that, the tool relies on users to create a coherent, discipline-appropriate assessment of text overlap: by going through the sections highlighted in the screening report, editors develop their own interpretation of the text reuse in the manuscript. The tool, in turn, enables this by a range of functionalities: most prominently, the software offers the option to manually exclude sources, such as preprints, from the results. This is something most editors report using frequently. Excluding a source removes the color highlighting on segments that overlap with that source, and reduces the Similarity Score. In addition, editors can click through to a menu titled Filters and Settings, which provides further options for excluding different types of matches:

And then you can, you can filter the results. So at the moment this is excluding bibliography. So generally I exclude bibliography and quotes, ‘cause it's like, well, that's gonna be, um, you know, similar. And then I’ll put, exclude matches that are less than… less than a 100 words… And you can exclude the materials and methods altogether […] You have to, sort of, play around with the threshold levels. And work out, you know, what, what bit is gonna give you more or less, kind of, hits. So here, for example now, I’ve excluded the materials and methods, sort of, hits. So all this part now isn’t being highlighted. (Managing Editor Life Sciences III)
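To illustrate the kind of recalculation these editor-chosen filters imply, the following sketch recomputes an overlap percentage after exclusions. The match structure, field names and filter parameters are assumptions made for illustration; the actual report applies such filters inside the proprietary interface rather than exposing them as code.

```python
# Minimal, hypothetical sketch of how exclusion filters (excluded sources,
# bibliography, quotes, short matches, methods sections) might change the
# reported overlap percentage. Not the actual Filters and Settings implementation.

from dataclasses import dataclass

@dataclass
class Match:
    source: str    # matched source, e.g. a DOI or URL
    words: int     # number of overlapping words in this match
    section: str   # manuscript section, e.g. "methods", "bibliography", "results"
    quoted: bool   # whether the overlap sits inside quotation marks

def filtered_score(matches, total_words, excluded_sources=(),
                   exclude_bibliography=True, exclude_quotes=True,
                   min_words=100, exclude_methods=False):
    """Recompute the percentage overlap after applying editor-chosen filters."""
    kept = [m for m in matches
            if m.source not in excluded_sources
            and not (exclude_bibliography and m.section == "bibliography")
            and not (exclude_quotes and m.quoted)
            and not (exclude_methods and m.section == "methods")
            and m.words >= min_words]
    return 100 * sum(m.words for m in kept) / total_words

# Illustrative usage with invented matches: only the methods match survives the defaults.
matches = [Match("10.1000/xyz", 220, "methods", False),
           Match("https://example.org/page", 60, "introduction", False),
           Match("10.1000/abc", 150, "bibliography", False)]
print(filtered_score(matches, total_words=5000))
```

The sketch makes visible why editors describe the score as something they can actively reshape: which matches count, and therefore what number the report shows, depends on the filters they choose to apply.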
However, this reconfiguration also has its limits. This becomes particularly obvious in the criticisms some editors raise, which go beyond the points that can be easily addressed by the software's options:

You can also put a filter here, it is not active right now, exclude bibliography. Then this part would not be included in the analysis at all. But that will work, I think, only if it really says the word References, or Bibliography, or something. Not entirely perfect with this system. And most importantly, sometimes you have other things after the References such as Figure Captions or other parts, and they would get lost then, too. There's pros and cons to this. So there are a couple limitations to this system. It is obviously not perfect, either. (Professional Editor Astrophysics)

But here you can see for example, the spelling changed a little bit. And then it is not recognized that this is the same as this. So for example here is an additional space character and there as well. And then this is simply not recognized. And something like this the system simply does not find. (Academic Editor Engineering)
At the same time, there is also room for creative use strategies that provide options for sense-making beyond the original scope of the software. While the majority of editors use the tool in more or less the way just described, without introducing radically new use practices, the data reveals two significant cases in which editors use the screening results in ways that go beyond assessing text overlap. In the first case, the editor demonstrates how they utilize flags of overlap in the reference section, which is commonly seen as a flaw of the software, to identify other integrity issues such as citation manipulation:

So, iThenticate, in this case, iThenticate hasn’t really identified any issues in this paper, but what it has done is, it's highlighted that some subsequent papers have got large chunks of citations that have been… They, they might be copy and pasted or they might be… […] So, yeah, that's, that's one way that, that iThenticate can be used to try and identify patterns of citations, common citations, that might be an example of citation manipulation. (Professional Editor Life Sciences I)

You learn for example, sometimes, if you really put in the effort and really reconstruct the low percentages, the hints to the one percent sources, then you also get an insight into the, how should I say, into the dynamics of an author's development. So, which authors was he (sic!) working with. Which sources did he use. Which texts, and so on. That can be quite intriguing. […] Also, in a way, you can reconstruct an author's areas of interest. What did they find important or relevant enough to cite. Which authors, which writers and so on, from which sources and such. (Academic Editor Social Sciences)
Here, editors focus on aspects of the screening results that are typically considered irrelevant or even problematic and use them to produce knowledge for their decision-making in unexpected areas. In the second case, the editor even manages to utilize the tool for a different purpose, one that does not entail monitoring the submission for (additional) integrity violations such as citation manipulation, but rather aims to better understand the text from a more hermeneutic perspective. As such, this repurposing radically shifts the normative implications of the tool. Although the software may have been built with particular uses in mind, it nonetheless also scaffolds and enables human action beyond those intended uses, highlighting its non-determinism.
In the second stage, editors’ interpretation of a manuscript's plagiarism status is more nuanced compared to the first stage's reliance on a single quantitative score. While the first stage provided only sparse scaffolding, limiting human sense-making, the second stage offers a broader scaffold for discretion. Despite the expanded room for interpretation, the software's role remains the same – scaffolding a specific decision space. Editors’ interaction with the Similarity Report reveals the software's non-deterministic quality: editors have considerable room for individual sense-making and discretion, they have substantial influence over the results and they can develop creative and unforeseen use strategies. At the same time, editors’ sense-making is still bound by the software's limitations. The software sets a number of general parameters, such as an emphasis on quantitative assessment of text overlap via word counts, as well as technical limits as to what kind of overlap can be counted. The software sets up a particular space for decision-making, within which editors can exert a lot of discretion, but which they cannot easily circumvent (unless they stop using the software). While editors are very much free to come to their own decision regarding an individual manuscript, the general conditions of how this decision can be reached are limited by the software.
Stage 3: Reshaping texts
After evaluating the Similarity Report, editors decide on the manuscript's fate. If the flagged overlap is deemed acceptable, the manuscript advances to the next peer review stages. For manuscripts with significant text reuse issues, editors may opt for desk rejection, citing unacceptable reuse as the reason for rejection. Many papers, however, fall into a middle category of problematic but amendable overlap and are not desk rejected. Rather, editors often contact authors to ask them to rework sections that they deem problematic:

And so we don’t, kind of, go, kind of, accusatory on the authors. We’ll, kind of, tell them, look, you know, we’ve identified high similarity in the text in these, these, these regions. Please can you rewrite them and, you know, either, um, put method as described previously by X Y Z or…Or, you know, put that with minor modifications, kind of thing. Just ask them to rewrite it so, so we’re, kind of, trying to educate authors rather than, you know, tell them, no, must be rejected, that kind of thing. (Managing Editor Life Sciences III)
This generative use transforms the software from a tool supporting decision-making to one shaping specific writing practices. Arguably, this active intervention amplifies the governance effects of the software, granting it considerably more influence than it would have as a mere filtering tool. At the same time, however, it is also where editors develop the most agency in producing their own interpretations and courses of action. In their interaction with authors, editors can now move beyond many of the limits of the software and develop meaningful, that is, actionable interpretations that the software could not act upon. To revisit the example of the software's inability to detect text overlap if the spelling is changed ever so slightly, this issue can now be resolved: in their interactions with authors, editors can flag paragraphs that the software has not identified and ask the authors to address the issue, making these previously unaddressable paragraphs actionable. Interacting with authors can thus serve to somewhat mitigate the limits of the software.
Staying with the metaphor of the scaffold, editors’ gradual move away from the software infrastructure represents the stage in which a stable, self-contained building (here: an assessment of the overlap in a paper) has been constructed, and the scaffolding is no longer needed and can be dismantled (Orlikowski, 2006). Looking only at this final stage of plagiarism assessment, one might conclude that the software has only very little influence on editorial practice, and that editors are indeed quite autonomous in the way they address issues of text overlap. Yet this stage would not be possible without the preceding two stages, in which the software exerts considerable influence. Just as the Similarity Score displayed in the EMS dashboard serves as a starting point for critical reassessment and reassembly in the Similarity Report, those reassembled results in the Similarity Report serve as starting points for further rewriting of the texts in question. Effects of this scaffolding by the software thus carry over into the third stage, even though the scaffold is often no longer present.
Scaffolding the task of plagiarism detection
So far, we have seen how the software scaffolds specific decision spaces for plagiarism assessment in the three different stages. In addition, the software scaffolds a decision space for editors on an even more encompassing level, by introducing plagiarism as a problem into editorial work in general. Through its integration into the EMS, the software becomes an element that is routinely displayed (though not necessarily routinely used) in editors’ digital workflows. As such, editors are reminded of plagiarism as an issue that somehow warrants their attention with every manuscript they process. In previous studies, editors report varying levels of concern about plagiarism, corresponding to how appropriate they find the use of automatic screening (Hesselmann, 2023). In the present sample, editors express only moderate to low levels of concern about plagiarism, with some editors reporting that they have never encountered a ‘real’, that is, severe, case of plagiarism in their time as an editor:

I’ve also got to say. We never had any cases, really never. So I have been doing this, or we have been doing this for 12 years. In those 12 years we never, it never happened that people, I don’t know, plagiarized countlessly from Wikipedia, or something, where we said: No, that is not right, that is simply copied. Or whatever. No, something like this does not, did never happen. No. No. (Academic Editor Social Sciences I)
For editors, then, plagiarism is mostly a problem that does not exist independently of the software, or at least not a problem that would be actionable to them on its own. This becomes most obvious when editors speak about how they would approach plagiarism without the software, or rather, how they would not do so:

Interviewer: Maybe as a thought-experiment, how would it look like if you did this manually (Editor laughs), without the software?

Editor: That would – I think in that case I would leave it. So it is just not possible. So I get something like, ten, fifteen papers a week or something. If I had to do that manually, I could not. (Academic Editor Engineering)
Yet, the screening tool still crucially shapes human possibilities for action and yields potentially wide-ranging power effects. It does not simply provide a solution to a pre-existing problem. Rather, problems and solutions are generated in close interaction with each other: with its emphasis on quantitative measurement of verbatim plagiarism through the automatic processing of large volumes of texts, the software provides a specific view on plagiarism that is geared towards technical solvability, especially solvability through digital automation (see also Hesselmann, 2023). Editors’ impression that manual plagiarism detection would not be possible is also directly tied to this conceptualization of plagiarism detection, which relies heavily on high-speed processing of large quantities of data:

Interviewer: How would you do these tasks manually if you didn’t have the software?

Editor: [Laughs] I don’t think you could. I don’t think anyone could. I think, unless you’ve read every publication that has ever existed, including people's published theses in institutional repositories, I don’t think it would be remotely possible to do this. I think the only way that such things would come up would be, maybe, if you’d written your own paper. And you were very familiar with the text, and you happened to come across a paper that included text, so, gosh that sounds familiar. But aside from that, I don’t think anyone can do this manually. (Professional Editor Life Sciences II)
Discussion
Using the concept of scaffolding to analyse human–machine relations in algorithmic decision support systems enables us to rethink these relations beyond the narrow view that such systems reduce human agency or discretion. Instead, the analysis highlights how the software's effects lie in its scaffolding of a specific decision space. The case illustrates that screening software does not replace editorial tasks or automate decisions, but rather enables human decision-making and discretion. Without the software, plagiarism detection is not an activity for editors. Through this generative quality (Orlikowski, 2006), decision support software does not diminish human discretion, but rather creates a space where such discretion becomes possible. Editors report that without the software, they would not check submissions for plagiarism, and thus would not make decisions about plagiarism at all. As such, whatever limited discretion the software affords them is still more discretion than they would have without it. Editors’ agency to engage with the issue of plagiarism is then largely created by the software rather than taken away by it. This view highlights the software's non-deterministic quality: allowing editors to use external resources, adjust thresholds and develop idiosyncratic use practices. They can even expand the decision space offered by the software when working with authors to rewrite texts. Thus, at least in the present case, human involvement is far from rubber-stamping, with human editors very much making individual decisions about submissions rather than being forced to accept automatically generated ‘suggestions’.
However, this does not render the software benign or inconsequential. Analysed through the metaphor of the scaffold, we can describe its effects as the scaffolding of a specific decision space, which yields twofold power effects: on the one hand, establishing plagiarism as an issue for editorial decision-making is extremely powerful in itself. Scaffolding decision spaces is similar to agenda setting, though arguably even more impactful: the software establishes plagiarism as a decidable and thus actionable threat to publishing integrity, that is, as a threat to which editors can and consequently should respond. In doing so, it intervenes in the moral economy of academic publishing. It also successfully redefines the roles and responsibilities of editors, who are now not only the gatekeepers of academic quality, but who also become guardians of good publishing practice. In this new role, editors enter into a kind of adversarial relationship with authors that carries traces of a generalized suspicion of wrongdoing.
On the other hand, while scaffolding enables human discretion, it does so in a very particular way, including some possibilities for action and excluding others. Scaffolding shapes human decision-making by enabling human discretion not as endless and amorphous possibilities, but as very specific activities and processes of interpretation. In the present case, the software establishes a particular understanding of plagiarism that relies heavily on quantification and verbatim language processing, thereby obscuring alternative conceptualizations. Moreover, the different types of scaffolding in the different stages also illustrate how the specifics of the software and its interfaces create different boundaries on human discretion. While scaffolding such as the Similarity Score enables only a relatively narrow corridor of human decision-making, the much broader scaffolding of the Similarity Report also offers a lot more options for editorial interpretation. As such, the screening software must be considered extremely powerful. Yet, its power lies not in automating decisions directly or indirectly, thus diminishing human discretion, but rather in creating decidable problems for human users.
Conclusion
Focusing on the generative quality of algorithmic decision support systems opens new perspectives for critique. First and foremost, this notion urges us to pay critical attention to the types of decisions, agency and potential for governance that these systems enable in their human users, in addition to how they restrict users’ discretion or the deployment of human expertise. These systems seem problematic because they establish new domains, new issues and new problems for humans to govern and control, not solely because they threaten human potential for governance. Saying that plagiarism screening software affords editors discretion and agency that they would not have otherwise is not the same as saying that such software is good, warranted, desirable or improves academic publishing. Such a focus on how software scaffolds decision spaces also opens up a normative view on human–machine relations that does not assume that human involvement will, as a matter of principle, make a sociotechnical system more fair or normatively desirable. In addition to worrying about algorithms making our decisions for us, we should also be worried about what kinds of decisions we are enabled (and even required) to make in contexts structured by algorithmic tools. This includes asking what kinds of choices and options we are given, as well as what kinds of issues are configured as decidable and in need of our attention, and what roles we assume in making those decisions. In the present case, for example, this would mean that rather than being satisfied with calling for a human-in-the-loop in plagiarism detection, we should ask how editors’ roles are changed by becoming research integrity investigators, whether or not we think widespread control of academic writing practices is necessary and warranted, and what effects the governmentalization of publishing integrity has on the academic community as a whole.
Acknowledgments
The author would like to thank Rocio Fonseca, Franziska Fenz and Tabea Kawczynski for their help with data management and preparation of this manuscript, as well as the members of the Robert Merton Center Colloquium Berlin and the members of the RQT - Research Quality Theory Reading Group Nijmegen for helpful comments on earlier versions of this manuscript. The author acknowledges the use of DeepL Write in the final language edit of this manuscript.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This work was supported by the Bundesministerium für Bildung und Forschung (grant number 16PH20008).
