Categorizing methods and approaches for generating and identifying paradata

Abstract

Documenting the processes and practices of making and processing research data has been identified as a key prerequisite of data reusability and intelligibility. A large number of methods and approaches for generating and identifying such information have been proposed; however, they are dispersed across the literature. Consequently, the current understanding of what types of approaches have been envisioned, how they differ and relate to each other, and what kind of paradata they produce is limited. This paper reports an initial study to increase understanding of the methods landscape through review and categorization of paradata generation and identification methods. We identified three major temporal categories of (1) prospective, (2) in situ, and (3) retrospective methods and approaches, and five categories of paradata artifacts generated: (1) structured metadata, (2) narratives, (3) snapshots, (4) diagrammatic representations, and (5) standard procedures.

Keywords

Documentation methods paradata practices processes

Introduction

Recent literature across disciplines, including information studies (e.g. Börjesson et al., 2020; Dahlström and Hansson, 2019; Huvila, 2021, 2022; Sköld et al., 2022), health (e.g. Savai et al., 2022; Scherr et al., 2021), computer science (e.g. Seedat et al., 2024), biomedicine (e.g. Schröder et al., 2022), archeology, and cultural heritage (e.g. Beacham, 2011; Bentkowska-Kafel et al., 2012; Gant and Reilly, 2017; Huggett, 2020; Lo Turco et al., 2019), has put increasing emphasis on the significance of documenting the making and processing of data and research outputs to improve their reusability and intelligibility (Faniel et al., 2019; Huvila, 2022). Terming such information paradata is increasingly common, although multiple other overlapping and quasi-synonymous labels (for example, provenance metadata, process information and process metadata) are used as well (Sköld et al., 2022). In this study, paradata are understood as data on scholarly data creation, processing and (re)use (cf. Huvila, 2022). A plethora of methods and approaches for generating and identifying such information have been proposed. This work is, however, dispersed across the literature, and the understanding of what types of approaches have been envisioned, how they differ and relate to each other, and what kind of paradata they produce remains limited. Even the conceptual understanding of what ultimately qualifies as such a method is limited.

Against this background, rather than aiming at a systematic review of all methods and approaches proposed for generating paradata-like information at this stage, the aim of this study is to lay groundwork and develop an elementary conceptual framework for identifying and categorizing such methods and approaches. Based on a selective cross-disciplinary review of the literature, our objective is to investigate and identify key facets of methods proposed for generating and identifying paradata, including comparable information described using other terms.

The aim of this paper is to create new knowledge of existing approaches for paradata capture and identification, their similarities and differences, and, on a broader scale, of strategies to improve the understanding of data making, processing, and use.

In this study, two research questions have guided the work:

RQ1: What categories can be identified among methods and approaches proposed for generating and identifying paradata and comparable information?

RQ2: What categories of paradata artifacts do the methods and approaches engage with?

The identified categories are theorized using the notions of boundary objects (Star, 1989) and boundary work (Gieryn, 1983) to further expound their modes of operation. This study uses the term methods and approaches to signify diverse methods, approaches, research designs, and strategies applicable in a broad sense for identifying, generating, and/or collecting (para)data. Just as paradata is used to refer to a category in an inclusive sense beyond information explicitly termed as such, methods and approaches is used in an analytical sense to refer to the broad set of techniques capable of advancing the goals of documenting and understanding processes and practices of data creation, processing and (re)use.

Previous research

In the following sections, earlier research on the purposes and uses of paradata and methods for identifying, generating, and capturing paradata and paradata-like information will be reviewed.

Paradata and paradata use

As Börjesson et al. (2022a) point out, there is an increased demand for more knowledge about data creation than what has been documented in the metadata at hand. Without proper documentation about the research design and the processes through which findings and research data have emerged, our ability to assess their applicability is severely limited (cf. Faniel et al., 2019). Such information, frequently termed paradata, has been argued to be crucial for ensuring the shareability and reusability of research data, reproducibility of research, contextual understanding of disciplinary differences in research work, and understanding scholarly knowledge production (Huvila, 2022). Paradata and process documentation can also help to verify results (Miksa et al., 2014), increase the reliability of research findings and make them more robust, simplify research evaluation processes, support communication between users and producers of research data and allow future researchers to redo scholarly work processes. Paradata are especially crucial for enabling cross-disciplinary research, where implicit understanding of research processes cannot substitute for meticulous documentation (Sköld et al., 2022). In addition, it has been suggested that paradata can contribute to inclusiveness in qualitative research by being used to communicate to research participants how the data collected together with them were analyzed and used as a basis for creating new knowledge (Rainey et al., 2022). Many of these benefits boil down to the capacity of paradata to improve transparency, which is regularly put forward in the literature as a key benefit of paradata (e.g. Bentkowska-Kafel et al., 2012; Mudge, 2012; Rainey et al., 2022; Sköld et al., 2022; Turner, 2012).

Paradata is best described as an emerging concept. So far, the notion has been discussed most in survey research (Kunz et al., 2020), the preservation and visualization of cultural heritage (e.g. Denard, 2014), archeology and research data (e.g. Huvila et al., 2021), and archives and records management (Davet et al., 2023). Börjesson et al. (2022a) identify four categories of paradata in an interview study with researchers working with archeological data. These categories are scope (coverage of data), provenance (origins), methods (contexts and methods of data generation), and knowledge organization and representation of paradata (how data are structured, represented, and communicated). In an investigation of opportunities for extracting paradata from research datasets, two related categories were distinguished: knowledge-making paradata (describing data gathering and analytical processes) and knowledge organization paradata (describing how empirical observations are transformed into data units) (Börjesson et al., 2022b). In another study analyzing archeological research reports, Huvila et al. (2021) identify paradata in the form of procedural narratives, descriptions of methods and tools, actors, photographs, citations, and descriptions of research outcomes.

Earlier research has emphasized the close link between paradata and metadata (Börjesson et al., 2020; Gant and Reilly, 2017; Lake, 2012; Sköld et al., 2022). However, important distinctions are made between the two concepts (cf. Richards-Rissetto and Landau, 2019). Contrary to the widely used and somewhat simplistic notion of metadata as “data about data” (cf. Pomerantz, 2015), that is, information describing data, paradata appear more immaterial (cf. Gant and Reilly, 2017). Paradata are relational and depend on how they are used (Cameron et al., 2023). In contrast to metadata, paradata also have a different emphasis (Davet et al., 2023), on describing processes rather than objects (Huvila, 2022). In survey research, paradata “are data about the data collection process, such as survey timings, locations, and response rates” (Choumert-Nkolo et al., 2019: 600), while in other fields the term is typically used to refer to processes of curation, management, and processing as well (Cameron et al., 2023; Sköld et al., 2022). Paradata are also akin to provenance information (cf. Mudge, 2012), and the concept shares some features with “provenance metadata” (see Gant and Reilly, 2017; Huvila, 2022; Missier, 2016), which, similarly to paradata, are acknowledged as useful for recreating earlier research. As suggested by Huvila (2022), provenance (meta)data describe the geneses of particular objects as well as the “context and processes related to the earlier life of data” (p. 31), whereas paradata tend to encompass processes in a broader sense beyond curatorial and historical perspectives (Sköld et al., 2022).

Methods for generating and identifying paradata-like information

Archeology is an example of a field with a long tradition of explicit emphasis on documenting methodological processes using, for example, field notes and photographs to document sites (Gregory et al., 2019). Yet, understanding data collection methods can be difficult because of a lack of recording standards for such information. In multiple fields, standards exist that can consist of procedural guidelines for how to conduct data collection and research work (e.g. Gruca et al., 2014; Zass et al., 2023), specifications on how to document processes (e.g. Chinosi and Trombetta, 2012; Deelman et al., 2018), or both. In addition, data collection is sometimes stipulated in broader policy documents, sets of principles and charters (e.g. Beacham, 2011; Denard, 2012, 2014) that do not incorporate checklists for specific tasks (Denard, 2012) but provide general guidelines for developing project-specific documentation. However, despite the acknowledged importance of process documentation, Miksa et al. (2014) argue that most data management practices focus on data and seldom consider in detail how they were generated or analyzed.

As Koesten et al. (2019) point out, the relevant information on the processes of data creation “can take many forms and includes text descriptions, annotations, metadata, previews and categories” (p. 5). Early on, automatic generation of meta- and paradata was described as a major benefit of computer-assisted data collection (e.g. Couper, 2000). In addition to intentionally collected documentation, much automatically captured data carry fingerprints and log data that provide information about data creation and use (cf. Huvila, 2022). Huvila (2022) exemplifies this by referring to different meta- and/or provenance data generated by, for example, the use of GPS (cf. Choumert-Nkolo et al., 2019) and 3D modeling software (cf. Champion and Rahaman, 2019; Huurdeman and Piccoli, 2021). Besides automatic means, process documentation is conventionally generated through diverse manual processes, including notetaking, diary keeping, and report writing. It has also been noted that process documentation can be generated retrospectively, using forensic methods to “excavate” available resources to reconstruct data collection and generation processes or parts of them (Huvila, 2022). As a whole, in spite of their different forms and shifting contexts, it is possible to discern common traits and typifiers among the diverse methods. While such categorization has not been attempted before, we argue that it is feasible and helpful for increasing the understanding of both the methods and approaches and the resulting paradata.
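To make the idea of "excavating" incidental traces more concrete, the following Python sketch reconstructs an approximate chain of activities from scattered, timestamped log fragments of the kind left by GPS devices, cameras or 3D modeling software. The log format, the tool names and the `reconstruct_chain` function are hypothetical assumptions for illustration only; the sketch is not a method proposed in the cited literature.

```python
from datetime import datetime

# Hypothetical log fragments left behind by different tools. The format
# "ISO timestamp | source | event" is an assumption made for this sketch.
RAW_TRACES = [
    "2021-05-03T14:10:00 | 3d-modeler | mesh decimated to 50%",
    "2021-05-03T09:02:00 | gps-logger | fix recorded at trench A",
    "2021-05-03T11:45:00 | camera | photo IMG_0042 taken",
]


def parse_trace(line):
    """Split one log line into a (timestamp, source, event) tuple."""
    timestamp, source, event = (part.strip() for part in line.split("|", 2))
    return datetime.fromisoformat(timestamp), source, event


def reconstruct_chain(raw_lines):
    """Order parsed traces chronologically to approximate the process sequence."""
    parsed = [parse_trace(line) for line in raw_lines if line.strip()]
    return sorted(parsed, key=lambda trace: trace[0])


for when, source, event in reconstruct_chain(RAW_TRACES):
    print(f"{when:%H:%M} {source}: {event}")
```

Ordering by timestamp assumes that the devices' clocks were synchronized and that the relevant traces survived at all; in practice, such reconstructions remain partial and need to be corroborated with other sources.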

Theory: Boundary objects and boundary work

As exemplified by the literature review in the previous section, data documentation and methods for generating and collecting information rarely serve only one specific purpose. Instead, the approaches are complex arrangements made up of multiple activities serving different purposes at various stages of the data continuum, that is, when data are created, processed, and used during their lifetime. They also work simultaneously on different temporal levels. Workflows used for both prospective and documentary purposes (Yan et al., 2020) are one example discussed in more detail later in this text. Individual methods can also be part of a broader methodological process put together to achieve multiple overarching goals across different stakeholder communities. This means that the individual activities constituting a method and the methods themselves are linked together in complex arrangements that can be understood to potentially function as what Star termed boundary objects (cf. Bowker and Star, 1999; Star, 1989). Boundary objects “are objects that are both plastic enough to adapt to local needs and constraints of the several parties employing them, yet robust enough to maintain a common identity across sites. They are weakly structured in common use, and become strongly structured in individual-site use” (Star, 1989: 46). According to Star (1989), boundary objects can be products of different time horizons, and they can also arise when concrete and abstract representations of the same data are joined. Boundary objects emerge over time through the collaboration of robust communities of practice (Bowker and Star, 1999).

The notion of boundary work, coined by Gieryn (1983), is a theoretical relative of the boundary object and refers to the pursuit of demarcating, consolidating, and revising boundaries between contexts. The linkages and complementarities of these two seemingly antagonistic concepts have been discussed in several texts, observing both how the subject of boundary work can turn into a boundary object (Houf, 2020) and how boundary objects can participate in boundary work (Meloni, 2016).

Similar to the artifacts discussed by Star, the methods explored in this study and their outputs, that is, paradata, traverse multiple communities of data creators and users, evolving continuously in the process. The methods and approaches, their purpose and the generated paradata can all be used and interpreted differently in each of the communities using them. Therefore, in this study, the two concepts have been applied to theorize paradata and the methods, in order to understand and explain the overlap between the categories of facets characterizing the reviewed methods. Conceptualizing paradata and methods as relating to boundaries and boundary crossings also makes visible their temporal and community-traversing contingencies, and the mechanisms of how and when particular types of information become informative of data creation, processing, and use. Whereas different forms of paradata generated using diverse methods are framed as information authored to function (cf. Huvila, 2019) as boundary objects, we conceptualize the work that creates them, that is, the methods, as boundary work of demarcating the documented processes as specific types of undertakings distinct from others.

Material and method

The body of material reviewed in this study is composed of a selection of published, primarily peer-reviewed research papers written in languages known to the research group (incl. English, Nordic languages, French, and German) describing diverse methods and approaches applicable for generating, identifying and capturing paradata and paradata-like information through documentation and analysis of processes and practices and their related paraphernalia. Rather than focusing only on proven and widely used methods, we explicitly sought a diversity of potentially useful approaches and ideas. The analyzed texts were not required to explicitly use the notion of paradata as long as the described methods were found relevant for generating, identifying and capturing paradata-like information. Such information was described in the material using varying terms, including provenance (metadata), process information and metadata (cf. Sköld et al., 2022). The included papers consist of both conceptual and empirical texts within multiple disciplines ranging from the social sciences and humanities to the sciences, technology and health research. Archeology-related literature is admittedly over-represented due to the empirical focus of a larger project within which the currently reported work was conducted. However, archeology, more than many other fields, relies on highly transdisciplinary data and incorporates diverse practices for collecting and interpreting them, which makes it a particularly well-suited context for exploring the complexities of data documentation and (re)use.

The main focus of the analysis was on discerning the types of paradata artifacts described and the key descriptive methodological features of the reviewed approaches. For the categorization, a preliminary coding scheme was established after a pre-review of a total of 60 papers that the research team had identified as especially relevant. They were selected from a much larger number of papers informally reviewed during a large-scale research project that has been ongoing since mid-2019. Relevant texts were sampled in an iterative heuristic process of theoretical sampling that started with methods previously proposed in the literature or described as being used in the context of paradata (e.g. recording and producing narratives, see Table 3). The literature was identified through searches in bibliographical databases and major repositories of scholarly and scientific literature in the course of cross-disciplinary, multi-year research on paradata. The process continued by complementing the list with examples of functionally comparable and related approaches (e.g. workflows and trace analysis, see Tables 2 and 4). Finally, as the research project progressed, empirical findings and conceptual work to develop a theory of the paradata concept led to the identification of additional techniques (incl. chaîne opératoire and participation, see Tables 3 and 4) and their related paradata artifact types with potential to help identify or generate paradata.

Table 1.

Major categories of types of paradata enacted through reviewed methods and approaches.

Categories of paradata outputs	Description
Structured metadata	Structured, formal and/or standardized metadata describing data-related processes.
Narratives	Narrative, textual or audio/visual, descriptions of data-related processes.
Snapshots	Momentary observations, notes and, e.g. visual recordings of moments in data-related processes.
Diagrammatic representations	Diagrammatic, typically schematic representations of data-related processes.
Standard procedures	Fixed and standardized procedural descriptions of data-related processes.

Table 2.

Clusters of prospective approaches of paradata generation.

Categories of prospective methods	Description	Examples
Workflow-based approaches	Approaches that stepwise describe how human-performed and, to different degrees, automated processes and practices should be enacted (in general) in order to achieve a certain goal.	Executable workflows (e.g. computational, incl. computer code; cf. Fredrikzon, 2021; Videla, 2021).
		Workflow protocols (e.g. Carboni et al., 2016; Fafalios et al., 2023; Maryl et al., 2020; Mccarthy et al., 2020; Zabulis et al., 2021)
		Quasi-workflows (guidelines and handbooks, incl. instructions and simplified descriptions of procedures; e.g. Huvila and Sköld, 2023; Llebot and Van Tuyl, 2019).
Plans	Structured descriptions of activities that need to be followed to reach a particular outcome.	Registered reports (Nosek and Lakens, 2014)
		Research plans and data management plans (cf. Kvale and Pharo, 2021).
Framework-based approaches	Approaches that describe how to document processes and practices.	Conceptual frameworks and reference models (how things are to be linked to each other; e.g. Extended Matrix in Demetrescu et al., 2023; Gardin, 1999; Palmer et al., 2017)
		Structured information standards describing what information on processes to include (e.g. Beretta, 2024; Hackos, 2016)
		Controlled vocabularies and label sets (what words/expressions/notations to use; cf. e.g. Chao et al., 2015; Smith, 2021)

Table 3.

Clusters of in situ approaches for paradata generation.

Categories of in situ methods	Description	Examples
Models of processes and practices	Methods for modeling processes	Formal modeling protocols (e.g. Zabulis et al., 2021)
		Diagrammatic models (i.e. simplifications) aiming to provide a comprehensive documentation of data and documentation processes, incl. diagrams and diagrammatic visualizations, e.g. process maps, provenance graphs, etc. (e.g. Lee et al., 2018)
		Digital twins (e.g. Blair, 2021)
		Structured metainformation (cf. e.g. Hughes et al., 2015; Methods and Data Comparability Board, 2002; Shoilee et al., 2023)
Narratives	Narrative descriptions of processes	Narrating data and narrating with data (e.g. Dourish and Gómez Cruz, 2018); thick description (e.g. Hann, 2021); methods chapters (e.g. Chapter 4 in Smith, 2020); data fiction (e.g. Dourish and Gómez Cruz, 2018); data storytelling (e.g. Buchanan, 2016; Dykes, 2019; Knaflic, 2020); data stories (e.g. Mosconi et al., 2023); data comics (Alamalhodaei et al., 2020)
Annotations and colophons	Annotations and comments with information to support the reader’s understanding of processes	Annotations and colophons (Light and Hyry, 2002); curation of digital scholarly editions (cf. Van Mierlo, 2022); “Ethnography of datasets” (Poirier, 2020)
Recordings	Methods for generating recordings of processes	Video, audio, 3D recording (e.g. Derudas, 2021; Sant, 2017)
		Photographs (Dorrell, 1994; O’Connor and Goodwin, 2017; Reilly et al., 2021; Sant, 2017)
		Logs and notebooks, incl. lab notebooks and, e.g., Jupyter notebooks (e.g. Couper, 2000; Jiang et al., 2014; Yan et al., 2020); reflective journals (e.g. Banfi, 2022); diaries (e.g. Sant, 2017)
Participation	Co-creating and experiencing processes with users	Living labs (cf. Ruijer, 2021); participatory data work (Miceli et al., 2022a)

The identified paradata artifact types were grouped into major categories in an iterative process of identifying artifacts and their common characteristics. For a closer analysis of the methods, a spreadsheet with columns representing various facets of the reviewed methods was set up and used to facilitate the work. During the review, based on the earlier observation that some approaches to paradata generation are contemporary or quasi-contemporary with research data creation whereas others are post hoc (Huvila, 2022), the methods were first coded according to their temporal scope (information or documentation created before, during or after data collection). After several rounds of discussion and coding, the temporal facets were established as the basis for the categorization scheme. After that, sub-facets describing the different types of work processes within the temporal categories were identified. As several methods show similarities across facets, the facets were arranged in different levels of categorization and organized in a hierarchy. Methods considered to belong to multiple categories were highlighted and sorted into multiple columns before generating the final categorization reported later in this text.

Results

The following report of results consists of two sections. The first section identifies five major categories of paradata artifacts generated through the enactment of the methods, whereas the second describes three overarching categories of methods, with subcategories, based on the temporal scope of paradata generation.

Paradata artifacts

In the reviewed papers, we identified five broad categories of paradata artifacts created through the use of the reviewed methods and approaches (Table 1): (1) structured metadata, (2) narratives, (3) snapshots, (4) diagrammatic representations, and (5) standard procedures.

Table 4.

Clusters of retrospective approaches for paradata elicitation.

Categories of retrospective methods	Description	Examples
Backtracking	Identifying chains of activities described in the data	Chaîne opératoire (Rösch, 2021)
		Rule-based approaches (Migliorini et al., 2022)
Trace analysis	Analysis of traces of processes in research outputs	Diplomatics (Duranti, 1998, 2009)
		Data mining (e.g. NLP and AI methods; Richards et al., 2015)
		Diffractive, phygital art/archeology approach (Callery et al., 2021; Dawson and Reilly, 2019)
		Identifying inscriptions, i.e. how scholars write things down, both within the data and in the literature, and identifying broader changes in scholarly practices (e.g. Holmberg and Hjørungdal, 2015; Ma and Li, 2022)
Analysis of paratexts	Methods for analyzing contextual information on creation processes	Trace ethnography (Geiger and Ribes, 2011; e.g. in Thomer and Wickett, 2020)
		Marginalia work (Spedding and Tankard, 2021)

Some of the discussed methods, with examples in all three temporal categories, generate structured metadata. Several vocabularies and ontologies for describing methods exist for different domains (cf. e.g. Doerr et al., 2007; Hughes et al., 2015; Methods and Data Comparability Board, 2002). One of the major criticisms of formal metadata relates to its univocality, that is, its tendency to convey a single account of a process. Shoilee et al. (2023) propose an approach to represent polyvocal structured provenance information to address this limitation.

Other approaches focus on generating narratives of processes and practices. Examples of such methods include textual data stories (Mosconi et al., 2023) and narratives (e.g. Dourish and Gómez Cruz, 2018; Huvila and Sinnamon, 2022; Khazraee, 2019), recorded descriptions such as video diaries (e.g. Berggren et al., 2015; Brill, 2000), and hybrid narrative descriptions such as data comics (Alamalhodaei et al., 2020). Such methods are typical of in situ documentation, rare in prospective approaches, and present, albeit unusual, in retrospective documentation, as distinct from narratives that are generated retrospectively from in situ documentation. A likely explanation for their rarity in prospective approaches, even if contrary to evidence from narrative research (Gergen and Gergen, 2008), could be that narratives are not necessarily experienced as being as useful for prescriptive purposes as they are for documenting processes.

A third category of paradata outputs consists of information that can be described as snapshots. This includes photography (e.g. Dorrell, 1994; O’Connor and Goodwin, 2017; Reilly et al., 2021; Sant, 2017) but also a plethora of discrete observations, inscriptions and traces (e.g. Ma and Li, 2022), metainformation (e.g. Gehani et al., 2021; Malik et al., 2010), and notes, typically captured in situ, that are not organized enough to qualify as structured metadata or to form a clear narrative.

A further group of approaches generates paradata in the form of diagrammatic representations, that is, simplified explanatory visuals. Such approaches, often applied either prospectively or retrospectively, include the generation of visual workflow diagrams (e.g. Acuña et al., 2012; Oberbichler et al., 2022; Post and Chassanoff, 2021) and knowledge graphs (e.g. Fabre et al., 2022). Rather than being the ultimate product, diagrammatic representations might complement structured paradata or function literally as visualizations.

Finally, a category of artifacts or outputs identified in the review is best described as standard procedures. Rather than generating diagrams or descriptions as their ultimate output, prospective methods in particular tend, at least implicitly, to aim at establishing a fixed modus operandi. Much of the workflow literature falls under this category, including both prospective and documentary scientific and scholarly workflows (e.g. Ince et al., 2022; Oberbichler et al., 2022).

In terms of their functioning as boundary objects, the different categories of paradata artifacts express diverse degrees of plasticity and robustness (cf. Star, 1989) enacted through diverse mechanisms. For structured metadata, diagrammatic representations and standard procedures, plasticity unfolds through the standardization of their form (cf. Star and Griesemer, 1989) whereas for narratives and snapshots, it stems from their open-endedness. All identified paradata artifacts are also weakly structured (cf. Star, 1989) in some sense. Open-ended narratives and snapshots often lack standardized form whereas structured metadata, diagrammatic representations and standard procedures are, while having rigid internal structure, weakly linked to the processes they document.

Methods

Besides the five categories of paradata artifacts, we identified in the reviewed literature three categories of methods based on their different time horizons (cf. Star, 1989), that is, on when paradata generation or capture takes place in relation to the activity it describes:

Prospective methods

In situ methods

Retrospective methods

The methods categorized as prospective are directed toward the future, that is, they refer to ways of working or to the creation of templates for data generation before the actual data creation takes place. In situ methods, on the other hand, create paradata on ongoing processes at the time of data generation. Finally, retrospective methods generate paradata on activities that have already taken place.
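The temporal distinction can be stated compactly. The Python sketch below is a simplified operationalization that assumes, hypothetically, that each documented activity has a known start and end time and that each paradata record has a creation time; real methods rarely come with such clean timestamps, and the `classify` function and the example dates are illustrative only.

```python
from datetime import datetime
from enum import Enum


class TemporalCategory(Enum):
    PROSPECTIVE = "prospective"      # record made before the activity starts
    IN_SITU = "in situ"              # record made while the activity is ongoing
    RETROSPECTIVE = "retrospective"  # record made after the activity has ended


def classify(documented_at, activity_start, activity_end):
    """Place a paradata record in one of the three temporal categories (hypothetical helper)."""
    if documented_at < activity_start:
        return TemporalCategory.PROSPECTIVE
    if documented_at <= activity_end:
        return TemporalCategory.IN_SITU
    return TemporalCategory.RETROSPECTIVE


# Hypothetical example: an excavation campaign and three records about it.
start, end = datetime(2023, 6, 1), datetime(2023, 6, 30)
print(classify(datetime(2023, 3, 15), start, end).value)  # a data management plan
print(classify(datetime(2023, 6, 12), start, end).value)  # a field diary entry
print(classify(datetime(2024, 1, 20), start, end).value)  # a later trace analysis
```

A method whose records fall on different sides of these boundaries would receive more than one category, which mirrors the multiple categorization of some methods described in the Material and method section.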

In the following, each of these three categories is discussed in detail, with examples of methods from each category.

Prospective methods

Prospective methods document and envision future practices. In this category, the methods engage in the boundary work of demarcating paradata generation generally as the responsibility of standard-setting bodies and those responsible for the data collection (see e.g. Beretta, 2024; Nosek and Lakens, 2014). Correspondingly, paradata as a boundary object is authored ex ante to convey a predefined framing of processes and practices. A number of different types of methods can be categorized as prospective. In this study, three major clusters (Table 2) were identified based on how the methods conceptually approach prospective practice: workflow-based approaches, plans, and framework-based approaches.

In the literature, diagrammatic, systems-based and workflow-based approaches make up a predominant category of prospective methods used to conceptualize and represent process information. Workflows typically aim to provide precise descriptions of procedures and are generally conceived as a series of stepwise tasks leading to the accomplishment of a specific undertaking (Goble et al., 2020). As boundary objects, they embody standards of process modeling that represent activities as particular types of formal chains of actions. Workflows can be automated and composed of computational or technical tasks, or they can be semi-automated or manual and consist of human tasks (Polančič, 2020). A computational workflow is a workflow comprising computational tasks (Goble et al., 2020). The literature also distinguishes abstract workflows, workflow models and plans from an operational, executable workflow, which is “an instantiation of the workflow plan” that “contains the required recipes to launch the workflow” (Li et al., 2023: 204). Workflows are often represented visually to their users, for example, as process maps or flowcharts (e.g. Locatelli et al., 2010; Schwandt, 2022), and increasingly as algorithms, code, and pseudocode (e.g. Andresen, 2020; Videla, 2021).
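The distinction between an abstract workflow plan and its executable instantiation, and the way executing a workflow can leave in situ process records behind, can be illustrated with a minimal Python sketch. The step names, the processing functions and `run_workflow` are hypothetical and not modeled on any particular workflow management system.

```python
from datetime import datetime, timezone

# Abstract workflow plan: an ordered list of named steps, independent of any dataset.
PLAN = ["clean", "normalize", "summarize"]


def clean(values):
    """Drop missing observations."""
    return [v for v in values if v is not None]


def normalize(values):
    """Scale values relative to the largest one (assumes a positive maximum)."""
    peak = max(values)
    return [v / peak for v in values]


def summarize(values):
    """Return the mean, rounded to three decimals."""
    return round(sum(values) / len(values), 3)


# Hypothetical binding of each abstract step name to executable code.
STEP_FUNCTIONS = {"clean": clean, "normalize": normalize, "summarize": summarize}


def run_workflow(plan, functions, data):
    """Instantiate the plan on concrete data and record each executed step."""
    process_log = []
    for step in plan:
        data = functions[step](data)
        process_log.append({
            "step": step,
            "finished_at": datetime.now(timezone.utc).isoformat(),
            "output_preview": repr(data)[:40],
        })
    return data, process_log


result, process_log = run_workflow(PLAN, STEP_FUNCTIONS, [4, None, 2, 8])
print(result)
for entry in process_log:
    print(entry["step"], entry["output_preview"])
```

Here `PLAN` remains a reusable, prescriptive description, while `process_log` is a by-product of one particular run, echoing the dual prospective and documentary use of workflows noted earlier.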

In scientific research and data management, workflow thinking has been operationalized as scientific workflows, which are applied especially in data-intensive, large-scale scientific research (Ludäscher et al., 2006). A comparable concept for the humanities and social sciences is the scholarly workflow (Antonijević et al., 2020). In the latter context, workflow thinking has been applied especially in digital research to describe how to collect, find, organize, and store research data, work that involves a degree of bricolage in combining available tools and resources (cf. Antonijević et al., 2020). Workflow literature does, however, acknowledge the socio-technical nature of processes, with some approaches putting specific emphasis on including both physical components (e.g. physical reading material) and online resources (e.g. digital databases, citation managers) used in scholarly work (Ince et al., 2022). One context where process transparency has been found particularly pivotal is interdisciplinary research, where workflows have been proposed to support cross-disciplinary investigations of common research problems, for example, by generating and identifying new relevant keywords for digitized data (cf. Oberbichler et al., 2022). Workflow-based approaches have also been proposed for facilitating data curation, for example, of born-digital documents (cf. Post and Chassanoff, 2021).

Besides executable and abstract representations of actual and planned workflows, the literature describes diverse prospective procedural approaches that can be termed quasi-workflows. Scientific knowledge graphs that “create interaction of nodes that explore information spaces representing research results” (Fabre et al., 2022: 1) are not comprehensive representations of complete research workflows, as they do not allow comparisons of paths leading to different outcomes (Fabre et al., 2022), and are hence limited in providing an exhaustive representation of a process. Other examples classifiable as quasi-workflows are the step-by-step instructions, comprehensive to varying degrees, included in guideline documents and instruction manuals (e.g. Huvila and Sköld, 2023; Llebot and Van Tuyl, 2019).

With plans we refer to a category akin to workflows that prescribes planned practices, however, generally without the aim of providing a precise step-by-step walkthrough of a specific process (Goble et al., 2020). Rather than strict standards, plans draw from broader conventions of practice. Examples of artifacts categorizable as plans include registered reports, that is, preregistered and reviewed documents describing a planned data collection process (Nosek and Lakens, 2014), scenarios (e.g. Borglund and Öberg, 2018), research plans, and data management plans (cf. Kvale and Pharo, 2021), which typically describe planned research procedures in a broader sense. As Donnelly (2012) notes, a plan is not a guarantee of a desired outcome but an instrument that helps to anticipate and prepare for eventual risks and to communicate preparations, agreements, and envisioned actions, an observation that echoes Suchman’s (2007) findings on how situated action unfolds.

The category of framework-based approaches refers to prospective schemes that provide means to describe and steer processes. While workflows provide task-by-task representations, frameworks usually only lay out premises and a general outline for completing a given task. The boundary objects generated by such approaches tend to be even more malleable than plans, and the boundary work they engage in is far less formal than that of workflows. The concept of framework is itself vaguely defined (cf. Cox et al., 2016; Partelow, 2023). Although frameworks often provide the foundation for how to act in a situation, it is not always clear how they have been developed and, as Partelow (2023) points out, there is “often a ‘black box’ nature to frameworks” (p. 2). At the same time, however, frameworks can be useful in describing a set of assumptions, guidelines, and values on which methodological practices can be built (cf. Binder et al., 2013; Partelow, 2023). They are more flexible than workflows, less sensitive to changes, and applicable to a broader variety of contexts. Examples of prospective framework-based approaches include codebooks (Niu and Hedstrom, 2008) and the DATABOOK framework proposed by Nesvijevskaia (2021). Other approaches categorizable as framework-based include diverse conceptual frameworks and reference models usable for describing practices and processes, for instance, CIDOC-CRM combined with extensions such as, and CRMinf to document argumentation and scientific observation, measurements and processed data (Doerr and Theodoridou, 2011; Stead and Doerr, 2015). Further, structured information standards that stipulate what information on processes to include (e.g. Hackos, 2016), as well as process-related controlled vocabularies and label sets (Rodrigues and Teixeira Lopes, 2022), can be construed as frameworks in that they literally provide a substructure for describing processes.

A typical aim of all prospective approaches is to improve the efficiency, precision, and standardization of (work) processes, including the documentation of data collection and management procedures (Palmer et al., 2017; Ruijer, 2021; Yan et al., 2020), and, for example, to enable writing more precise computer code (Videla, 2021). Another common aim is to improve data quality and interoperability, for example, by prescribing parameters for data formats and by making metadata more usable in the future through articulating more relevant keywords (Oberbichler et al., 2022). From the perspective of process documentation, the major promise of prospective approaches lies in their potential to function as descriptions of processes precise enough to make them reproducible (Ludäscher et al., 2015), with the well-known caveat that, for varying reasons, people do not necessarily comply with predetermined procedures (Dekker, 2003).

In situ methods

In situ approaches document and generate paradata on activities and practices at the moment when they take place, that is, on ongoing processes. The boundary work associated with such approaches necessarily has a makeshift character, even if it is guided by a preunderstanding of what is happening and a projection of how the generated paradata could eventually be used. The boundary objects generated range from formal to highly informal. While many texts did not explicitly address the issue, an implicit assumption in many in situ methods appeared to be that data creators and processors are also responsible for the adequacy of primary documentation (e.g. Shoilee et al., 2023; exceptions e.g. in Sant, 2017), sometimes in collaboration with data specialists (e.g. Hrynick et al., 2023; Miceli et al., 2022a; Mosconi et al., 2023). We identified five categories of in situ methods in the literature: (1) Models of processes and practices, (2) Narratives, (3) Annotations and colophons, (4) Recordings, and (5) Participation (Table 2).

Models of processes and practices refer to a category of in situ methods that generate a model or representation of a process. Formal in situ modeling protocols aim to provide a formal representation of a process or practice. For example, Zabulis et al. (2022) propose a protocol for documenting crafts practices. There are also many examples of approaches that rely on diagrammatic modeling, producing graphs and maps (e.g. Lee et al., 2018). In addition, digital twins (e.g. Blair, 2021) and structured metainformation models are approaches based on producing a model of a process. As with prospective workflows, the formality of models varies, but as boundary objects they are based on standards that are explicated and visible to different degrees.

Narratives are, in practice, perhaps the most popular category of in situ approaches to generating paradata on ongoing processes. Narrativising is an open-ended approach to the boundary work of framing a practice or process. A broad variety of types and styles of narratives have been proposed for process and data documentation, ranging from thick description (Hann, 2021) and extended methods sections in journal articles and monographs (e.g. Chapter 4 in Smith, 2020) to diverse forms of creative writing (Dourish and Gómez Cruz, 2018; Mosconi et al., 2023), storytelling (Buchanan, 2023; Dykes, 2019; Knaflic, 2020), and data comics (Alamalhodaei et al., 2020). These narratives can be expected to function as boundary objects to varying degrees, depending on how they resonate with relevant communities (cf. Bartel and Garud, 2009).

An obvious in situ method for paradata generation is creating recordings of processes. A classic research documentation method in both laboratory and field sciences is to record decisions and practices in notebooks and diaries using both text and illustrations (Canfield et al., 2011; Holmes, 1990; Mickel, 2015). While recordings sometimes resemble narratives in how they function as boundary objects, the premises of collecting-oriented recording as a form of boundary work differ from those of constructive narrativising. The focus and comprehensiveness of such recordings vary from capturing personal reflections to more comprehensive process documentation (Banfi, 2022; Canfield et al., 2011; Mickel, 2015). The introduction of digital notebooks and diaries has provided new opportunities to enrich notes and facilitate note-taking, although, as Sandoval (2021) notes, digitalization also risks depriving diaries of some of their earlier affordances. Electronic diaries and applications like Jupyter notebooks have gained popularity, especially in digital research (VandenBosch et al., 2023; Wofford et al., 2020).

Annotations and colophons refer to in situ documentation through adding explanatory notes and statements about data creation, processing, and management. Even if annotating is as political and interventionist as any information practice (cf. Kalir and Garcia, 2021), as boundary work it is additive rather than generative of new objects that act as boundary objects. Light and Hyry (2002) describe the use of both annotations and colophons in documenting archivists’ decisions and work on archival collections. Annotations are also a typical method proposed for adding paradata to heritage visualizations (e.g. Niccolucci, 2012; Turner, 2012). Examples of comprehensive annotative approaches that border on recordings include Poirier’s (2020) ethnography of datasets and digital scholarly editions (e.g. Van Mierlo, 2022).

Another venerable recording method is photography, which has been used for scientific purposes for close to two centuries (McFadyen and Hicks, 2020; Mitman and Wilder, 2017). In terms of boundary work, photography comes close to recording, and photographs to recordings. In spite of being snapshots of processes, photographs can be valuable as documents of longer-term research work (Huvila et al., 2021; Locatelli et al., 2011). Today, research processes are also easy to record using video, audio, and 3D recording, and by capturing log data from digital tools and applications, for example, in archeological field documentation (e.g. Dell’Unto et al., 2017; Derudas, 2021; Powlesland, 2016; Zanini, 2012) and when recording performance art (Sant, 2017).

Finally, participation can also be conceptualized as a form of in situ generation and transmission of process knowledge, as part of the process, from knowledgeable practitioners to learners. In participation, the focus is on the boundary work of framing processes rather than on generating predetermined types of paradata. Approaches such as living labs (Ruijer, 2021) and participatory data work (Miceli et al., 2022a) aim to capture data “through the involvement of aware users in real-life settings” (Dell’Era and Landoni, 2014: 139) and to pass on process knowledge from person to person by providing a space for knowledge transfer, with or without codifying some of it. Miceli et al. (2022a) argue for employing a participatory approach to data work in order to make documentation more adaptable to the needs of different stakeholders. In their proposed approach, the focus is on documenting data production processes and the collection and labeling of data for machine learning in such a way that the documentation “is able to capture the evolving character of datasets and the intricacies of data work” (Miceli et al., 2022a: 24).

The examples of in situ practices incorporate different incentives and ways of performing documentation work. In some of them, documentation is the responsibility of data producers (e.g. Morreale, 2022; Rösch, 2021), in others, dedicated data curators (e.g. Van Mierlo, 2022), whereas sometimes, it is framed as a participatory undertaking of multiple stakeholders (cf. Miceli et al., 2022a). Paradata generation also has different foci, including encounters between data and people, the data creators, or specific work settings. Cline (2022), for example, discusses documentation of archival encounters with the focus on understanding interpreters’ impact on interpretations. A similar approach could be applied when explicating the impact of data producers on the data they produce. Hughes et al. (1998) emphasize the importance of documenting workers, that is, the data creators, and work settings, that is, the context in which the creation takes place.

The outcomes of in situ documentation in terms of paradata and boundary objects take many different forms. Some are simple and others complex, and they focus on describing individual tasks (e.g. Vaz et al., 2019), decisions (e.g. Alexeeva et al., 2016), tools (e.g. Hsieh et al., 2023), actors (e.g. Cline, 2022; Morreale, 2022), or complete practices through narratives or representations (e.g. Canfield et al., 2011; Mickel, 2015). An extreme form of documentation, one that functionally goes beyond mere documenting, is the digital twin (Blair, 2021). As digital representations, or ideally copies, of a complete object or process, digital twins are nominally expected to act as facsimiles of specific things, phenomena, processes, or practices rather than to describe them (Blair, 2021). Ideally, a digital twin is a boundary object of itself, where the concept and the concrete object come together.

Retrospective methods

Besides being produced ex ante or in situ, paradata can also be manifested in residues and outcomes of data-related practices and processes and be made available post hoc. The final category, covering methods and approaches for eliciting paradata after the fact, can be termed retrospective methods. In retrospective paradata generation, the prerogative of paradata generation tends to lie with data (re)users and occasionally with data specialists. The methods categorized into this group involve procedures of analyzing past processes to either recreate or trace information, typically from secondary resources. Among retrospective methods, we identify three subcategories: (1) Backtracking activities, (2) Trace analysis of data for processes, and (3) Analysis of paratexts (Table 3).

The first category consists of methods for backtracking chains of activities present in the data rather than in dedicated data documentation. This can be done in computerized environments, for example, by mining code and computational outputs, but also by tracing back non-computational activities in non-digital contexts. Rösch (2021) has applied the notion of chaîne opératoire (Delage, 2017; Leroi-Gourhan, 1964), adopted from ethnology and prehistoric archeology, to the backward tracking of archeological knowledge production. The conventional aim of tracking the archeological chaîne opératoire is the documentation of material traces of past human presence. However, Rösch (2021) describes a version of the method designed to make process data accessible and easier to trace by combining chaîne opératoire with concepts from actor-network theory and geographic information systems. Migliorini et al. (2022) propose another approach to backtracking processes, based on the use of derivation rules to identify and describe data provenance in archeological field data, and Arshia et al. (2021) propose a computational, rule-based backward-chaining method that combines keyword extraction and similarity measurements of code segments for backtracking software development projects.

Sometimes, if structured paradata or comparable information are available, for example, in the form of provenance information documented according to a specific data documentation standard, it is possible to conduct detailed forensic backtracking using computational tools. One example of such a tool is the open-source SPADE software, designed for inferring, storing, and querying structured data provenance information (cf. Gehani et al., 2021). Often, however, structured data that could function as paradata are not available.
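What computational backtracking over structured provenance involves can be sketched in a few lines of Python. The sketch assumes a deliberately simplified, hypothetical record format in which each item lists the items it was derived from, loosely modeled on a “was derived from” relation; it does not use SPADE’s interface or any specific documentation standard, and the file names are invented for illustration. A breadth-first traversal then lists every upstream source of a given output, nearest first.

```python
from collections import deque
from typing import Dict, List

# Hypothetical provenance records: each derived item lists the items it was
# derived from (a simplified stand-in for standardized provenance data).
DERIVED_FROM: Dict[str, List[str]] = {
    "figure_3.png": ["cleaned.csv", "plot_script.py"],
    "cleaned.csv": ["raw_survey.csv", "cleaning_rules.txt"],
    "raw_survey.csv": [],
    "cleaning_rules.txt": [],
    "plot_script.py": [],
}


def backtrack(item: str, derived_from: Dict[str, List[str]]) -> List[str]:
    """Return every upstream source of `item`, nearest first (breadth-first).

    Items missing from the records are treated as having no known sources,
    and the `seen` set prevents endless loops if the records contain cycles.
    """
    seen = {item}
    queue = deque([item])
    lineage: List[str] = []
    while queue:
        current = queue.popleft()
        for source in derived_from.get(current, []):
            if source not in seen:
                seen.add(source)
                lineage.append(source)
                queue.append(source)
    return lineage


print(backtrack("figure_3.png", DERIVED_FROM))
# ['cleaned.csv', 'plot_script.py', 'raw_survey.csv', 'cleaning_rules.txt']
```

The traversal can only recover links that were recorded; where a link is absent, it simply stops, which is the situation in which the less structured retrospective methods discussed in this section become necessary.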

The second category of retrospective approaches, trace analysis, refers to a broad set of methods focused on deriving process information from research outputs. Diplomatics, that is, the study of the formation process (in a broad sense) of individual archival records through the analysis of the form of documents to understand their function, is an illustrative example of a comprehensive approach to what can be described as trace analysis (Duranti, 2009). Digital diplomatics has broadened the scope of diplomatics to digital records. Foscarini (2012) has proposed genre analysis to complement diplomatics as a means to delve deeper into the intellectual formation of documents, while Duranti (2009) sees opportunities to complement diplomatic analysis with digital forensics. Many forms of trace analysis focus on particular types of traces. Ma and Li (2022) inquire into traces of research production embodied in non-verbal material artifacts and media to understand trends in scholarly work. Bates (1996) and, for example, Huvila et al. (2022) have suggested following quotations and citations to trace back scholarly practices. The diffractive approach of Callery et al. (2021) and Dawson and Reilly (2019) is based on using an assemblage of recording and presentation techniques unconventionally and recursively to document and represent embodied paradata. Just as a great variety of cues can function as traces, a comparably broad diversity of techniques can be used to retrieve and analyze traces, ranging from computational methods like data mining (e.g. Richards et al., 2015) to qualitative close reading of documentation (Huvila et al., 2023). The difficulty of managing and pooling traces has led to the development of techniques (e.g. Gehani et al., 2021) to consolidate provenance information.

In contrast to analyzing data themselves, the third category consists of diverse methods focusing on the analysis of paratexts that shed light on data-related processes. Just as Hodges (2021) has used trace ethnography (Geiger and Ribes, 2011), which capitalizes on traces and documents left behind in digital systems, to understand the work of biomedical repair technicians, Thomer and Wickett (2020) have used the approach to study databases in order to understand scientific data practices. Perhaps the most obvious example of analyzing paratexts is, however, marginalia work (Spedding and Tankard, 2021). In literary contexts, marginalia originally referred to margin notes, highlights, underlining, and dog-ears; with datasets, marginalia work extends to an equally rich variety of notes and markings in digital data and beyond, for example, to material traces and artifacts relating to the generation of data and other digital objects (cf. McDonald et al., 2021).

In contrast to prospective methods, which typically aim to stipulate future processes and process documentation, and in situ methods, which serve documentary purposes, the objectives of retrospective paradata elicitation are often geared toward improving the usability of data when data-related processes are deemed insufficiently documented for a required purpose (e.g. Berman, 2015). Boundaries are defined post hoc, and boundary objects are identified among, or constructed from, the residues of processes. Retrospective approaches can also be used for improving information retrieval (Ma and Li, 2022). Other reasons for eliciting process information retrospectively are quality assessment and the establishment of liability. Earlier documentation is sometimes understood to be flawed, or there are reasons to believe that a better understanding of the process can help to discern and manage bias (e.g. Ahuja et al., 2021; Börjesson et al., 2022b). To determine the source and trustworthiness of both human-generated and machine-generated datasets, Vasudevan et al. (2016) propose a data-driven method for reconstructing provenance in cases where none has been recorded. The suggested approach is a multi-funneling method that integrates techniques including topic modeling, genetic modeling, statistical re-clustering, and file clustering to determine the lineage of data.

Discussion

We have categorized methods and approaches applicable for generating and identifying paradata, that is, information relating to scholarly data creation, processing, and (re)use (summary in Figure 1). When shifting the focus from the information-on-information objects to the processes through which the objects are generated, investigated, and/or documented, the concept of paradata emerged as a helpful shortcut to approach a diverse assortment of approaches with a common denominator in their orientation toward process information.

Figure 1.

Categories of paradata identification and generation methods and paradata artifact types.

Our categorization of paradata artifact types and methods proposed for generating and identifying paradata has obvious limitations and should be considered only a first exploratory step toward a comprehensive taxonomy. Because we reviewed a broad range of methods and approaches we deemed applicable for documenting data making and processing, the selection is somewhat eclectic and consists of both data-specific techniques and techniques that are not data-specific but potentially useful if applied to data documentation. The list of methods is also obviously incomplete, and as such, for example, the popularity of particular artifact types or approaches cannot be quantified on the basis of the present findings. The same applies to individual paradata artifacts. However, we argue that it is complete enough for the purpose of this study, which is to identify categories rather than to produce a systematic classification of individual methods or artifacts. Another equally apparent limitation is that categorizing artifacts and methods with a specific but conceptually underdeveloped common denominator is not without problems. One particular difficulty of categorizing methods is that they serve different purposes, are spatiotemporally difficult to pin down, and generate multiple types of artifacts as outputs. We have focused on facets of the individual processes through which paradata are generated or identified rather than on the specifics of the product generated in the process. Also, this study has not paid specific attention to the primary purpose or context of the reviewed methods but rather to their potential tenability as approaches to increase understanding of data creation, processing, and (re)use.

A major complication in our exercise was the evident overlap between categories, especially among methods. However, rather than a limitation, this is one of the findings. Methods do overlap both conceptually and in practice. For example, there are multiple methods that aim to document stepwise activities retrospectively and in situ, generating descriptive rather than prospective workflows. Both Yan et al. (2020) and Deelman et al. (2018), for instance, refer to a workflow as a unit of observation rather than as a template. Many methods (e.g. Duranti, 2009; Migliorini et al., 2022) are also iterative and extend documentation of past processes and practices to inform future work. Through such overlaps and contrasts, the methods and approaches fold into a complex methodological assemblage that crosses the temporal boundaries between being exercised prospectively, in situ, and retrospectively.

The temporality of methods, that is, when they are used, affects what is described (prospective, in situ, retrospective) and what can be expected of the paradata, that is, whether they entail stipulations of forthcoming practices, inscriptions and observations of ongoing actions, or recollections of the past. Concerning temporality and agency, it is also interesting to observe that the category of prospective methods appeared to consist primarily of literally prescriptive approaches rather than ex ante approaches in a broader sense. Rather than envisage, they tended to stipulate, direct, create, and construct future practices. Anticipatory or speculative non-prospective documentation of data creation, processes, and (re)use (cf. Huvila, 2023) seems rare, although it is potentially fruitful to consider as a less oppressive and obliging approach to imagining future practices and processes. In more general terms, following Mathieu (2023), it is also possible to sense how the different methods vary in how they relate to practices and processes. In situ methods often describe (convey an account of practices or processes), but also subscribe to (adhere to), proscribe (forbid access to), circumscribe (restrict access to), or ascribe (explain) them, whereas retrospective methods also transcribe or rearrange practices and processes.

While the temporal categories shed light on when paradata are generated, the form of paradata fixes how data creation, processing, and (re)use are conceptually affixed as a particular kind of process or practice. A standard procedure is a fundamentally different type of endeavor than what is encased in a narrative or a momentary snapshot. Here, the categories appear to cut across moments in time or goals of paradata making and rather to relate to ontological understandings of what paradata are and what their referents are. The same observation pertains to the fact that, with some exceptions, the identified types of paradata artifacts are seldom method-specific. Certain paradata artifacts are more typical of particular temporalities. Snapshots are produced in situ, whereas narratives are generated either in situ or retrospectively but seldom prospectively. Diagrammatic representations are associated especially with prospective and retrospective methods but are not entirely absent from in situ note-taking. Structured metadata can be produced at different temporal phases and distilled from observations and documentation produced using multiple methods.

The overlaps are also apparent when the present findings are compared to previous categorizations of paradata. The interview study of Börjesson et al. (2022a) identified four categories of paradata (i.e. types of data, not types of methods or artifacts): scope (coverage of data), provenance (origins), methods (contexts and methods of data generation), and knowledge organization and representation of paradata (how data are structured, represented, and communicated). Just as technically and temporally different methods can operate with similar types of paradata artifacts, diverse types of artifacts can contain, for example, scope, provenance, or methods information. The same applies to methods. Rather than operating with different types of paradata (as for Börjesson et al., 2022a) or paradata artifacts (this study), the identified methods differ in how they convey information on, for instance, scope, provenance, or knowledge organization and representation. Likewise, a specific paradata artifact can convey different types of information. Considering this, we argue that in order to understand paradata, methods, and (paradata) artifacts, it is necessary to consider each of them separately but in relation to each other.

A major contribution of this paper has been to advance the empirical understanding of the methodology for generating and identifying paradata. On a conceptual note, we found the concepts of boundary object and boundary work helpful in explaining the weak links between individual methods, paradata artifacts, moments in time, and goals of making paradata, and, in general, the methodological diversity and the apparently diverging meanings of specific methods for different groups of people. It is fair to assume that all the methods and the artifacts (data, products) they produce make sense in the context of their origin as means to elucidate processes and practices. But just as the product, that is, the data produced using the methods, can be interpreted differently by different stakeholders, the methods themselves are understood and engaged with in different ways, for example, temporally as prospective, contemporary, and retrospective, or otherwise.

When the generated paradata are conceptualized as boundary objects that function productively in multiple social worlds, even if understood somewhat differently (cf. Star and Griesemer, 1989), it is possible to understand their elasticity and their capability to achieve different goals in different communities. Earlier studies have theorized instances of both in situ data documentation (e.g. Migliorini et al., 2022) and prospective artifacts, for example, data management plans (Kvale and Pharo, 2021), as boundary objects. However, the potential limitation of focusing only on (boundary) objects is that, while it helps to explain when paradata work, it is less useful for elucidating their differences and frequent disconnects. We suggest that, at the same time as operating as boundary objects, the artifacts are engaged, through the methods used to generate them, in the boundary work of forming their distinct outsets and of reifying and stabilizing the processes they are used to prescribe or describe. Some of the artifacts linked to the analyzed approaches appear to operate as boundary objects through their structural stability (structured metadata, diagrammatic representations, standard procedures), whereas others (narratives, snapshots) rely on their interpretative malleability. The simultaneous capacity of methods and data documentation to be delimiting and elastic can obviously be an advantage, as it can contribute to their broader usability. The authoring of methods and artifacts as boundary objects (Huvila, 2019) contributes to the latter, whereas boundary work stabilizes them, creating a dialectic reminiscent of the routinization of practices and processes in how they simultaneously resist and enable malleability (Feldman et al., 2016).
Depending on when paradata are generated, the stabilization can happen before, during, or after a particular process is enacted, leaving room for malleability in formulating the process before that point in time and for interpretative adaptability in appropriating the generated paradata (boundary object) in use afterward. The dialectic of stability and pliability underlines the fragility of the extent to which, and the ways in which, documentation contributes to the transparency of data and, in the context of this study, of data-related processes. At the same time, the interplay of constructing boundary objects and the boundary work that takes place with and in relation to them can help to explain the (in)compatibilities and differences between approaches and outcomes.

When discussing paradata and methods as boundary objects and boundary work, we also recognize that the methods, and their aims and outcomes, can be understood very differently depending on who engages in the method and for what purpose the product is needed. We must ask whether methods that generate artifacts incorporating multiple temporal aspects are the most meaningful and efficient. In this respect, it can be helpful to reflect on the boundary work that a particular method drives before using it, in order to properly appreciate what the method does and to consider for whom it serves a purpose. In spite of the risk of misunderstandings, the power of methodological flexibility in generating paradata lies in the fact that it permits flexibility of interpretation also for those who use its outcomes.

Conclusions

Our ability to assess the applicability of the methods we use is severely limited by the lack of proper documentation of the research design and of the processes through which findings and research data have emerged. This study has therefore set out to further our understanding of potential approaches for identifying and capturing paradata, a concept that helped us to approach a diverse assortment of methods whose common denominator is their orientation toward process information.

We identified five major categories of paradata artifacts: (1) structured metadata, (2) narratives, (3) snapshots, (4) diagrammatic representations, and (5) standard procedures, and three overarching temporal categories of methods and approaches for generating and identifying paradata in the literature. The temporality of these categories is based on when paradata generation or capture takes place in relation to the activity it describes. First, prospective approaches stipulate, direct, create, and construct future practices. Second, in situ approaches document and generate paradata on activities and practices at the moment they take place, that is, while work processes are ongoing. Third, retrospective approaches involve analyzing past processes to either recreate or trace information, typically from secondary resources. Future research is needed to produce more nuanced knowledge of specific methods and their implications for the generated paradata, to inquire more closely into the links between specific types of paradata, paradata artifacts and methods, and, for example, into the temporalities of paradata generation.

We posit that knowledge of methods and their outputs contributes to a better theoretical and empirical understanding of both, and consequently to more nuanced knowledge of data practices, their outcomes, and their implications. In a very fundamental sense, it matters when, in relation to its referents, and in what form paradata are conveyed. On a practical note, a better understanding of the artifacts that incorporate paradata, and of the methods for generating and identifying them, helps researchers and, for example, data managers to select approaches and artifacts appropriate for their intended purposes of documenting data making and processing and identifying paradata, ultimately contributing to the (re)usability and intelligibility of datasets.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme grant agreement No 818210 as a part of the project CApturing Paradata for documenTing data creation and Use for the REsearch of the future (CAPTURE).

ORCID iDs

Amalia Juneström

Isto Huvila

Author biographies

Amalia Juneström holds a PhD in Information Science from Uppsala University. In her thesis, she explored journalists’ information practices in an evolving media landscape. Between 2022 and 2024, she worked for the CAPTURE research data project at Uppsala University. She is currently employed as a librarian at Umeå University Library.

Isto Huvila is a Professor in Information Studies at Uppsala University in Sweden. Huvila chaired the recently closed COST Action ARKWORK and directed the ERC funded research project CAPTURE. His primary areas of research include information and knowledge management, information work, knowledge organization, documentation and social and participatory information practices.

References

Acuña, Lacroix and Chomilier (2012) Refurbishing legacy biological workflows SPROUTS case study. In: 2012 IEEE eighth world congress on services, pp.41–49.

Ahuja, Rosser and Grover (2021) Is that a Duiker or Dik Dik next to the giraffe? Impacts of uncertainty on classification efficiency in citizen science. arXiv:2110.07750. arXiv. Available at: http://arxiv.org/abs/2110.07750 (accessed 7 March 2024).

Alamalhodaei, Alberda and Feigenbaum (2020) Humanizing data through ‘data comics’: An introduction to graphic medicine and graphic social science. In: Engebretsen and Kennedy (eds) Data Visualization in Society. Amsterdam: Amsterdam University Press, pp.347–366.

Alexeeva, Perez-Palacin and Mirandola (2016) Design decision documentation: A literature overview. In: Tekinerdogan, Zdun and Babar (eds) Software Architecture. Lecture Notes in Computer Science. Cham: Springer International Publishing, pp.84–101.

Andresen (2020) A discussion frame for explaining records that are based on algorithmic output. Records Management Journal 30(2): 129–141.

Antonijević, et al. (2020) Digital workflow in the humanities and social sciences: A data ethnography. In: Crowder, Fortun and Besara (eds) Anthropological Data in the Digital Age: New Possibilities - New Challenges. Cham: Springer International Publishing, pp.59–83.

Arshia, Rasekh, Moosavi, et al. (2021) Traceability mining between unit test and source code based on textual analysis applied to software systems. Digital Scholarship in the Humanities 36(2): 268–285.

Banfi (2022) Ellie’s journal: Para-narratives in The Last of Us Part II. Game Studies 22(3).

Bartel and Garud (2009) The role of narratives in sustaining organizational innovation. Organization Science 20(1): 107–117.

Bates (1996) The Getty end-user online searching project in the humanities: Report no. 6: Overview and conclusions. College & Research Libraries 57(6): 514–523.

Beacham (2011) Concerning the paradox of paradata. Or, “I don’t want realism; I want magic!” Virtual Archaeology Review 2(4): 49.

Bentkowska-Kafel, Denard and Baker (eds) (2012) Paradata and Transparency in Virtual Heritage. Farnham: Ashgate.

Beretta (2024) Semantic data for humanities and Social Sciences (SDHSS): An ecosystem of CIDOC CRM extensions for research data production and reuse. In: Riechert, Beyer, Blanke and Marx (eds) Professorale Karrieremuster Reloaded. Entwicklung einer wissenschaftlichen Methode zur Forschung auf online verfügbaren und verteilten Forschungsdatenbanken der Universitätsgeschichte [Professorial Career Patterns Reloaded: Development of a Scientific Research Method for using Research Databases of University History Available and Searchable Online]. HTWK Leipzig/OA-HVerlag.

Berggren, Dell’Unto, Forte, et al. (2015) Revisiting reflexive archaeology at Çatalhöyük: Integrating digital and 3D technologies at the trowel’s edge. Antiquity 89(344): 433–448.

Berman (2015) Repurposing Legacy Data. Amsterdam: Elsevier.

Binder, Hinkel, Bots PWG, et al. (2013) Comparison of frameworks for analyzing social-ecological systems. Ecology and Society 18(4).

Blair (2021) Digital twins of the natural environment. Patterns 2(10): 1–3.

Borglund EAM and Öberg (2018) Using scenario planning and personas as an aid to reducing uncertainty about future users. In: Gracy (ed.) Emerging Trends in Archival Science. Lanham, MD: Rowman & Littlefield, pp.111–137.

Börjesson, Huvila and Sköld (2022a) Information needs on research data creation. Information Research 27(Special Issue): isic2208.

Börjesson, Sköld, Friberg, et al. (2022b) Re-purposing excavation database content as paradata: An explorative analysis of paradata identification challenges and opportunities. KULA: Knowledge Creation, Dissemination, and Preservation Studies 6(3): 1–18.

Börjesson, Sköld and Huvila (2020) The politics of paradata in documentation standards and recommendations for digital archaeological visualisations. Digital Culture & Society 6(2): 191–220.

Bowker and Star (1999) Sorting Things Out: Classification and Its Consequences. Inside Technology. Cambridge, MA: MIT Press.

Brill (2000) Video-recording as part of the critical archaeological process. In: Hodder (ed.) Towards Reflexive Method in Archaeology: The Example at Çatalhöyük. Cambridge: McDonald Institute for Archaeological Research, pp.229–234.

Buchanan (2016) A Provenance Research Study of Archaeological Curation. Austin, TX: University of Texas at Austin.

Buchanan (2023) Envisioning networked provenance data storytelling with American cuneiform collections. International Journal on Digital Libraries 24(3): 149–158.

Callery, Dawson, Reilly, et al. (2021) Temporal ripples in art/archaeology images. In: Dawson, Jones and Minkin (eds) Diffracting Digital Images. London: Routledge, pp.97–119.

Cameron, Franks and Hamidzadeh (2023) Positioning paradata: A conceptual frame for AI processual documentation in archives and recordkeeping contexts. Journal on Computing and Cultural Heritage 16: 1–19.

Canfield, Reveal, Heinrich, et al. (2011) Field Notes on Science and Nature. Cambridge, MA: Harvard University Press.

Carboni, Bruseker, Guillem, et al. (2016) Data provenance in photogrammetry through documentation protocols. ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences III–5: 57–64.

Champion and Rahaman (2019) 3D digital heritage models as sustainable scholarly resources. Sustainability 11(8): 2425.

Chao, Cragin and Palmer (2015) Data practices and curation vocabulary (DPCVocab): An empirically derived framework of scientific data practices and curatorial processes. Journal of the Association for Information Science and Technology 66(3): 616–633.

Chinosi and Trombetta (2012) BPMN: An introduction to the standard. Computer Standards & Interfaces 34(1): 124–134.

Choumert-Nkolo, Cust and Taylor (2019) Using paradata to collect better survey data: Evidence from a household survey in Tanzania. Review of Development Economics 23(2): 598–618.

Cline (2022) The archivist as translator: Representation and the language of context. American Archivist 85(1): 126–145.

Couper (2000) Usability evaluation of computer-assisted survey instruments. Social Science Computer Review 18(4): 384–396.

Cox, Villamayor-Tomas, Epstein, et al. (2016) Synthesizing theories of natural resource management and governance. Global Environmental Change 39: 45–56.

Dahlström and Hansson (2019) Documentary provenance and digitized collections: Concepts and problems. Proceedings from the Document Academy 6(1).

Davet, Hamidzadeh and Franks (2023) Archivist in the machine: Paradata for AI-based automation in the archives. Archival Science 23(2): 275–295.

Dawson and Reilly (2019) Messy assemblages, residuality and recursion within a phygital nexus. Epoiesen. DOI: 10.22215/epoiesen/2019.4.

Deelman, Peterka, Altintas, et al. (2018) The future of scientific workflows. International Journal of High Performance Computing Applications 32(1): 159–175.

Dekker (2003) Failure to adapt or adaptations that fail: Contrasting models on procedures and safety. Applied Ergonomics 34(3): 233–238.

Delage (2017) Once upon a time. . .the (hi)story of the concept of the chaîne opératoire in French prehistory. World Archaeology 49(2): 158–173.

Dell’Era and Landoni (2014) Living Lab: A methodology between user-centred design and participatory design: Living lab. Creativity and Innovation Management 23(2): 137–154.

Dell’Unto, Landeschi, Apel, et al. (2017) 4D recording at the trowel’s edge: Using three-dimensional simulation platforms to support field interpretation. Journal of Archaeological Science Reports 12: 632–645.

Demetrescu, Fanini and Cocca (2023) An online dissemination workflow for the scientific process in CH through semantic 3D: EMtools and EMviq open source tools. Heritage 6(2): 1264–1276.

Denard (2012) A new introduction to the London Charter. In: Bentkowska-Kafel, Denard and Baker (eds) Paradata and Transparency in Virtual Heritage. Farnham: Ashgate Publishing, pp.57–71.

Denard (2014) Ecologies of research and performance: Preservation challenges in the London Charter. In: Delve and Anderson (eds) Preserving Complex Digital Objects. London: Facet Publishing, pp.169–184.

Derudas (2021) Archaeological publication systems: Which route to take? A compass for addressing future development. In: The 26th international conference on 3D web technology, Pisa, Italy, pp.1–6. ACM. DOI: 10.1145/3485444.3487648

Doerr, Ore and Stead (2007) The CIDOC conceptual reference model - a new standard for knowledge sharing. In: 26th International conference on conceptual modeling (ER 2007), Auckland, New Zealand (eds Grundy, Hartmann, Laender AHF, et al.), Sydney, pp.51–56. Australian Computer Society.

Doerr and Theodoridou (2011) CRMdig: A generic digital provenance model for scientific observation. In: TAPP11: 3rd USENIX workshop on the Theory and Practice of Provenance.

Donnelly (2012) Data management plans and planning. In: Pryor (ed.) Managing Research Data. London: Facet, pp.83–104.

Dorrell (1994) Photography in Archaeology and Conservation. Cambridge: Cambridge University Press.

Dourish and Gómez Cruz (2018) Datafication and data fiction: Narrating data and narrating with data. Big Data & Society 5(2): 1–10.

Duranti (1998) Diplomatics: New Uses for an Old Science. Lanham, MD: Scarecrow Press.

Duranti (2009) From digital diplomatics to digital records forensics. Archivaria 68(Fall): 39–66.

Dykes (2019) Effective Data Storytelling: How to Drive Change With Data, Narrative, and Visuals. Hoboken, NJ: Wiley.

Fabre, Azeroual, Bellot, et al. (2022) GRAPHYP: A Scientific Knowledge Graph with Manifold Subnetworks of Communities. Detection of Scholarly Disputes in Adversarial Information Routes. arXiv:2205.01331 [cs]. Epub ahead of print May 2022.

Fafalios, Marketakis, Axaridou, et al. (2023) A workflow model for holistic data management and semantic interoperability in quantitative archival research. Digital Scholarship in the Humanities 38(3): 1049–1066.

Faniel, Frank and Yakel (2019) Context from the data reuser’s point of view. Journal of Documentation 75(6): 1274–1297.

Feldman, Pentland, D’Adderio, et al. (2016) Beyond routines as things: Introduction to the special issue on routine dynamics. Organization Science 27(3): 505–513.

Foscarini (2012) Diplomatics and genre theory as complementary approaches. Archival Science 12(4): 389–409.

Fredrikzon (2021) Kretslopp av data: Miljö, befolkning, förvaltning och den tidiga digitaliseringens kulturtekniker [Cycles of Data: Environment, Population, Administration, and the Cultural Techniques of Early Digitalization]. Lund: Mediehistoriskt arkiv.

Gant and Reilly (2017) Different expressions of the same mode: A recent dialogue between archaeological and contemporary drawing practices. Journal of Visual Art Practice 17(1): 100–120.

Gardin J-C (1999) Archéologie, formalisation et sciences sociales [Archaeology, formalisation, and social sciences]. Sociologie et sociétés 31(1): 119–127.

Gehani, Ahmad, Irshad, et al. (2021) Digging into big provenance (with SPADE). Communications of the ACM 64(12): 48–56.

Geiger and Ribes (2011) Trace ethnography: Following coordination through documentary practices. In: 2011 44th Hawaii international conference on system sciences (HICSS), pp.1–10.

Gergen (2008) Narratives in action. In: Bamberg (ed.) Narrative – State of the Art. Amsterdam: John Benjamins Publishing Company, pp.133–143.

Gieryn (1983) Boundary-work and the demarcation of science from Non-Science: Strains and interests in professional ideologies of scientists. American Sociological Review 48(6): 781–795.

Goble, Cohen-Boulakia, Soiland-Reyes, et al. (2020) FAIR computational workflows. Data Intelligence 2(1–2): 108–121.

Gregory, Groth, Cousijn, et al. (2019) Searching data: A review of observational data retrieval practices in selected disciplines. Journal of the Association for Information Science and Technology 70(5): 419–432.

Gruca, Cámara-Leret, Macía, et al. (2014) New categories for traditional medicine in the Economic Botany Data Collection Standard. Journal of Ethnopharmacology 155(2): 1388–1392.

Hackos (2016) International Standards for Information Development and content management. IEEE Transactions on Professional Communication 59(1): 24–36.

Hann (2021) Modelling Kiesler’s endless theatre: Approaches to paradata for heritage visualization. Theatre & Performance Design 7(1–2): 96–115.

Hodges (2021) Forensically reconstructing biomedical maintenance labor: PDF metadata under the epistemic conditions of COVID-19. Journal of the Association for Information Science and Technology 72(11): 1400–1414.

Holmberg and Hjørungdal (2015) Archaeology and history as companion disciplines: Co-analysing Georg Sarauw’s work on the Mullerup excavation at the start of the 1900s. Lund Archaeological Review 21: 7–20.

Holmes (1990) Laboratory notebooks: Can the daily record illuminate the broader picture? Proceedings of the American Philosophical Society 134(4): 349–366.

Houf (2020) Boundary work and boundary objects: Synthesizing two concepts for moments of controversy. Journal of Technical Writing and Communication 51: 293–312.

Hrynick, Anderson, Moore, et al. (2023) Embedding Librarians in archaeological field schools. Advances in Archaeological Practice 11(4): 434–441.

Hsieh C-Y, Chen S-A, C-L, et al. (2023) Tool documentation enables zero-shot tool-usage with large language models. arXiv:2308.00675. arXiv. Available at: http://arxiv.org/abs/2308.00675 (accessed 6 March 2024).

Huggett (2020) Capturing the silences in digital archaeological knowledge. Information 11(5): 278.

Hughes, Rouncefield, Rodden, et al. (1998) How to ‘represent’ the workers: Understand the work of representations. Available at: https://www.lancaster.ac.uk/fass/resources/sociology-online-papers/papers/tolmie-et-al-how-to-represent-the-workers.pdf

Hughes, Constantopoulos and Dallas (2015) Digital methods in the humanities. In: Schreibman, Siemens and Unsworth (eds) A New Companion to Digital Humanities. Hoboken, NJ: John Wiley & Sons, Ltd, pp.150–170.

Huurdeman and Piccoli (2021) 3D reconstructions as research hubs: Geospatial interfaces for real-time data exploration of seventeenth-century Amsterdam domestic interiors. Open Archaeology 7(1): 314–336.

Huvila (2019) Authoring social reality with documents: From authorship of documents and documentary boundary objects to practical authorship. Journal of Documentation 75(1): 44–61.

Huvila (2021) Documenting archaeological work processes for enabling future reuse of data: The CAPTURE project. The European Archaeologist 69.

Huvila (2022) Improving the usefulness of research data with better paradata. Open Information Science 6(1): 28–48.

Huvila (2023) On infrastructural speculation. Current Swedish Archaeology 31: 39–42.

Huvila, Andersson and Sköld (2022) Citing methods literature: Citations to field manuals as paradata on archaeological fieldwork. Information Research 27(3).

Huvila and Sinnamon (2022) Sharing research design, methods and process information in and out of academia. Proceedings of the Association for Information Science and Technology 59(1): 132–144.

Huvila and Sköld (2023) A fieldwork manual as a regulatory device: Instructing, prescribing and describing documentation work. Journal of Information Science. Epub ahead of print 19 October 2023. DOI: 10.1177/01655515231203506

Huvila, Sköld, Andersson, et al. (2023) Knowing-in-practice, its traces and ingredients. In: Cozza and Gherardi (eds) The Posthumanist Epistemology of Practice Theory: Re-imagining Method in Organization Studies and Beyond. Cham: Palgrave MacMillan.

Huvila, Sköld and Börjesson (2021) Documenting information making in archaeological field reports. Journal of Documentation 77(5): 1107–1127.

Ince, Hoadley and Kirschner (2022) A qualitative study of social sciences faculty research workflows. Journal of Documentation 78(6): 1321–1337.

Jiang, Murphy, Vawdrey, et al. (2014) Characterization of a handoff documentation tool through usage log data. In: AMIA . . . Annual Symposium proceedings 2014, pp.749–756. American Medical Informatics Association.

Kalir and Garcia (2021) Annotation. Cambridge, MA: The MIT Press.

Khazraee (2019) Assembling narratives: Tensions in collaborative construction of knowledge. Journal of the Association for Information Science and Technology 70(4): 325–337.

Knaflic (2020) Storytelling With Data: Let’s Practice! Hoboken, NJ: Wiley.

Koesten, Kacprzak, Tennison, et al. (2019) Collaborative practices with structured data: Do tools support what users need? In: Proceedings of the 2019 CHI conference on human factors in computing systems, pp.1–14. New York, NY: ACM.

Kunz, et al. (2020) Paradata in survey research. In: Atkinson, Delamont and Cernet (eds) Sage Research Methods: Mixed Methods. Thousand Oaks, CA: Sage Publications Ltd.

Kvale and Pharo (2021) Understanding the data management plan as a boundary object through a multi-stakeholder perspective. International Journal of Digital Curation 15(1): 16.

Lake (2012) Open archaeology. World Archaeology 44(4): 471–478.

Lee, Ludäscher and Glavic (2018) PUG: A framework and practical implementation for why & why-not provenance (extended version). IIT DB Group Technical Report IIT/CS-DB-2018-02. Chicago, IL: Illinois Institute of Technology.

Leroi-Gourhan (1964) Le geste et la parole - technique et langage [Gesture and speech: Technique and language]. Paris: Albin Michel.

Song, et al. (2023) INSTANT: A runtime framework to orchestrate in-situ workflows. In: Cano, Dikaiakos and Papadopoulos (eds) Euro-Par 2023: Parallel Processing. Lecture Notes in Computer Science. Cham: Springer Nature, pp.199–213.

Light and Hyry (2002) Colophons and annotations: New directions for the finding aid. American Archivist 65(2): 216–230.

Llebot and Van Tuyl (2019) Peer Review of research data submissions to ScholarsArchive@OSU: How can we improve the curation of research datasets to enhance reusability? Journal of eScience Librarianship 8(2): e1166.

Locatelli, Simone and Ardesia (2010) Collocated social practices surrounding photo usage in archaeology. In: Proceedings of COOP 2010 (eds Lewkowicz, Hassanaly, Wulf, et al.), pp.163–182. London: Springer.

Locatelli, Simone and Ardesia (2011) Collocated social practices surrounding photo usage in archaeology. Computer Supported Cooperative Work (CSCW) 20(4): 305–340.

Lo Turco, Calvano and Giovannini (2019) Data modeling for museum collections. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W9: 433–440.

Ludäscher, Altintas, Berkley, et al. (2006) Scientific workflow management and the Kepler system. Concurrency and Computation: Practice and Experience 18(10): 1039–1065.

Ludäscher, Missier and Ram (2015) From data sharing to reproducible science via workflows and provenance. Epub ahead of print 2015.

Malik, Nistor and Gehani (2010) Tracking and sketching distributed data provenance. In: 2010 IEEE sixth international conference on e-Science, pp.190–197.

(2022) Digital humanities as a cross-disciplinary battleground: An examination of inscriptions in journal publications. Journal of the Association for Information Science and Technology 73(2): 172–187.

Maryl, Dallas, Edmond, et al. (2020) A case study protocol for meta-research into digital practices in the humanities. Digital Humanities Quarterly 14(3): 477.

Mathieu (2023) Deconstructing the notion of algorithmic control over datapublics. In: Hartley, Sørensen and Mathieu (eds) DataPublics: The Construction of Publics in Datafied Democracies. Bristol: Bristol University Press, pp.27–48.

Mccarthy, Sebo, Wilkinson, et al. (2020) Open workflows for polychromatic reconstruction of historical sculptural monuments in 3D. Journal on Computing and Cultural Heritage 13(3): 1–16.

McDonald, Schmalz, Monheim, et al. (2021) Describing, organizing, and maintaining video game development artifacts. Journal of the Association for Information Science and Technology 72(5): 540–553.

McFadyen and Hicks (eds) (2020) Archaeology and Photography. London: Routledge.

Meloni (2016) From boundary-work to boundary object: How biology left and re-entered the social sciences. The Sociological Review 64(1_suppl): 61–78.

Methods and Data Comparability Board (2002) National Environment Methods Index (NEMI). Washington, DC: USGS. Available at: https://www.nemi.gov/about/

Miceli, Yang, Alvarado Garcia, et al. (2022) Documenting data production processes: A participatory approach for data work. In: Proceedings of the ACM on human-computer interaction. New York, NY: Association for Computing Machinery.

Mickel (2015) Reasons for redundancy in reflexivity: The role of diaries in archaeological epistemology. Journal of Field Archaeology 40(3): 300–309.

Migliorini, Quintarelli and Belussi (2022) Tracking data provenance of archaeological temporal information in presence of uncertainty. Journal on Computing and Cultural Heritage 15(2): 1–32.

Miksa, Strodl and Rauber (2014) Process management plans. International Journal of Digital Curation 9(1): 83–97.

Missier (2016) The lifecycle of provenance metadata and its associated challenges and opportunities. In: Lemieux (ed.) Building Trust in Information. Cham: Springer International Publishing, pp.127–137.

Mitman and Wilder (2017) Documenting the World: Film, Photography, and the Scientific Record. Chicago, IL: University of Chicago Press.

Morreale (2022) History as antidote: The argument for documentation in digital history. History and Theory 61(4): 64–76.

Mosconi, de Carvalho AFP, Syed, et al. (2023) Fostering research data management in collaborative research contexts: Lessons learnt from an ‘embedded’ evaluation of ‘data story’. Computer Supported Cooperative Work (CSCW) 32: 911–949.

Mudge (2012) Transparency for empirical data. In: Bentkowska-Kafel, Denard and Baker (eds) Paradata and Transparency in Virtual Heritage. Farnham: Ashgate Publishing, pp.177–188.

Nesvijevskaia (2021) DATABOOK: A standardised framework for dynamic documentation of algorithm design during Data Science projects. IASSIST Quarterly 45(2).

Niccolucci (2012) Setting standards for 3D visualization of cultural heritage in Europe and beyond. In: Bentkowska-Kafel, Denard and Baker (eds) Paradata and Transparency in Virtual Heritage. Farnham: Ashgate Publishing, pp.23–36.

Niu and Hedstrom (2008) Documentation evaluation model for social science data. Proceedings of the American Society for Information Science and Technology 45(1): 11–11.

Nosek and Lakens (2014) Registered reports: A method to increase the credibility of published results. Social Psychology 45(3): 137–141.

Oberbichler, Boroş, Doucet, et al. (2022) Integrated interdisciplinary workflows for research on historical newspapers: Perspectives from humanities scholars, computer scientists, and librarians. Journal of the Association for Information Science and Technology 73(2): 225–239.

O’Connor and Goodwin (2017) The secondary analysis of fieldnotes, marginalia and paradata from past studies of young people. In: Edwards, Goodwin and O’Connor (eds) Working With Paradata, Marginalia and Fieldnotes. Cheltenham: Edward Elgar Publishing, pp.94–114.

Palmer, Thomer, Baker, et al. (2017) Site-based data curation based on hot spring geobiology. PLoS One 12(3): e0172090.

Partelow (2023) What is a framework? Understanding their purpose, value, development and use. Journal of Environmental Studies and Sciences 13(3): 510–519.

Poirier (2020) Ethnographies of datasets: Teaching critical data analysis through R notebooks. Journal of Interactive Technology and Pedagogy (18).

Polančič (2020) BPMN-L: A BPMN extension for modeling of process landscapes. Computers in Industry 121: 103276.

Pomerantz (2015) Metadata. Cambridge: MIT Press.

Post and Chassanoff (2021) Beyond the workflow: Archivists’ aspirations for digital curation practices. Archival Science 21(4): 413–432.

Powlesland (2016) 3Di – enhancing the record, extending the returns, 3D imaging from free range photography and its application during excavation. In: The three dimensions of archaeology: Proceedings of the XVII UISPP world congress (1–7 September 2014, Burgos, Spain) volume 7, sessions A4b and A12 (eds Kamermans, de Neef, Piccoli, et al.), pp.13–32. Oxford: Archaeopress.

Rainey, Macfarlane, Puussaar, et al. (2022) Exploring the role of paradata in digitally supported qualitative co-research. In: CHI Conference on Human Factors in Computing Systems, pp.1–16. ACM.

Reilly, Callery, Dawson, et al. (2021) Provenance illusions and elusive paradata: When archaeology and art/archaeological practice meets the phygital. Open Archaeology 7(1): 454–481.

Richards, Tudhope and Vlachidis (2015) Text mining in archaeology: Extracting information from archaeological reports. In: Barcelo and Bogdanovic (eds) Mathematics and Archaeology. Boca Raton, FL: CRC Press, pp.228–238.

Richards-Rissetto and Landau (2019) Digitally-mediated practices of geospatial archaeological data: Transformation, integration, & Interpretation. Journal of Computer Applications in Archaeology 2(1): 120–135.

Rodrigues and Teixeira Lopes (2022) Describing data in image format: Proposal of a metadata model and controlled vocabularies. Journal of Library Metadata 0(0): 213–221.

Rösch (2021) From drawing into digital: On the transformation of knowledge production in postexcavation processing. Open Archaeology 7(1): 1506–1528.

Ruijer (2021) Designing and implementing data collaboratives: A governance perspective. Government Information Quarterly 38(4): 101612.

Sandoval (2021) In pursuit of a reflexive recording. An epistemic analysis of excavation diaries from the Çatalhöyük research project. Norwegian Archaeological Review 53(2): 135–153.

Sant (ed.) (2017) Documenting Performance: The Context and Processes of Digital Curation and Archiving. London, New York: Bloomsbury Methuen Drama.

Savai, Hasan, Kamano, et al. (2022) Data cleaning process for mHealth log data to inform health worker performance. In: Mantas, Gallos, Zoulias, et al. (eds) Advances in Informatics, Management and Technology in Healthcare. Amsterdam: IOS Press, pp.75–78.

Scherr, DeSousa, Moore, et al. (2021) App use and usability of a barcode-based digital platform to augment COVID-19 contact tracing: Postpilot survey and paradata analysis. JMIR Public Health and Surveillance 7(3): e25859.

Schröder, Staehlke, Groth, et al. (2022) Structure-based knowledge acquisition from electronic lab notebooks for research data provenance documentation. Journal of Biomedical Semantics 13(1): 4.

Schwandt S (2022) Opening the Black Box of interpretation: Digital history practices as models of knowledge. History and Theory 61(4): 77–85.

Seedat, Imrie and Schaar MVD (2024) Navigating data-centric artificial intelligence with DC-check: Advances, challenges, and opportunities. IEEE Transactions on Artificial Intelligence 5(6): 2589–2603.

Shoilee SBA, de Boer and van Ossenbruggen (2023) Polyvocal knowledge modelling for ethnographic heritage object provenance. In: Acosta, Peroni, Vahdati, et al. (eds) Knowledge Graphs: Semantics, Machine Learning, and Languages. Amsterdam: IOS Press, pp.127–143. Available at: https://ebooks.iospress.nl/doi/10.3233/SSW230010

159. Sköld, Börjesson and Huvila (2022) Interrogating paradata. In: Information Research. Proceedings of the 11th international conference on conceptions of library and information science, Oslo Metropolitan University.

160. Smith (2021) Controlled vocabularies: Past, present and future of subject access. Cataloging & Classification Quarterly 59(2–3): 186–202.

161. Smith (2020) Emotional Heritage: Visitor Engagement at Museums and Heritage Sites. London: Routledge.

162. Spedding and Tankard (eds) (2021) Marginal Notes: Social Reading and the Literal Margins. Basingstoke: Palgrave Macmillan.

163. Star (1989) The structure of ill-structured solutions: Boundary objects and heterogeneous distributed problem solving. In: Gasser and Huhns (eds) Distributed Artificial Intelligence. Volume II. Research Notes in Artificial Intelligence. London: Pitman, pp.37–54.

164. Star and Griesemer (1989) Institutional Ecology, translations and boundary objects: Amateurs and professionals in Berkeley’s Museum of Vertebrate Zoology, 1907-39. Social Studies of Science 19(3): 387–420.

165. Stead and Doerr (2015) CRMinf: The Argumentation Model - an Extension of CIDOC-CRM to Support Argumentation. Version 0.7. Purley: Paveprime.

166. Suchman (2007) Human-Machine Reconfigurations: Plans and Situated Actions, 2nd edn. Cambridge: Cambridge University Press.

167. Thomer and Wickett (2020) Relational data paradigms: What do we learn by taking the materiality of databases seriously? Big Data & Society 7(1): 205395172093483.

168. Turner (2012) Lies, damned lies and visualizations: Will metadata and paradata be a solution or a curse? In: Bentkowska-Kafel, Denard and Baker (eds) Paradata and Transparency in Virtual Heritage. Farnham: Ashgate Publishing, pp.135–143.

169. VandenBosch, Maull and Mayernik (2023) Jupyter notebooks and institutional repositories: A landscape analysis of realities, opportunities and paths forward. The Code4Lib Journal (58).

170. Van Mierlo (2022) The scholarly edition as digital experience: Reading, editing, curating. Textual Cultures 15(1): 117–125.

171. Vasudevan, Pfeffer, Davis, et al. (2016) Improving data provenance reconstruction via a multi-level funneling approach. In: 2016 IEEE 12th international conference on e-Science (e-Science), pp.175–184.

172. Vaz, Steinmacher and Marczak (2019) An empirical study on task documentation in software crowdsourcing on TopCoder. In: 2019 ACM/IEEE 14th international conference on global software engineering (ICGSE), pp.48–57. Available at: https://ieeexplore.ieee.org/abstract/document/8807631 (accessed 6 March 2024).

173. Videla (2021) Meaning and context in computer programs: Sharing domain knowledge among programmers using the source code as the medium. ACM Queue 19(5): 60–68.

174. Wofford, Boscoe, Borgman, et al. (2020) Jupyter notebooks as discovery mechanisms for open science: Citation practices in the astronomy community. Computing in Science & Engineering 22(1): 5–15.

175. Yan, Huang, Lee, et al. (2020) Cross-disciplinary data practices in earth system science: Aligning services with reuse and reproducibility priorities. Proceedings of the Association for Information Science and Technology 57(1): e218.

176. Zabulis, Meghini, Dubois, et al. (2021) Digitisation of traditional craft processes. Journal on Computing and Cultural Heritage 15(3): 1–24.

177. Zabulis, Partarakis, Meghini, et al. (2022) A representation protocol for traditional crafts. Heritage 5(2): 716–741.

178. Zanini (2012) Pubblicare uno scavo all’epoca di Youtube: comunicazione archeologica, narratività e video. [Publishing an excavation in the YouTube era: archaeological communication, narrativity and video] Archeologia e Calcolatori 23: 7–30.

179. Zass, Johnston, Benkahla, et al. (2023) Developing clinical phenotype data collection standards for research in Africa. Global Health Epidemiology and Genomics 2023: 1–9.