Abstract
Documents have been increasingly recognised as important objects of investigation in Science and Technology Studies (STS); however, so far, much less attention has been given to the study of documents produced in Digital Humanities. The author proposes therefore to use the method of the ‘STS of documents’ and analyse Feasibility documents that aim to assess technical and design requirements based on research questions and to organise a project workflow. Drawing on the ethnography of King’s Digital Lab, the article investigates Feasibility documents produced by the lab within the Agile-based Software Development Lifecycle framework. The article aims to show that Feasibility documents (1) inform ethnographic work about lab workflow and management and in doing so, are able to capture the interconnectedness of work layers and practices; (2) enable an empirical analysis of digital research projects and the process of translation from research questions, to methods, to technical solutions; (3) are
Keywords
Introduction
One of the most interesting changes in the Digital Humanities, a young field that applies digital tools and methods to the humanities, in the past few years is the emergence of new voices of research software engineers (RSEs), research software analysts and data scientists, who have introduced different epistemological and methodological approaches. Their experimental collaborations with humanists have led to taking on more risky and large-scale scholarly projects as exemplified by ‘Shakespeare in the Royal Collection’ conducted by King’s College London, Birkbeck, the University of London, the King’s Digital Lab, the Royal Collection Trust and CogApp; and ‘Living with Machines’ by the British Library, the Alan Turing Institute and other academic partners. Cross-institutional projects involve a large team of people who are responsible for particular tasks and their delivery at the right time and within budget. The outcomes of digital projects extend beyond a single output to produce a set of technologically and methodologically advanced artefacts. For example, in the case of ‘Shakespeare in the Royal Collection’, the team has produced a publicly accessible database of all the Shakespeare-related holdings, 3D visualisations of key spaces at Windsor Castle where Shakespeare’s plays were performed, and an exhibition and timeline presenting the history of Shakespeare in the Royal Collection (SHARC, 2021).
Digital historical and cultural artefacts are no longer only objects of humanities research analysis and interpretation; they have become objects that move between different actors – universities, cultural institutions, creative industries, funding bodies – who represent different priorities, policies and opinions. Digital artefacts are therefore the objects of negotiations about their copyright clearance, task responsibilities, technical functionality, long-term sustainability and timely delivery. Humanists are not accustomed to seeing scholarly research projects in terms of multi-layered project management and risk assessment. However, large-scale digital projects require a new way of approaching and delivering them.
Discussions about the roles of RSEs and data scientists have become more visible in the UK-based Digital Humanities (DH) community since 2019 when the Society of Research Software Engineering was founded. In 2020 and 2021, three significant reports were released, explicitly articulating the challenges and need for the future development of digital research:
The first aspect highlighted by the reports is the need to strengthen collaborations between humanists, RSEs and data scientists as a prerequisite for creating meaningful, critically driven and risk-assessed digital research (Peirson et al., 2016). However, to improve these partnerships, there is a need for common terminology and methodological understanding so that ‘methodological decisions can be fully explained, motivated and connected with the original research questions and interpretation of the research results’ (McGillivray et al., 2020: 15). The main challenge of collaborative projects arises from the fact that they involve divisions of tasks, skills and expertise, which are often used as a reference point to differentiate between intellectual contributions and technical work.
The study of work practices is therefore crucial for a better understanding of the role of technical contributors whose work is often neglected, as Steven Shapin states in the widely known article
The second aspect addressed by the UK reports mentioned above is the sustainability issue that is often associated with maintaining infrastructure and systems, but which is rarely connected with research management and workflow. An explicitly articulated workflow is, however, an assurance of the practical implementation of digital research projects. As Bergel et al. (2020: 17) argue, ‘The sustainability of DH is directly linked to the impact of the activities within which digital resources and methods are involved. Within those activities, DH is variably recognised’. Therefore, there is a need for more awareness about the importance of project management practices that along with feasibility and risk assessment can ensure the delivery of critically robust and sustainable artefacts.
To this end, for instance, the work of King’s Digital Lab in engaging RSE practices adapts two software industry approaches, the Dynamic Systems Development Method (DSDM), an agile project delivery framework, and the Software Development Lifecycle (SDLC), to design and produce high-quality products. This is a new perspective, and is not without its critics, as will be discussed later in this article. It also has implications for how DH may be considered within the commercial world. It is enlightening to consider that digital work does not only produce digital artefacts (e.g., software, tools, digital collections, databases) but it also ‘delivers change’ (Smithies, 2011): cultural, methodological and technical.
DH artefacts – digital collections, archives, 3D models, mapping and visualisations – are co-constructed by humans, technical systems, operational methods and institutional bodies and constitute new forms of critical expression. They require us to understand and reveal the ways that humanists combine critical approaches in making digital objects, how they translate scholarly questions into technical solutions and how they collaborate with RSEs. This, in turn, demands empirical studies of the social unfolding of Digital Humanities practices.
The call for ethnographic and critical work within the Digital Humanities by the Software Sustainability Institute (Bergel et al., 2020: 7) is not the first attempt to draw attention to engagement in ethnographic research within DH. The calls for social studies by Christine L. Borgman (2009) and Alan Liu (2013) constituted an explicit articulation of the need to investigate how knowledge is produced by human beings in relation to, for example, instruments, technologies and institutions. Liu proposed the study of Digital Humanities centres as laboratories, similar to the ethnography of scientific laboratories developed in science and technology studies (Hess, 2001). A Digital Humanities lab is a site where humanities thinking is integrated with computational, engineering and organisational practices (Malazita et al., 2020; Pawlicka-Deger, 2020; Smithies and Ciula, 2020). It is therefore a technologically and socially dense organisation where an observer can uncover the practices and patterns behind task-oriented projects and team interactions, and a place of culture informed by local settings and policy. This article aims, therefore, to contribute to the development of ethnographic research in DH by providing a new methodological approach: the analysis of documents produced in DH work. So far, ethnographic studies in DH have been based on interviews and observations (Antonijević, 2015). Documents, as the article shows, can in turn provide new insight into social and management practices that cannot be captured by observations. This approach, however, requires ethnographers to gain access to a lab’s documents. This is not a straightforward process but an effort that demands obtaining ethical clearance and, above all, building trust between ethnographers and lab members.
These discussions are a starting point for my reflections about the notions of critical technical practice (Agre, 1997), critical computational thinking (Berry and Fagerjord, 2017) and critical production (Smithies, 2017) in the context of Digital Humanities work. The goal of this article is to provide an analytical approach to investigate how humanities inquiries are translated into methodological and technical practices, and how technological and operational methods contribute to humanities knowledge production. The analysis is based on the ethnography of King’s Digital Lab (KDL), a lab comprising RSEs who work on technical research solutions for digital research in the humanities and social sciences. Although the case study represents a lab located in the United Kingdom, the methodological and theoretical analysis can contribute to the development of DH research conducted in other research environments.
KDL’s project management is based on an Agile approach to delivering solid and sustainable digital products, seen as ‘moving targets’ resulting from the coordination of ‘many moving parts, differing opinions and priorities’ (KDL, 2020a). Part of KDL’s Software Development Lifecycle is to prepare ‘Feasibility’ documents, which inform the lab’s decision to progress with a project proposal. Feasibility documents aim to assess technical and design requirements based on research questions and to organise a project workflow. They provide an interesting insight into the components, layers and management of digital products. The prevalent perception of the technical work of RSEs within DH as a ‘mechanistic and uncritical procedural activity’ (Ciula et al., 2020) reflects long-standing social and cultural attitudes that demonstrate a lack of knowledge of technical work. This essay will challenge this perspective and present interrelated and multi-layered actions of commitments, requirements and expectations as recorded in the lab Feasibility documents.
Documents are artefacts that include references to the social processes through which they were produced and reproduced (Shankar et al., 2017: 59). They have been increasingly recognised as important objects of investigation in science and technology studies (STS), as Kalpana Shankar et al. state: ‘Documents are often, maybe always, at the centre of efforts to achieve coordination and control, so greater attention to the complexity of their role should result in better accounts of how this is accomplished’ (2017: 70). I propose therefore to use the method of the ‘STS of documents’ (Shankar et al., 2017) and analyse Feasibility documents produced by KDL in a manner similar to STS-based studies of scientific labs’ protocols (Crabu, 2014) and kits (Neresini and Viteritti, 2014). I suggest Feasibility documents (1) inform ethnographic work about lab workflow and management and in doing so, are able to capture the interconnectedness of work layers and practices; (2) enable an empirical analysis of digital research projects and the process of translation from research questions, to methods, to technical solutions and (3) are
An Agile approach in a research software engineering-based digital humanities lab
King’s Digital Lab (KDL) was founded in 2015 to provide software development and infrastructure to departments in the faculties of Arts and Humanities, and Social Science and Public Policy at King’s College London. At the time of writing, it has 13 permanent full-time staff: research software analysts, research software engineers, a research software systems manager, UX/UI designers, project managers and a director. It is a unique lab entirely focused on research software engineering (RSE) to ‘create digital tools to explore academic research in new ways’. KDL has been evolving at the intersections of two communities: the Society of Research Software Engineering and Digital Humanities. These two groups seem to be divergent but are connected by the digital dimension of their work.
The perception of software engineering in DH is that it provides technical support and infrastructure for sharing resources in a similar manner to institutional IT services. RSE is, however, more than a service – it is a research service. While it is concerned with the ‘production, use, and maintenance of large, complex software systems’ (Gold, 2009), it does so by addressing research questions. The manufacture of research-enabling software systems has become part of the research workflow where software and tools are tailored specifically to scholarly questions, and incorporated into the lifecycle of a research project, from conceptualisation, to deployment, to preservation.
RSE has been changing the perception of technical work as purely mechanical labour devoid of intellectual elements. It is an engineering activity led by critical, methodological and epistemological concerns. KDL is devoted to technical development but above all, it aims to ‘explore the epistemic and methodological implications of digital humanities development’ (Smithies et al., 2019). Software is seen as an ‘epistemology engine’ (Smithies, 2017: 154) that shapes the way humanists gather, produce and transfer knowledge. Research software engineers draw attention to the process of production – of methods, software and infrastructure – showing how the manufacture of these layers underpinning scholarly research can affect the way scholars conceptualise and conduct that research. As Gold says, ‘Similar approaches have been used to great effect in scientific discovery where the process of reaching a result can be as important as the result itself’ (2009).
KDL is committed to the Software Intensive Humanities where ‘research proceeds through the development and use of tools designed to enhance (or in some cases make possible) certain kinds of research’ (Smithies, 2017: 154). Over the last 6 years (2016–21), the lab has delivered and inherited from the Department of Digital Humanities more than 150 projects involving more than 200 partners from higher education, cultural institutions and creative industries. The digital works are highly diverse and range from digital archives and digital scholarly editions to data visualisations and analysis. Each work is built upon close collaboration with partners to ensure the delivery of high-quality digital outputs at the right time within budget. While the mixture of academic and business approaches and its application to the humanities domain is innovative and risky (e.g., the risk of adopting business perspectives on management, quality and success of products), the principles of collaboration and ‘being more than a digital service’ have a long tradition in Digital Humanities at King’s College London.
Harold Short, in one of his interviews, reflected on the development of the Centre for Computing in the Humanities (later the Department of Digital Humanities and closely affiliated to KDL) that was rapidly evolving at King’s College London in the early 2000s. The new centre began offering undergraduate programmes, launched a master’s programme, and coordinated large numbers of research projects.
The crucial thing from the very outset is that we were not a service, we didn’t have a service role in our projects, we only took on projects if it was as equal partners … someone wanted a database built, we’ll say ‘well, the Computing Services Department can do that for you’. If they want to talk to us about how to address research questions, then that’s where we started. (Short et al., 2012)
It is fascinating to read about the development of Digital Humanities by their pioneers who built the foundation for such a new field, and who have been tasked with articulating the epistemology of building, that is, knowing and thinking through making (Bradley, 2019; Nyhan and Flinn, 2016; Ramsay and Rockwell, 2012; Thaller, 2017).
KDL was therefore built upon a long tradition of King’s Digital Humanities and has evolved from the Centre for Computing in the Humanities (1991), the Centre for eResearch in the Humanities (2008) and the Department of Digital Humanities (2011–). In 2011, at the moment of institutional transformation, the Department explicitly articulated its goal for future years to ‘more clearly identify it[self] as an academic department, rather than a service centre’ as well as to address the issue of digital preservation and sustainability that had not yet been resolved (Maron, 2011). One of the main goals of KDL in its first year of activity was to archive around 100 legacy Digital Humanities projects, delivered over a twenty-year period. This immense task turned out to be a challenge on many fronts: orphaned projects were left without funding for maintenance, among other things, and the obsolete PHP-based frameworks posed an infrastructural security risk (Smithies et al., 2019). To ensure a sustainable and scalable approach to digital research projects, KDL has adapted the Software Development Lifecycle (SDLC), which is built upon an Agile method.
SDLC refers to a methodology and production process with clearly defined tasks and requirements for delivering high-quality software (KDL, 2021a). This approach is widely used in software engineering where an artefact is built by an interconnected team responsible for providing functional and sustainable products within budget and time constraints. Figure 1 presents the SDLC as adapted by KDL, which consists of nine stages: 1) initial contact (partners contact the lab with their project ideas); 2) internal assessment (KDL reviews the project ideas at the weekly lab Project Pipeline meeting and produces a ‘Terms of Reference’ document); 3) requirements assessment (the lab analyses the feasibility, risks and impact of the project, considers technical solutions, and designs development tailored to the project questions; prepares a Feasibility document; and makes the final go/no-go decision); 4) funding applications (the project proposal is submitted to grant funders); 5) kick off (if the project is funded, the lab organises a meeting with partners to discuss the work process); 6) evolutionary development (the process of developing artefacts based on timebox planning, communication and close collaboration with partners); 7) deployment (each increment is reviewed with the partners, after which it is officially deployed); 8) release (the project goes live and partners can discuss with the lab how to make the project sustainable beyond the project lifespan) and 9) post-project (the lab assesses whether the requirements and expectations have been met and assesses the maintenance options of the project) (KDL, 2021b).
Figure 1. King’s Digital Lab Software Development Lifecycle. Source: KDL, 2021b.
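The nine stages can be sketched as an ordered sequence with a stopping point. The following is an illustrative sketch only, not KDL's tooling: the names `Stage` and `next_stage` and the `go` flag are hypothetical, and the strictly linear progression is a simplification, since development and deployment repeat for each increment.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The nine SDLC stages as listed in the text (identifiers are illustrative)."""
    INITIAL_CONTACT = 1
    INTERNAL_ASSESSMENT = 2
    REQUIREMENTS_ASSESSMENT = 3
    FUNDING_APPLICATION = 4
    KICK_OFF = 5
    EVOLUTIONARY_DEVELOPMENT = 6
    DEPLOYMENT = 7
    RELEASE = 8
    POST_PROJECT = 9


def next_stage(current, go=True):
    """Return the stage that follows `current`.

    Returns None when the project stops (go=False, e.g. a no-go decision
    at the requirements assessment, or an unfunded proposal) or when the
    final stage has already been reached.
    """
    if not go or current is Stage.POST_PROJECT:
        return None
    return Stage(current + 1)
```

For example, `next_stage(Stage.REQUIREMENTS_ASSESSMENT, go=False)` returns `None`, which mirrors a no-go decision recorded at the Feasibility stage.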
The KDL workflow is built upon a business-driven framework called the Agile approach. It is a method driven by the principles of iterative development (where the software production is broken down into smaller tasks tested and delivered incrementally), flexibility (a quick response and adaptation to changing conditions) and delivering a high-quality product on time (Agile Business Consortium, 2017). This approach contrasts with traditional project management, wherein the features of the project are fixed, meaning that all of them are equally important. Therefore, in order to deliver them, we can manipulate both the time frame and the cost to keep the project on track. Since all features are required and time and cost are variables, there is a risk that the project will be managed poorly, which leads to a lowering of the quality of the artefact. In the Agile approach, however, time, cost and quality are fixed, which means that we can negotiate features to deliver a high-quality product within a given project time frame. To determine which features are essential and which can be left out, the lab team uses the MoSCoW prioritisation technique to specify ‘must’, ‘should’, ‘could’ and ‘won’t have’ requirements. This technique will be explained further in the next section.
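The logic of fixing time and cost while negotiating features can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the lab's actual procedure: the `Requirement` class, the `plan_scope` function, the effort figures and the greedy selection rule are all hypothetical. ‘Must’ requirements are always included, and ‘should’ and ‘could’ requirements are added in priority order only while the fixed budget allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    code: str         # e.g. "M1"; illustrative label
    priority: str     # "must", "should", "could" or "wont"
    effort_days: int  # hypothetical effort estimate


# Lower number = considered earlier; "wont" items are never scheduled.
PRIORITY_ORDER = {"must": 0, "should": 1, "could": 2}


def plan_scope(requirements, budget_days):
    """Select requirement codes that fit a fixed budget, by MoSCoW priority."""
    musts = [r for r in requirements if r.priority == "must"]
    must_total = sum(r.effort_days for r in musts)
    if must_total > budget_days:
        # The essential features alone exceed the budget: not feasible as scoped.
        raise ValueError("'must' requirements exceed the fixed budget")

    selected = list(musts)
    remaining = budget_days - must_total
    optional = sorted(
        (r for r in requirements if r.priority in ("should", "could")),
        key=lambda r: PRIORITY_ORDER[r.priority],
    )
    for r in optional:
        if r.effort_days <= remaining:
            selected.append(r)
            remaining -= r.effort_days
    return [r.code for r in selected]
```

In this sketch the budget never grows to accommodate a feature; a feature that does not fit is simply left out, which is the contrast with the traditional model described above.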
The Agile-based SDLC process is focused on delivering the software in the form of incremental pieces, which are specified in advance but can change in an ‘agile’ way during the process of design and development. While it is a useful approach that helps the team to have control over the production, it is necessary to be conscious that behind this rigid business structure, there are human aspects. Ultimately, the lab deals with individuals who have their own research ideas, expectations and aspirations. The lab is aware that they work
The SDLC can help the lab to keep a project on track when unpredictable human aspects disturb the work process. This business model is just a framework within which the lab seeks to tame the complexity of digital production. The flexibility of the Agile method is therefore useful to find the balance between expectations and requirements, innovation and sustainability and experimentation and functionality. The resistance to structured work processes such as the SDLC model that is sometimes experienced stems from the assumption that there is no place for creativity and individualism. However, at the Feasibility document preparation stage, the human elements of the project can come to the surface alongside technical aspects and institutional priorities.
Feasibility documents
Documents are rich empirical materials for ethnographic scholars as they document social practices (
Empirical materials reveal themselves as powerful objects that can illuminate organisational processes, social practices and a lab’s culture and aims in structuring and controlling the relationships between the actors involved in the production of artefacts. Tracking the term ‘responsibility’ in documents, for example, is an interesting way to identify the points of tension and uncertainty in the work process. As Lisa Garforth notes, ‘Seeing is dynamic and practiced in the social science field, and as such, its negotiation and management has consequences for our understanding of the ethnographic gaze and what is at stake in observational methods’ (2012: 266). The same practice can be applied to studying documents which record the processes the authors wanted to register. It is therefore important to be critical about what is
Feasibility documents constitute a part of the SDLC model and aim to ‘gather information to help decide if the project should be stopped, if unlikely to be viable’. KDL’s Feasibility documents and project templates are openly available on GitHub (KDL, 2020b). After the initial contact with partners and several meetings, the lab team prepares the Feasibility document to assess the project for the lab. This involves activities such as the elicitation of requirements, reviewing partners’ data samples and datasets, sketching technical solutions and checking whether the budget is sufficient for such work.
The aim of using a Feasibility document is to plan the work process by breaking down large segments of activities into items of high-level requirements following the MoSCoW technique. The document is also intended to assess risk and impacts and, by doing so, it seeks to provide justification to the university for taking on the project. The document is prepared by the lead research software analyst in collaboration with the research software engineers and designers, who together outline possible technical solutions, taking into consideration a range of potential risks, from data set quality and accessibility, to methodological uncertainty, to technical security. The Feasibility documents are important artefacts that aim to be both fixed and flexible at the same time. They constitute a rich resource for the ethnographer, who can follow which parts are under control and which are open to negotiation.
I propose to conduct a ‘feasibility analysis’ that can reveal the project management and development stages: 1) the analytical process (the translation of research questions into technical solutions); 2) the production process (the move from technical and design practices to research answers) and 3) the infrastructure and management process (project workflow and sustainability solutions). The analysis will also unpack the complex, iterative processes – along with the factors behind them – of moving from research questions to software analysis and design methods, to technical requirements. The Feasibility document represents the accumulation of expectations that are structured into the logical process of the translation of inquiries into material
Based on the analysis of 40 Feasibility documents produced by KDL, I first sought to understand how they inform the lab management work and how they contribute to structuring the digital research process. Thinking about the work structure, it is helpful to illustrate it by means of a diagram, in order to show the multi-layered structure of digital research production where all elements are interconnected. Figure 2 represents the feasibility model and the lab workflow. The lab work consists of the following processes and layers.
Figure 2. Feasibility model: Structuring digital research process. Source: Author.
First is the ‘Analytical process’, which comprises the intellectual inquiry, meaning the research ideas submitted by partners (e.g., humanists, social scientists, archivists) to KDL, and a methodological layer in which the lab members assess which digital methods and software would be suitable for addressing the scholarly questions. This step is led by a research software analyst in close contact with partners and the rest of the lab team (developers and designers) who can help to ensure that the technical and methodological solutions are feasible and functional.
Next is the ‘Production process’ which entails the technical development and delivering increments to partners. This layer corresponds to the so-called ‘black box’ of production, meaning that partners receive the finished pieces of work without observing how the production was carried out. Having said that, the Feasibility documents provide transparent information about the development process; therefore, partners can gain an understanding of how the work will be conducted. The ‘degree’ of black box access depends, however, on partners’ technical knowledge and engagement. For instance, the requirement: ‘Minimal specification VM for Django set-up for data entry’ might be understood by some partners and unclear for others.
The ‘Maintaining process’ takes us to an infrastructure layer coordinated by the software system managers. This substrate of digital production reminds us how digital work is very much material – storage, backup solutions and random-access memory – and how humanists put less emphasis on the underlying processes of their work which is crucial for its accessibility, safety and sustainability.
The last, the ‘Monitoring process’, constitutes a backbone management layer of the lab run by the project and lab managers. This is a cornerstone of the lab work responsible for planning, executing and controlling projects.
The diagram of the feasibility model (Figure 2) is centred on a main part called the ‘Research case’, which varies from project to project. Therefore, while the projects are treated individually, the lab organisation and Agile-based workflow around the ‘Research case’ stay relatively fixed and stable. This confirms how the SDLC approach aims to provide a strong foundation for the entire process.
The research cases, with their methodological, technical and infrastructural requirements, constitute solid empirical materials which enable ethnographers to follow the process of translation, from research objectives, to methods, to technical solutions. In Figure 3, I present an example of a KDL digital project, ‘Radical Translations: The Transfer of Revolutionary Culture between Britain, France and Italy (1789–1815)’ led by the King’s Department of French and the Department of English and Comparative Literature, which seeks to study translation as a political intervention in the period following the French Revolution (RT, 2021). The KDL contribution has been to produce a set of research/discovery visualisations, such as a timeline that would allow the research team to better understand the documents, the meaning of ‘being radical’, and its connection to translation strategies. The project requirements were broken down into eight high-level requirements devoted to separate but interconnected objectives, methods and technical solutions.
Figure 3. Feasibility model. Research case: ‘Radical Translations: The Transfer of Revolutionary Culture between Britain, France and Italy (1789–1815)’. Source: Author. ‘Research Case’ section includes content extracted from the KDL feasibility document about the Radical Translations project.
Figure 3 can be read from top to bottom to see how the methods and technical development are tailored to each objective. For instance, one research question is to study transnational radical ideas and translations during the French period and their impact on cultural exchange. To do so, the team has chosen the prosopographical and bibliographical data model with associated data structure and data storage solution to collect information about translators and translated texts. This is a ‘must’ task (called M1: a ‘must’ requirement) as it aims to provide a core database for the research upon which the following tasks are built. The resource has been created using Django, an open-source web publishing framework that has been widely adopted by the KDL team. The document justifies the selection of this solution as a ‘stable, powerful, and scalable’ web platform that gives control over the interface and web copy.
Another task (called C7: a ‘could’ requirement) is devoted to representing the movement of people (translators and other activists) and for this objective, a visualisation method (e.g., maps) has been selected as a suitable technique. This requirement has been assigned as a ‘could’ task that strongly depends on M1, meaning that C7 is only feasible if biographical events and associated locations are collected and structured at the M1 assignment stage. For visualisations, the team chooses LeafletJS, a widely used open-source JavaScript library for web mapping applications.
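The dependency between C7 and M1 can be expressed as a small check. This is an illustrative sketch only: the `DEPENDS_ON` mapping and the `is_feasible` function are hypothetical names, and only the M1 and C7 codes and the dependency between them come from the Feasibility document discussed here.

```python
# Prerequisites per requirement code; only the C7 -> M1 link is from the text.
DEPENDS_ON = {
    "M1": [],
    "C7": ["M1"],
}


def is_feasible(code, depends_on, delivered):
    """Return True if every direct and indirect prerequisite of `code`
    is in the set of delivered requirement codes."""
    pending = list(depends_on.get(code, []))
    checked = set()
    while pending:
        prereq = pending.pop()
        if prereq in checked:
            continue  # already verified via another path
        checked.add(prereq)
        if prereq not in delivered:
            return False
        # Follow the chain: prerequisites of prerequisites also count.
        pending.extend(depends_on.get(prereq, []))
    return True
```

With these definitions, `is_feasible("C7", DEPENDS_ON, {"M1"})` is `True`, while `is_feasible("C7", DEPENDS_ON, set())` is `False`: the map visualisation cannot be attempted until the core data model exists.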
Going through the Feasibility documents, ethnographers can gain deep insight into the lab organisation and workflow processes. However, what is even more interesting is the analysis of their content to understand not only
Feasibility documents as critical structuring objects
In the section that follows, I intend to conceptualise Feasibility documents as
Figure 4. Feasibility documents: The action and decision model. Source: Author.
These three components are interconnected and determined by many aspects represented in the form of the grey triangle in the diagram. The decision of whether to proceed with the project is a complex process whereby the lab team considers overlapping or conflicting relationships: the university management and infrastructure, lab management, strategy, time, research questions, capacity, budget, software stack, impact and risk. By analysing all those aspects filtered also through the culture of the lab, the team comes to the final go/no-go decision represented by a carefully assessed list of technical solutions, the division of responsibility, feasibility conditions and restrictions.
If we look again at Figure 3, we can explore how high-level humanities research questions (‘Intellectual inquiry’ section) are translated into technical solutions (‘Methodology’ and ‘Technical development’ layers) and what infrastructure is required to complete and maintain the digital products (‘Maintaining’ part). The Feasibility document, however, also provides insight into a decision-making process. The partners contacted the lab team to discuss digital solutions that would allow them to study transnational radical ideas and translations during the French period and to better understand what it means to be radical and how translation strategies are essential to this. As the next step, a research software analyst explored methodologies to tackle the research questions and, after consultation with the sub-team assigned to the project idea, proposed the data modelling and associated data storage approach as well as user interface functionalities to identify recurring radical concepts and ideas. While assessing the possible technical solutions, the lab team investigated preliminary data collected by the partners and considered whether they were sufficient to produce a representative data model. A research software analyst noted that the data were still underdeveloped and identified this as one of the risks that needed to be addressed by improving the structure of data collection. The lab team proposed, therefore, to work on building a solid data structure that would be a prerequisite for creating discovery tools and conducting the research analysis. They further identified high-level technical requirements and decided which ones were ‘must’, ‘could’ or ‘won’t have this time’. These decisions are significant since they have further implications for research directions as well as costs. The lab team provides high-level information about the costs needed to deliver the technical solution.
As the research team has a specific amount of money, limited by grant funding, it becomes clear what can be done within the budget. This is part of the negotiation and refinement of the research questions, as the partners can propose which requirements should be prioritised. Having said that, the lab team decides how far the requirements can be reshuffled, as they might be mutually dependent; that is, one task may only be feasible if another is completed.
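The interplay between priorities, a fixed budget and mutually dependent requirements can be illustrated with a minimal sketch. It is not KDL's actual procedure: the requirement names, the costs, the `plan` function and the greedy selection rule are all hypothetical, and the sketch assumes that each requirement is listed after the requirements it depends on.

```python
from dataclasses import dataclass

# Hypothetical illustration only: the names, costs and selection rule
# below are assumptions, not taken from any Feasibility document.
PRIORITY_ORDER = {"must": 0, "could": 1, "wont": 2}


@dataclass
class Requirement:
    name: str
    priority: str            # "must", "could" or "wont" (won't have this time)
    cost: int                # estimated cost in arbitrary units
    depends_on: tuple = ()   # names of requirements that must be delivered first


def plan(requirements, budget):
    """Select requirements in priority order within a fixed budget.

    A requirement is skipped if it is marked "wont", if any of its
    dependencies has not been selected, or if it no longer fits the budget.
    """
    selected = []
    remaining = budget
    # sorted() is stable, so the listed order is kept within each priority.
    for req in sorted(requirements, key=lambda r: PRIORITY_ORDER[r.priority]):
        if req.priority == "wont":
            continue
        if any(dep not in selected for dep in req.depends_on):
            continue
        if req.cost > remaining:
            continue
        selected.append(req.name)
        remaining -= req.cost
    return selected, remaining


example = [
    Requirement("data model", "must", 40),
    Requirement("search interface", "must", 30, ("data model",)),
    Requirement("map visualisation", "could", 25, ("data model",)),
    Requirement("public API", "could", 20, ("search interface",)),
    Requirement("3D viewer", "wont", 50),
]

selected, remaining = plan(example, budget=100)
print(selected, remaining)
```

In this invented example the budget covers both ‘must’ requirements and one ‘could’ requirement; the second ‘could’ requirement is dropped because it no longer fits, which is the kind of trade-off the partners and the lab would then negotiate.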
Time is another crucial factor in planning digital production. The lab team allocates time for infrastructure setup, accessibility testing and change freeze, which means that less time can be devoted to design and development as well as other research explorations. This is clearly communicated to the partners; further, the teams together decide how to reconcile the time the research team has to complete the project with the time proposed by the lab. Another factor taken into consideration by the lab team is impact. As the document specifies, ‘The resource been envisaged constitutes for now the only project idea KDL is involved with in collaboration with the department of Comparative Literature at KCL and could be an interesting challenge to extend current conceptual models and related tools that aim at intersecting historical space and time’. The research partnership turned out to be one of the key elements in making a decision on whether to proceed with the project.
This multistage and multi-layered decision-making process shows how seriously the lab treats digital production and research software engineering. As James Smithies stated in presenting the KDL software stack: ‘This level of organisation helps us manage technology, but also promotes critical awareness: we don’t only use technology, we choose technology’ (2016). Digital research projects are more than the employment of tools; they involve, above all, a responsibility towards cultural heritage, artefacts, history and society. This is why the lab stresses the ‘design first philosophy’ that resonates well with Agre’s sensibility of critical technical practice: ‘According to Agre, design in a critical vein should itself be a form of inquiry into the systematic failures, limitations, and confusions that arise in the design of digital systems’ (Vertesi et al., 2017: 177). Reflective design and critical production approaches are visible in the lab work at the Feasibility document preparation stage where many voices, inquiries, and expectations are accumulated. I will next consider more closely the two previously mentioned relationships, beginning with the connections between data, methods and technical practices.
Data, methods and technical practices
The lab deals with many research cases, such as creating a digital edition of seventeenth-century manuscripts, developing a web application to conduct a demographic analysis of datasets, and producing a set of visualisations to track the circulation of individuals across spaces. These cases are approached through a routine and structured process whereby the team applies computational thinking; that is, ‘the ability to reflect routinely on where the computer may be applied to humanities work in order to automate a process or a collection of processes’ (Berry and Fagerjord, 2017: 85). This is followed by breaking up the production process into discrete elements and analysing their feasibility and limitations. The first layer of relationships considered by the team is paramount: the connection between data, methods and technical practices. I intend to show that the decisions here are driven by the following three factors:
The first step in undertaking a digital project is the procedure of decomposition; that is, ‘transforming a large field of interest into a definite question and a procedure to answer it’ (Berry and Fagerjord, 2017: 88). This is the moment of selecting appropriate research methods that from a humanities perspective is an intuitive and implicit task, but which is not always straightforward to break down into a list of actions. However, for the RSE team,
A research software analyst (RSA) first assesses data access to identify what data are already available; for example, ‘Two manuscripts are deposited at the British Library’; ‘The Archives of the French Ministry of Foreign Affairs hold 120,000 digitised microfilm images of the records from 215 consulates around the world’; and ‘The RAF [Royal Air Force] cards contain the richest information, whilst the Naval cards give very limited information and are far less amenable to automated data extraction’. Data locations, formats and structures, and the degree of their accessibility, determine how long the data would take to process, which can affect the project timeline, and what costs copyright clearance would involve. These are important questions that, if not resolved at this stage, can affect time and budget, which are critical and fixed entities in the Agile model of project management.
The RSA also looks at the quality of the data to assess their suitability for research methods (‘Any visualisation that would track the circulation/mobility of individuals would need to take into consideration the fuzziness of geographies and inconsistency of the data’), whether they are ready for processing (‘That’s what I originally thought they wanted to do, but their data collection is still
As well as this, the RSA considers the data format to ensure that different datasets can be easily integrated: ‘Note that as these microhistories of modern mobility would require gathering of sources also from other archives, hence the data model would need to be integrated with references to external sources’. Having reviewed the datasets, the lab chooses a case study approach, working first with a small amount of data to check their reliability and functionality: ‘This sample amounts to 10,034 records which could provide an interesting case study to experiment with digital methods and in particular map-based and other kinds of data visualisations’.
Data samples prepared in the correct structure and a standard format can be scaled up in the next project phase, in which a larger number of datasets can be studied on the basis of the tested models and methods. This approach shows how datasets are critical not only for the project’s feasibility but also for establishing a good relationship between the team and partners. It can be exhausting when the lab team struggles to understand or access the datasets. We can say that the less is known about the data, the riskier the project and the commitment to it are.
The RSA’s next task is to assess the technical requirements in close collaboration with the developers and designers. The requirements involve a range of different actions, from building a content management system for the research team to edit data information, to designing browsing functionality. At this stage, the humanities approach can be enhanced by the engineering perspective, which enables an estimation of the technical possibilities and risks in delivering a digital artefact to address the scholarly questions being asked. As one comment in the document reads, ‘Conversion from RDBMS to Neo4J may be fairly trivial, though web presentation could be challenging and will present a design problem of enabling intuitive search interface for non-technical users’.
Along with the technical requirements, the RSA decides which software and tools will be used for particular tasks by taking into consideration their epistemological affordances – the way that technology influences the understanding of the final product and the epistemic process (Berry and Fagerjord, 2017; Bradley, 2019; Van Geenen, 2020). This is not, however, a rigid decision but an idea about which technologies are both at hand and feasible. In one of the documents, we read that ‘Mapping functionality will be provided in the browser using widely adopted libraries such as LeafletJS. Given that the main scope of the project is a historical analysis rather than geographical accuracy, the development approach would focus on representing county boundaries, rather than more granular geopolitical entities’. The open-source JavaScript library has been selected by explicitly considering the main goal of the research project.
The next aspect influencing the choice of software is sustainability: ‘Django, an open-source web publishing framework with which KDL has extensive experience and has found to be stable, powerful, and scalable. Django has a proven record of delivering enterprise-level products’. Each Feasibility document highlights the lab’s emphasis on delivering durable research outputs. The report by the Software Sustainability Institute identified a lack of sufficient attention to maintaining and preserving software and digital outputs, which is a consequence of thinking about RSE work as a technical practice rather than an intellectual effort: ‘Software is often seen as a research auxiliary rather than a research output’ (Bergel et al., 2020: 18). It is therefore important to think more critically about technical practices in the context of their durability and preservation.
The sustainability approach is reflected in the lab practices which focus strongly on standards (e.g., ‘While the team did not mention compliance to specific standards, KDL would strongly recommend adopting a bibliographic standard such as FRBR and TEI, either to model the data or as export option’), reusability (e.g., ‘Export functionality for the data to be downloaded and reviewed or used by others with appropriate licensing and the possibility of an API component to make all entities in the database citable and re-usable by other projects’) and integration (‘Places mentioned in the digital edition should be integrated with Wikimedia Commons when relevant to display image’).
Another important aspect considered by the team is security, which involves assessing whether software can be exposed publicly and whether its components are up to date. Security risk is a major factor reviewed regularly by the team and can lead to the sudden replacement of tools. For instance, Archetype, a digital typography design tool, exposed a security risk resulting from the use of Python V2. The tool was planned for use in one research project, but it was then promptly upgraded to Python V3 and enhanced by other software to lessen the risk. As one Feasibility document reads: ‘Given that Archetype can no longer be exposed publicly (for sustainability and security reasons), public-facing features will have to be rebuilt (or assembled for third-party) to a high level of usability and quality’.
Taking into consideration time, budget, risk and responsibility, the lab has standardised a core technology stack and most projects rely on open-source frameworks and tools, such as Python, Django, Apache SOLR, ElasticSearch, LeafletJS and Neo4J. Tested software and tools are more reliable than experimental, unexplored platforms that need to be carefully evaluated in terms of technological and security issues. As the team argues, ‘We are open to use other technologies as needed to fulfil project requirements; however, if we need to deviate from the core technical stack mentioned above, implications for sustainability should be assessed accordingly’ (KDL, 2021b). The key issue is to keep a balance between innovation and sustainability (Ciula and Smithies, forthcoming). Critical technical practices are therefore about critical reflection on the software choice and development, and the awareness of technological responsibility towards
The lab team and partners
‘Software development is a social process as much as a technical process’, say Helen Sharp et al. in the context of the ethnography of empirical software engineering (2016: 8). This interesting observation has been reflected in KDL’s work. Research software analysis and engineering practices are centred around data modelling, method design and software production. However, to make digital creation feasible and functional from a technological perspective, the process first demands solid, accountable and manageable practices happening at the level of social and organisational relationships. The Agile SDLC model is based on an iterative and incremental approach; that is, the lab team delivers parts of a solution to partners to discuss their further development and improvements, in order to meet partners’ expectations. The iterative process ‘requires being open to changes as the process will likely lead to something completely different from the initial idea, yet a high quality, more functional and usable product aligned with requirements’ (KDL, 2021b). Progress on the project can only be made through successive refinements which in turn rely strongly on social interactions. As we can read in one document: ‘This appears to be a well-rounded project with a PI who is open to working with the KDL SDLC and who would actively engage with KDL as collaborators on an equal footing’.
The relationship between the lab team and partners is shaped by the following three factors: 1) an
In collaborative projects conducted by people representing different disciplines, it is particularly important to ensure that each contribution is visible and credited. Technical practices, such as the RSE work, are perceived as a non-intellectual set of tasks (Lischer-Katz, 2019); therefore, it has become critical for people to learn more about each other’s practices to both challenge this kind of perception and establish collaborations ‘on an equal footing’, as the lab highlights. In another comment, the team noted that ‘the PI understands the KDL SDLC processes and is receptive to our working methodology. He is keen that KDL be brought on board not simply as a service provider but as active contributors to the research’. In the first instance, the Feasibility documents aim to set the right relationship between the lab and partners, and make involved actors aware that digital projects are collaborative practices, rather than service delivery. This is exemplified by tasks that require a collaborative and open attitude from the partners who are assigned to work on some requirements together with the team. For instance, the team provides partners with access to the Wagtail Content Management System to allow them to control website copy and edit project information and metadata.
The Feasibility documents are also intended to ensure that both sides are aware of their responsibilities and expectations. The word ‘responsibility’ appears regularly in the documents and is used to set out explicitly, in advance, the division of tasks and project development. For example: ‘Note that GDPR compliance, copyright clearance and licence negotiations for any material stored on KDL infrastructure are the responsibility of the PI’; ‘Data accuracy and cleaning as appropriate will be the paramount responsibility of the PI’ and ‘Metadata ingestion in the data repository of choice and data quality checks are the responsibility of the PI’. The dividing line of accountability runs mainly along data ingestion and preparation. Although the lab can offer some assistance with data deposit or data organisation, the accuracy and quality of data are the remit of the partners. The relationships between the lab and partners are therefore shaped by the task division, which might not always align with the partners’ expectations. From the lab’s perspective, however, it is particularly important to find the right balance between the amount of responsibility and the associated workload and risk.
Conclusion
Digital production is driven by curiosity and creativity, but as projects grow in size and complexity, the risk and responsibility increase. Kirk Woolford et al. (2010: 203), while discussing a critical technical practice, argue that the software craftsperson or artist-engineer undertakes exploratory experiments and investigations with new tools but ‘it is not purely driven by curiosity’: ‘The craftsperson has a commitment to a client, and to the pragmatics of a commissioning situation. It is not ethical to take risks with a client’s money, and deadlines must be honored’. A similar cautious approach to critical technical practices has been observed at KDL. As the lab says, ‘Whilst our passion is to create and develop exciting new tools, we can’t be driven by these appetites alone’ (KDL, 2020a).
The lab’s responsibility is dispersed over several different actors: project partners, cultural institutions and the university. The team deals with many risks of different natures: technical (e.g., mechanical failure, security issues), financial (e.g., budget limitations), methodological (e.g., flaws in data modelling, inaccurate methods) and reputational (e.g., poor project management and development can weaken the reputation of the lab and the university). The risk assessment that forms part of the Feasibility documents is therefore a significant task carried out by the lab to mitigate risks and protect itself from unexpected mishaps. The lab approaches each project cautiously and seriously since, from the moment a project is accepted, the team commits fully to providing a robust and sustainable artefact.
To this end, the lab adapts the Agile DSDM approach to its SDLC framework to offer a carefully structured project workflow. As part of this model, the lab prepares the Feasibility documents that, as has been shown, play a significant role in assessing the methodological, technical and design requirements by considering a broad set of factors. I have sought to show that the Feasibility documents constitute important ethnographic materials which can provide insights into a lab workflow and management process as being a complex and interconnected set of layers and practices. Further, I have conceptualised the documents as
This article has also aimed to offer a methodological framework for the empirical analysis of documents produced in the Digital Humanities within an RSE context. Drawing on the method of the ‘STS of documents’, I intended to show that documents can be studied as ethnographic objects that can help to reveal critical and socio-technical practices entangled with operational methods and local requirements. So far, much less attention has been given to the study of documents produced in DH; however, given the large and diverse body of materials produced by DH work (e.g., feasibility documents, kits, protocols, white papers), it has become timely and significant to use them to investigate the socio-technical practices of digital research production. DH can therefore benefit greatly from the use of the STS-based method of reflective discourse on documents. Documents can enable critical empirical analysis of the production of DH outputs, including the investigation of social relationships, task divisions, labour issues, the workplace culture, technical practices and infrastructural values. They are the result of efforts aiming to reach consensus, coordination, and control by considering the nature of personal and organisational accountability. Thinking more deliberately about documents produced in DH can shift attention from end-product digital objects towards the complex process of their creation, which can unpack a range of social, technological and management issues.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 891155.
Disclaimer
The article reflects only the author’s view, and the Agency is not responsible for any use that may be made of the information it contains.
