Abstract
Questions of trust are increasingly important in relation to data and its use. The authors focus on humanities data and its visualization, through analysis of their own recent projects with museums, archives and libraries internationally. Their account connects the specifics of hands-on digital humanities work to larger epistemological questions. They discuss the sources of potential mistrust, and examine how different expectations and assumptions emerge depending on the use and user of the data; they offer a simple schema through which the implications may be traced. It is argued that vital issues of trust can be engaged with through design, which, rather than being conceived as a cosmetic finish, is seen as contributing insights and questions that affect the whole process. The article concludes with recommendations intended to be useful in both theory and practice.
Introduction
In the context of visualization, a fundamental question about trust, ‘Can I believe what I see?’ takes on a particular cogency. Data visualization, for us, serves a number of functions including exposition, exploration and analysis. In museums, archives and libraries, the end-user may be a member of the public (a highly varied constituency), curator, educational outreach officer, historian, researcher, administrator, or in some other role. They may be a passive observer of the visualization, or have varying degrees of interactivity and control. We show later that their role has a significant effect on their expectations of trust.
We write at a time of widespread concern with questions of trust in data and computation. We first discuss some of the general issues, then focus on the ways that these questions are manifested in data visualization. We provide examples from our own work with ‘cultural data’ that is now fundamental to the work of cultural institutions. We end with some broad principles that we propose should inform future work in this field. In doing so, our discussion connects the specifics of hands-on digital humanities work to larger epistemological questions that extend beyond the confines of our discipline(s). We write as critical designers, that is, as designers engaged with research and intellectual inquiry as much as with ‘mere’ appearance. We work collaboratively with curators, archivists, librarians and researchers. Our expertise includes software engineering, visual design and interaction design. Most of our work deals not with quantitative but with nominal data, typically involving multiple attributes of discrete entities, and often organized according to time. This brings into play issues around the nature of entities and the precision with which their properties can be defined. While visualization may transmit or even exacerbate problems related to trust, we will suggest that, rightly used, it may also be part of their solution, in which design plays a vital role.
Current issues of trust and data
Data and its use are attracting increasing attention. In one recent survey, over 90% of respondents were concerned about the data that companies can collect about them (Microsoft Corporation 2020, 6). Examples have emerged of significant omissions in data, prejudicing policy and decision-making (Williams, Brooks, and Shmargad 2018; Criado-Perez 2019; Favaretto, De Clercq, and Elger 2019; Linder and Svensson 2019). Uncritical use of data derived from past practice will tend to embed and perpetuate discrimination (Žliobaitė 2017; Noble 2018; Obermeyer et al. 2019; Park and Humphry 2019; Babuta and Oswald 2020; Givens 2020). Language datasets perpetuate the human biases captured in the data (Bolukbasi et al. 2016; Caliskan, Bryson, and Narayanan 2017), as do image datasets (Buolamwini and Gebru 2018; Crawford and Paglen 2019; Prabhu and Birhane 2020). Misuse of data may be innocent, but it may equally be malicious (Briant 2018; Ward 2018).
Issues of trust in cultural history and its data
Models shaped by past human behaviour are necessarily models of a particular cultural context including its prejudices (Underwood 2018). Cultural institutions have their own particular problems with data and its use, including the very objects they contain: for Dekker (2018), the product of plundering, looting, and unethical practices. Huxtable et al. (2020) document the legacies of slavery and empire in the UK National Trust. Sheppard (2010) shows how Petrie's collecting owed as much to eugenics as it did to Egyptology. With digitization, a range of new problems arise. What constitutes ‘scholarly data’ may be contentious; rich pre-digital metadata may be lost in digitizing records (Setlhabi 2012; Tóth-Czifra 2020, 237); false precision and quantization may arise when uncertain information such as dates is digitized (Kräutli and Boyd Davis 2013); changes in place-names and boundaries can lead to misleading geo-coding (Bouk 2020, 5).
The inherent characteristics of cultural history
An important aspect of cultural history and therefore of Digital Humanities, unlike most of the hard sciences, is that subjectivity, lack of precision, and conflict of opinion are inherent to both the material under study and the processes applied. Various kinds of uncertainty are unavoidable and fundamental to historical work (Nyhan and Flinn 2016; Edmond 2019; Franke et al. 2019). Vagueness ‘plays a crucial role in humanistic models’ (Martin-Rodilla and Gonzalez-Perez 2019). When subjectivity and imprecision are defining characteristics of history, trust is affected in a particular way: in presenting data, the task is seldom to maximize the user's trust in it and its transformations, but rather to reveal the extent to which these may be untrustworthy.
There is increased recognition of the issues involved in representing subjectivity and uncertainty, including visually (Nowviskie 2004; Drucker 2011; Nowviskie et al. 2013). The problems of specious authenticity in digitally ‘reconstructing’ archaeological sites have attracted considerable attention (Strothotte, Masuch, and Isenberg 1999; Schäfer 2018; Lengyel and Toulouse 2020).
Table 1. Three broad categories of problem, shown as columns. Each column is populated with indicative examples related to issues of trust, against the three phases identified at the left.
Problems of trust 1: omission and bias
Omission and bias: data – issues at source
Problems of omission and bias within data show up in cultural history in both familiar and distinctive forms. Moltrup (2019) highlights the under-representation of women in archives of graphic design. Klein (2013) examines Thomas Jefferson's slave Hemings, who never appears in Jefferson's correspondence as a writer, yet is extensively written about. Klein reveals Hemings’ ‘ghost’ by graphically mapping his every appearance. Agostinho, Dirckinck-Holmfeld, and Søilen (2019) similarly make visible the occluded documents of Denmark's slave trade. While not all omission and bias in cultural data is so political, it still matters greatly to the historian.
Omission and bias: problems aggravated by digitization/computation
When faced with a digital display of search results, how do we know that what we are currently looking at is all there is, all that is relevant? Murphy and Villaespesa (2020) point to museum bias and machine bias combined. Back in 1987, Conklin pointed out the particular difficulties arising from the loss of physical cues in the digital (Conklin 1987, 21). At London's Victoria & Albert Museum, Vane worked on the Royal Photographic Society collection that had recently been transferred from the Science Museum Group and was in the process of being digitized through reproduction and digital cataloguing. Representing the history of the art of photography, it includes over 270,000 photographs. Vane began visualizing the collection early, when only 2% had been digitized. The result was dominated by albumen prints and daguerreotypes simply because they had been digitized first. Isolated clusters elsewhere represented photographs by key figures such as Julia Margaret Cameron. Dialogue with museum staff revealed a convergence of practical reasons and wider institutional factors – particularly the opening of a new photography gallery – behind these biases. A member of staff referred to the particular themes selected as ‘a very V&A … story.’ Clearly, institutional policy and culture leave distinctive ‘fingerprints’ on collections and their digitization – which visualization may uncover. The distribution of items at this stage was unrepresentative of the total collection; it might have been incomprehensible without access to expert insider knowledge. Visualization by a designer was revealing, but what it revealed required the specialized knowledge of curators to explain.
In a second context, Living with Machines, a five-year collaborative project using digital collections and methods to explore lived experiences of industrialization in nineteenth-century Britain (LwM 2019a), decisions about which new digitization to undertake had to be based on an understanding of the newspapers already digitized in the British Library collection. This involved trying to reconstruct earlier, sometimes opaque, choices behind previous digitization. Hauswedell et al. (2020) point out that there have been few in-depth analyses of the processes and motivations influencing inclusions and exclusions in such digital archives, while Tolfo et al. (2021) cite many other sources on the lack of transparency in digitization policies and the need for ‘paradata’ (Fyfe 2016) to account for past decisions. Online access to over 33 million newspaper pages might imply a representative selection, but this is only about 6% of the physical newspaper collection (itself an unknown percentage of the newspapers originally published). Using an external, contextualizing source rather than the collection itself, Tolfo et al. uncovered an unexpected and substantial under-representation of the conservative press.
Omission and bias: visualization
In the project Dive into Color with the Cooper Hewitt Smithsonian Design Museum, Vane developed a visualization of the collection based on colour, at the request of the curators (Vane 2019, 78–99). Colour was seen as offering a visual way to explore a digitized collection for those without specialist knowledge, opening up fashions and innovation in colour technology among other themes, coinciding with a physical exhibition on colour theory and design. Considerable effort was put into devising and testing a user-friendly touchscreen interface and fine-tuning the algorithm that selected the collection objects in response to the user's choices. But behind the scenes lay colour data containing errors, derived from a previous computation of the colours in each digital photograph. This data, unsurprisingly, captured no information on the landmark nature of certain items in the collection and, in fact, just such a moment was missing from the visualization. As one of the curators remarked: ‘Perkin's mauveine scarf, that is the invention of purple dye in 1856, doesn't show up on here. But we're calling it a major moment in color history [it is the invention of the first synthetic dye] … If there are key points, we want to make sure they aren't missed in this interaction.’
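To see how such colour data typically arises, consider a minimal sketch (our illustration, not the Cooper Hewitt pipeline, whose implementation we do not reproduce) that extracts dominant colours from an object photograph by clustering its pixels. Because every pixel is treated alike, mounts, backgrounds and any extraneous objects in the frame shape the result as much as the object itself, one route by which errors of the kind described above can enter the data.

```python
# Hypothetical dominant-colour extraction: cluster the pixels of a
# photograph with k-means and report the cluster centres, largest first.
from PIL import Image
import numpy as np
from sklearn.cluster import KMeans

def dominant_colours(path, k=5):
    """Return the k dominant RGB colours of an image, most common first."""
    img = Image.open(path).convert("RGB").resize((100, 100))  # downsample for speed
    pixels = np.asarray(img).reshape(-1, 3)
    km = KMeans(n_clusters=k, n_init=10).fit(pixels)
    counts = np.bincount(km.labels_, minlength=k)   # pixels per cluster
    order = np.argsort(counts)[::-1]                # most common first
    return [tuple(map(int, km.cluster_centers_[i])) for i in order]

# print(dominant_colours("object_photo.jpg"))  # hypothetical file name
```

Note that nothing in such a computation knows what the object is: a landmark item like the mauveine scarf carries no signal that would privilege it in a colour-driven interface.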
Problems of trust 2: naming, classification
Our focus on data that is generally nominal or categorical rather than quantitative introduces a particular need to address issues of naming and classification.
Naming, classification: data – issues at source
Libraries, archives and museums may be simplistically conceived as collections of objects, for which they hold data and metadata commonly structured as a catalogue that involves naming, classification and the assignment of various attributes. Naming and classification schemes are institutionally contingent (Hooper-Greenhill 1992) and culturally contingent, reflecting the locus of cultural power (Berman 1971; Harris and Clack 1979; Cherry and Mukunda 2015; Duarte and Belarde-Lewis 2015). Cultural entities are not always the well-defined units we might imagine, nor is their identity necessarily easy to establish. Bell and Ranade (2015) tackle the surprising difficulties in ensuring the identity of historic individuals using name, date of birth and other standard attributes through algorithmic entity matching. A particular problem is the mutability of apparently unitary cultural entities. White and Dunleavy (2010, 16) show how, of the roughly 20 UK government departments, only 4 remained broadly unchanged in the period 1979–2009, with most experiencing multiple mergers, splits and renamings. We discuss below how we have tackled the changeable form of a ‘single’ text, the Sphaera of Sacrobosco, and traced the changing identities of historic newspapers. Additionally, of course, few objects can be regarded as simple: they have constituent elements, which themselves may be hard to classify unambiguously. Within an apparently single text, fragments as small as individual words may need a provenance trail with connections both internal and external (Kuster et al. 2011, 317). Valleriani, Kräutli et al. (2019) describe a process of document atomization into texts, illustrations, and tables, that may encounter multivalent objects such as illuminated capitals that function as text parts, decorative elements and illustrations.
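The difficulty of algorithmic entity matching can be made concrete with a toy example (our illustration, not Bell and Ranade's method): two person records are scored on name similarity and on agreement of birth year. Every weight and tolerance below is an arbitrary assumption, which is precisely the point: such choices are contestable, and they silently shape which historic individuals are merged or kept apart.

```python
# Toy person-record matcher: fuzzy name similarity plus birth-year
# agreement. Weights and tolerances are illustrative assumptions.
from difflib import SequenceMatcher

def match_score(rec_a, rec_b, year_tolerance=1):
    """Score two person records in [0, 1]; higher suggests same person."""
    name_sim = SequenceMatcher(None, rec_a["name"].lower(),
                               rec_b["name"].lower()).ratio()
    years_known = rec_a.get("born") is not None and rec_b.get("born") is not None
    year_ok = years_known and abs(rec_a["born"] - rec_b["born"]) <= year_tolerance
    return 0.7 * name_sim + (0.3 if year_ok else 0.0)

a = {"name": "Wm. Henry Fox Talbot", "born": 1800}
b = {"name": "William Henry Fox Talbot", "born": 1800}
print(round(match_score(a, b), 3))  # high, despite the abbreviated name
```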
Naming, classification: problems aggravated by digitization/computation
In the V&A photography work already discussed, we applied machine intelligence to the question of visual similarity. This was based on the calculation of feature vectors, multi-dimensional numerical representations of images describing their visual characteristics, a technique of increasing interest to support search in cultural image collections (Yale University Library Digital Humanities Lab 2017; Pim 2018). It raises important questions about the trust we place in algorithms in these contexts, and the transparency and comprehensibility of machine intelligence more generally; Ayesha, Hanif, and Talib (2020) discuss how transforming high-dimensional to low-dimensional data risks losing essential information. In their view, selection of a suitable method according to the type of data remains a key issue to be addressed. Our own similarity computation (deploying the Keras library written in Python) used a VGG16 model (Simonyan and Zisserman 2015) pre-trained on the ImageNet database. At the time we were unaware of the highly problematic nature of ImageNet's labelling later revealed by Prabhu and Birhane (2020), but fortunately our work only involved the graphic content of the images.
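In outline, the approach works as follows (a minimal sketch consistent with the description above, using the modern tensorflow.keras packaging of Keras; the project's actual code is not reproduced here): each image is passed through the pre-trained network, a late layer's pooled output is kept as its feature vector, and similarity between two images is the cosine similarity of their vectors.

```python
# Sketch of image similarity via a pre-trained VGG16 (ImageNet weights).
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# Drop the classification head; the pooled convolutional output acts as
# a 512-dimensional descriptor of the image's visual characteristics.
model = VGG16(weights="imagenet", include_top=False, pooling="avg")

def feature_vector(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# v1 = feature_vector("photo_a.jpg")   # hypothetical file names
# v2 = feature_vector("photo_b.jpg")
# print(cosine_similarity(v1, v2))
```

Nothing in the vector records why two images score as similar, which bears directly on the questions of transparency raised above.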
Similarity computation offers natural-seeming and understandable visual paths through a collection in ways that are not possible with cataloguing data alone. Unfortunately, it also produces results of which the user has every right to be suspicious. As with the Cooper Hewitt colour computation discussed above, extraneous objects in the image such as colour charts and rulers inevitably mislead machine calculation of similarity. Similarity is computed that seems obscure or unhelpful to the human user. In one case an image returned by the algorithm turned out to be the back of a historic photograph, whose blotchy surface was deemed similar to a photograph of a grassy field. Some lace was computed to be similar to dewdrops on a leaf. As with the issues discussed above, however, the key problem is not these visibly nonsensical results. It is that images which ‘should’ have been returned as similar (by the standards of human judgement) may not be – and the user will be unaware of the omission – with no indication of the machine reasoning available. In this case, even we, as programmers and designers of the visualization, did not have access to the inner workings of the similarity computation process (Figure 1).
Figure 1. Annotated t-SNE plot of the Royal Photographic Society data at the V&A. t-SNE is a form of Stochastic Neighbour Embedding, a nonlinear dimensionality reduction technique. Visual design and coding: Olivia Vane 2019.
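A layout like Figure 1's can be sketched in a few lines (an illustration using scikit-learn's t-SNE; the figure's own implementation is not specified here): high-dimensional feature vectors are reduced to two dimensions for plotting, placing visually similar images near one another.

```python
# Reduce (hypothetical) 512-dimensional feature vectors to 2-D with t-SNE.
import numpy as np
from sklearn.manifold import TSNE

vectors = np.random.rand(1000, 512)  # stand-in for real feature vectors
coords = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(vectors)
# Each row of coords is an (x, y) position for one image. Note that
# t-SNE preserves local neighbourhoods but distorts global distances:
# the apparent 'map' is itself an artefact that invites misplaced trust.
```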
As already discussed, questions of trust are highly dependent on both the use and user of such visualizations. If, as in this case, the visual interface is designed to encourage serendipitous exploration by a wide range of users, and seems to offer no guarantees of infallibility or completeness, both the designer and user can perhaps afford to be somewhat careless of trust. But for any kind of serious scholarly work, such an approach would be unacceptable. There is also a risk of raising expectations that cannot be fulfilled. A V&A curator dreamed of searching for all studio photographs containing the same painted background, as an indicator that they were taken at the same studio, but in practice this would almost certainly be unreliable in accuracy and completeness. To a historian, such questions are not incidental but fundamental. In an earlier project, with the Wellcome Library (Vane 2019, 43–57), the question arose of how to manage the display of large result-sets. Historians interviewed had divergent views about the use of relevance ranking to filter the results. For some, it was an acceptable means to control the possible information overload, but others felt that a general algorithm could never anticipate the specificity of their search and would be unhelpful. There was agreement on the need to be explicit about any results removed when a filter was applied, and that there should always be an option to see all the results if a user wished. Trust and transparency in the processes by which results are returned were very important for these users: ‘when it comes to academic research … what I conclude from your tool feeds into my reputation.’ This echoes previous research on search interface preferences among historians, demonstrating how they value control in aspects of searching and browsing (Crymble 2016).
Naming, classification: visualization
We introduced above the problems associated with the ill-defined nature of entities. In our early visualization of the entire Tate Gallery collection (Kräutli 2016, 145–150), the large disk shown at centre (Figure 2) testifies to the overshadowing dominance of Turner in Tate's collection, accounting for about 40,000 works, or more than half of the entire art collection. It transpires, however, that most of these works are actually individual pages of Turner's sketchbooks. His dominance is, in part, the result of a decision to catalogue every single page as an entity in its own right.
Figure 2. Visualization of the Tate Gallery collection, showing the combined real and misleading dominance of Turner. Algorithmic and visual design: Kräutli 2014.
In our more recent work, we have directly addressed the problem of the mutable cultural object in two projects. In the British Library newspapers project introduced above, a complexity emerged that exemplifies the problem (Tolfo et al. 2021). Newspaper titles undergo incorporations, amalgamations, and name changes through time. For example, The Athletic Reporter of 1886 went through three name changes before becoming The Coventry Reporter and General Advertiser in 1890. The British Library dataset treats each as a new and separate title. Fortunately, the connections between titles are recorded under two facets, ‘preceding title’ and ‘succeeding title’, though the nature of the connection, e.g. amalgamation, is not made clear. For many purposes, these connections needed to be apparent in a visualization. The Press Picker (Figure 3) reunites discrete newspaper titles and indicates bound volume and microfilm holdings of each over time; a sketch of the reuniting step follows the figure caption. It has proved a valuable tool within the project, making previously inaccessible structures visible.
Figure 3. Press Picker visualization, reuniting discrete newspaper titles and indicating bound volume (black/dashed line) and microfilm (red line) holdings over time. Connected titles are brought together with a branching design at the left of the line graphs. Algorithmic and visual design: Olivia Vane, Kasra Hosseini and Giorgia Tolfo 2020. Data: British Library 2019.
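A minimal sketch of the reuniting step (with hypothetical field names; the British Library dataset's actual schema is not reproduced here): titles connected by ‘preceding title’/‘succeeding title’ references are grouped so that one newspaper's successive identities can be treated as a single history.

```python
# Group newspaper titles connected by succession links (union-find).
def chain_titles(records):
    """records: dict mapping title_id -> {'succeeding': [title_id, ...]}.
    Returns lists of title_ids, one list per connected chain."""
    parent = {tid: tid for tid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for tid, rec in records.items():
        for succ in rec.get("succeeding", []):
            if succ in records:
                parent[find(tid)] = find(succ)  # merge the two chains

    groups = {}
    for tid in records:
        groups.setdefault(find(tid), []).append(tid)
    return list(groups.values())

# Hypothetical miniature of the Athletic Reporter example:
papers = {
    "athletic_reporter_1886": {"succeeding": ["intermediate_title"]},
    "intermediate_title": {"succeeding": ["coventry_reporter_1890"]},
    "coventry_reporter_1890": {"succeeding": []},
}
print(chain_titles(papers))  # one group containing all three identities
```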
The project Sphaera: Knowledge System Evolution and the Shared Scientific Identity of Europe (sphaera.mpiwg-berlin.mpg.de) is investigating how scientific knowledge evolved during the early modern period (Valleriani et al. 2019). It traces the history of a specific treatise around which a corpus of other sources accumulated, the Tractatus de Sphaera of Johannes de Sacrobosco, in 359 different editions printed in 41 European cities between 1472 and 1650. The corpus is collected in a database, CorpusTracer, which uses CIDOC-CRM to represent ‘text parts’ (see Kräutli and Valleriani 2018 for definition and discussion) – classified as ‘original part’ or ‘adaptation’, the latter including ‘annotation’ and ‘translation’ – ranging from whole treatises, such as the Theoricae novae planetarum of Georg von Peuerbach, which began being printed together with the Sphaera as early as 1482, to much shorter parts. A total of 563 text parts were identified, of which 239 were considered important because they reappeared more than once in different years, in a total of 1,653 appearances. Thus while the Sphaera might loosely be considered as a unitary ‘book’ in several editions, the original treatise becomes a label for the field of geocentric cosmology comprising a multitude of treatises (Zamani et al. 2020). The underlying linked-data structure, assisted by visualization (such as Figure 4), allows access to a sophisticated model of its internal and external relations.
Figure 4. The Sphaera corpus presented as time-slices. The red dots are nodes that represent books, linked by the blue arcs. Algorithmic and visual design: Valleriani et al. 2019.
Apparently simple classification masks a wealth of difficulty over terms such as content illustration, frontispiece, printer's mark, title page, initials, title page illustration, even the term page itself. The nature of the object, and its relation to the digital object, is eternally problematic.
A recent state of the Sphaera data model is reproduced here (Figure 5). It captures the bibliographic data of each treatise as well as data on the individual texts they contain. It continues to be adapted and extended in response to new findings. There is a clear tension here between the honesty and the complexity of such a diagram. While it captures explicitly and visibly each relation between facets of knowledge, it may be intimidating to many humanities scholars, who by default are instead presented with conventional data-entry screens rather than the underlying map. The project raises important questions about the interrogability of data-structures that are inevitably also records of curatorial and scholarly decisions. At least here the model is accessible for those who wish to explore it.
Figure 5. The CIDOC-CRM data model used in the Sphaera project. It captures the bibliographic data of each treatise as well as data on the individual texts they contain. Model architecture: Kräutli and Valleriani 2018.
Problems of trust 3: certainty and precision
We have discussed how uncertainty and imprecision are inherent characteristics of most humanities data and processes. There have been many useful definitions and taxonomies of uncertainty (Pham, Streit, and Brown 2009; Skeels et al. 2009; Schäfer 2018; Therón Sánchez et al. 2019; Windhager et al. 2019). Distinctive humanities sources of uncertainty include language changes over time, spelling variations, transliterations, OCR errors, and sources written in multiple languages (Won, Murrieta-Flores, and Martins 2018; Smith and Cordell 2019), unfaithful digitizations of artists’ colours (Malis 2020), and the inherent (non)equivalence of translation (Franke et al. 2019). Cottrell (2017) found ‘circa’ attached to dates in 11 cultural collections, in proportions ranging from 0.6% of records (Wellcome Library) to 67.8% (State Hermitage Museum), while Kräutli (2016, 77) found more than 60 distinct expressions of date imprecision in Victoria & Albert Museum collections data. Burgess (2016, 89) found that provenance information with uncertain or incomplete times could rarely be adequately captured.
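Such expressions can be made machine-readable without discarding their imprecision, for example by normalizing them to explicit earliest/latest year ranges. The sketch below is purely illustrative (invented patterns and an invented plus-or-minus five-year convention for ‘circa’, not the vocabulary of any of the collections cited):

```python
# Illustrative parser: fuzzy date expressions -> (earliest, latest) years.
import re

def parse_fuzzy_date(text):
    """Return (earliest, latest) years for a fuzzy date string, or None."""
    text = text.strip().lower()
    m = re.fullmatch(r"(?:ca?\.|circa)\s*(\d{4})", text)
    if m:                                    # 'circa 1856' -> +/- 5 years
        year = int(m.group(1))
        return (year - 5, year + 5)
    m = re.fullmatch(r"(\d{1,2})th century", text)
    if m:                                    # '18th century' -> full span
        c = int(m.group(1))
        return ((c - 1) * 100 + 1, c * 100)
    m = re.fullmatch(r"(\d{4})s", text)      # '1930s' -> decade span
    if m:
        year = int(m.group(1))
        return (year, year + 9)
    m = re.fullmatch(r"(\d{4})", text)       # exact year
    if m:
        year = int(m.group(1))
        return (year, year)
    return None                              # unrecognized: keep, don't guess

for s in ["circa 1856", "18th century", "1930s", "1851"]:
    print(s, "->", parse_fuzzy_date(s))
```

Even this tiny example embeds contestable judgements, such as how wide a window ‘circa’ implies; any real pipeline would need to document such decisions as part of its paradata.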
Van Ruymbeke, Hallot, and Billen (2017) extend CIDOC-CRM to represent conflicting opinions about a historical object. Moncla et al. (2019) discuss an approach to the problematic geolocation of place names in the great eighteenth-century Encyclopédie ([1766] 2017) using qualitative relative locations rather than coordinates. But digital interventions also introduce new problems, especially when responding to conflicting or imprecise sources.
Certainty and precision: data – issues at source
Among the many issues of inherent imprecision, contestability and other uncertainties within the (digital) humanities, we focus now on a single problem that we have investigated in some depth: that of dates. We discussed above the longitudinal history of a ‘single’ publication, with its mutable instances of the apparently simple object over time. Even records for apparently unitary objects may have multiple dates referring to the item's history: dates of production, accession, cataloguing, and exploitation such as loan or exhibition (Kräutli's time maps are unusual in capturing such events [Kräutli 2016, 215–217]). In Swedish Open Cultural Heritage data, Vane (2019, 21) found multiple dates even for an item's ‘production.’ Historic buildings may have production dates for rebuilding, extensions, remodelling, etc. – one church had 46 such dates. A photograph may have different dates for when it was taken and when it was printed, among others. In visualization, which production date (or how many dates) could and should be represented? Even a printed date of publication may conceal unexpected invitations to mistrust. The National Library of Scotland database of historic Ordnance Survey maps (https://maps.nls.uk/os/; NLS Maps 2020) notes early sheets reprinted with updated information added (for example, a new railway) but with no change to the printed publication date, and 1930s and 1940s maps where the printed ‘publication’ date is after the date of ‘revision.’
Certainty and precision: problems aggravated by digitization/computation
When turning data into numbers for visualization, decisions must often be made that could be evaded when using words or quasi-numeric expressions for approximation. Windhager et al. (2018), in a survey of 70 visual interfaces to cultural history, found 60 that deployed temporal ordering; this forces the designer to make choices about dates and precision. Both Kräutli (2016) and Vane (2019) have worked extensively with these problems, revealing that, in any but the simplest cases, there is no ‘right’ solution, especially in the typical situation where dates in a dataset are specified with widely ranging levels of precision. As Rocha Souza et al. (2019, 21) discovered, locating full-century dates, such as ‘C18,’ at the mid-century year produces potentially misleading quantized spikes at those dates. Assigning them a random date in the century is no more satisfactory, since it invites the user to make perhaps wrong inferences about sequence, and even cause and effect. In the Cooper Hewitt visualizations discussed earlier, Vane attempted to overcome this problem by distributing the icons – miniature representations of the museum objects themselves – within chronographic displays in a dithered pattern, aiming to discourage the user from forming uncalled-for quantitative or sequential judgements. Arguably this satisfies the standards of trust appropriate to a browsing member of the public but is quite unsuitable for scholarly work.
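One simple form such dithering might take (our illustrative sketch, not Vane's layout code) is to spread items that share the same coarse date range evenly across that range, rather than piling them all on a midpoint year. The spacing is deterministic, so repeated renderings are stable, but the individual positions remain artefacts of layout rather than data – a caveat that scholarly use would need to make explicit:

```python
# Spread items with coarse date ranges evenly across their own range,
# avoiding the quantized mid-century spikes described above.
def dithered_positions(items):
    """items: list of (item_id, earliest_year, latest_year).
    Returns a dict of item_id -> x position within its date range."""
    from collections import defaultdict
    by_range = defaultdict(list)
    for item_id, lo, hi in items:
        by_range[(lo, hi)].append(item_id)
    positions = {}
    for (lo, hi), ids in by_range.items():
        n = len(ids)
        for i, item_id in enumerate(sorted(ids)):
            # centre of the i-th of n equal slots across [lo, hi]
            positions[item_id] = lo + (hi - lo) * (i + 0.5) / n
    return positions

print(dithered_positions([
    ("plate_a", 1701, 1800),   # known only as 'C18'
    ("plate_b", 1701, 1800),
    ("print_c", 1856, 1856),   # precisely dated
]))
```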
Certainty and precision: visualization
Visualization has long been recognized as dangerously persuasive. Priestley, an originator of modern visualization, identified its deceptive power (Priestley 1764, 7–8), as have many others more recently (Tufte 1983; Wainer 1997; Cairo 2019; Kosminsky et al. 2019). Priestley realized that his invention of a single line to represent each life within a biographical timeline (Priestley 1765) invited the user to imagine that the start and end of each life were known with precision, so he pioneered the visual expression of uncertainty, grading it into five levels using varying quantities of dots at the ends of his lines (Boyd Davis and Kräutli 2015). Despite this early intervention, there has been little empirical research on the effectiveness of particular graphic approaches in representing uncertainty (MacEachren et al. 2012).
Windhager et al. (2019) specifically address the visualization of uncertainty in relation to cultural data, and point out how issues of trust will be different for different kinds of users. They discuss two kinds of end-users, but, as we have noted, there are important differences also within organizations, such as the difference between outreach staff and curatorial staff, and in the purposes of particular visualizations. They present the representation of uncertainty as a key strength of proper scholarship, implicating questions of trust: representing uncertainty ‘values veracity, rigor, and truthfulness above all else’, eschewing ‘prettified or euphemized representations.’ Franke et al. (2019) also emphasize how confidence lies in the relation between the evidence and the historian, whose expertise is a strong influence.
There is widespread concern that the techniques to represent data vagueness emerged from the hard sciences, and lack the expressiveness required for humanistic contexts, a difficulty that ‘only increases when we try to implement these models as software systems to organize, query, annotate, or search data and assist in the generation of new knowledge’ (Martin-Rodilla and Gonzalez-Perez 2019). Bowman (2019) sees standard graphic conventions such as error bars as counterintuitive in their hardness, recommending tone as a better visual analogy. He emphasizes that the graphic image is an intermediary between the underlying data and the user. Kräutli similarly found that his format for modelling uncertainties graphically (Kräutli and Boyd Davis 2013) risked implying greater confidence about the degree of uncertainty than was supported by the data. D'Ignazio and Klein (2020, 90–92) suggest digital graphics should enact, not depict, uncertainty, for example using movement. Tackling subjectivity, Tateosian, Glatz, and Shukunobe (2020) create maps that show not only the dispositions of the troops at Gettysburg, but also participants’ sometimes fatally erroneous perceptions of those dispositions.
Conclusions and recommendations
We have discussed three components of cultural historical visualization and the broader digital humanities that have implications for trust – omission and bias, naming and classification, certainty and precision – and shown how these are manifested in the source data, in digital transformations, and in visualization. We now distil these issues into four recommendations, focusing on the need for: principles and policies within organizations; interdisciplinary working and knowledge sharing; interrogability of data, processes and systems; and an ethical commitment to eliciting scepticism.
Principles and policies are needed in organizations
The Ada Lovelace Institute (2020) calls for issues in data projects to be assessed before implementation, and for their nature and impact to be analysed afterwards. Cultural organizations increasingly recognize that they must account for their past and present collection policies (Gazi 2014; Kidd 2017; Giblin, Ramos, and Grout 2019), and their role in broader society (Sandahl 2019); they have similar responsibilities in relation to data (Chilcott 2019; Wright 2019). Lin et al. (2020) expect repositories to earn the trust of the communities they serve, including by conforming to standards. Poletti and Gray (2019) see data as questioning, not just as representing: digital scholarship and visualization take on a critical role, with implications for the design of the tools and processes employed. For Burgess (2016, 83) the apparently familiar concept of provenance is as much about subsequently created (meta)data as the original object. Tolfo et al. (2021) similarly demand early and thorough documentation of digitization processes, together with institutional capacity to share this information publicly. This has implications for our next recommendation.
Interdisciplinary working, including designers, is a benefit
We have noted on several occasions the limits on making sense of data, including through visualization, without contextual knowledge. Tóth-Czifra (2020) emphasizes the loss of essential expert interpretation when records are digitized: personal decisions, inevitably biased because human and contingent, foreground certain information while other knowledge risks loss by being tacit. Ruis and Shaffer (2017) also demand engagement with the source material and the context in which it was produced. In technical processes such as dimensionality reduction or network analysis, human collaboration with computation is often needed to achieve accuracy (Ayesha, Hanif, and Talib 2020) or to validate the underlying model (Ruis and Shaffer 2017). D'Ignazio and Klein (2016) similarly express concern at the misinterpretation of data once it is isolated from its context.
Visualization, and the involvement of designers, cannot be regarded as a simple, one-way process that receives data, organizes and displays it – as our vertical pipeline in Table 1 might suggest. Kandel et al. (2011) emphasize early-stage use of visual tools that integrate verification, transformation, and visualization to discover problems. This can only be effective if visualization is integrated into the interdisciplinary discourse at the heart of projects. Just as there is no neutral data, there is no neutral visualization: the intention must therefore be based on a shared understanding of the objectives across disciplines (Vane 2019, 126). The power of science-humanities collaborations is increasingly recognized (Williams 2019; LwM 2019b). The first of Kusner and Loftus's (2020, 35) five measures against bias is interdisciplinary collaboration, including between data scientists and humanists, to better understand the context of the data used to train the algorithms. For Underwood (2018) we need both mathematical inquiry about culture, and cultural criticism of the mathematical models used. Trček (2009) points out how technical questions in computer science increasingly require an understanding of the temporal, subjective and qualitative nature of trust. But as Griffin and Hayler (2018) argue, challenges then arise, including competing disciplinary norms, actual and perceived inequalities within knowledge production, and the denigration of certain kinds of expertise, both during projects and when they are reported as research.
We have discussed elsewhere the varied relationships of design to knowledge, research and critical practice (Boyd Davis and Vane 2020). Dörk et al. (2013) emphasize the need to ask how values and intentions shape visualization practice and how visualization can influence, manipulate, and empower, in a critical approach that is reflexive about the tools, methodologies, and theoretical frameworks it employs. While data visualization has been taken to imply a distanced perspective that provides only summative overviews, it is increasingly able to support close reading of individual objects, challenging the traditional contrast between overview and detail (Junginger et al. 2020). To do this, designers will be obliged to engage with their subject-matter rather than ‘simply’ present the data given to them. Indeed, the whole question of the extent to which data alone, even visualized, can articulate meaning about collections is an open one (Boyd Davis, Vane, and Kräutli 2016). Our accounts of projects above show how useful design can be to other disciplines – as an interrogative, not a decorative practice – but also how much designers need to learn about the materials, projects, objectives, histories, cultures and other aspects of the collections and institutions they work with in order to make their fullest contribution.
The need for interrogability
Discussing big data, O'Neil (2016, 8) asks ‘How do you justify evaluating people by a measure for which you are unable to provide explanation?’ There is increasing concern at the lack of interrogability of machine intelligence. While the earliest systems were accessible to interpretation, this is no longer true of opaque decision systems such as deep neural networks, which ultimately affect people's lives. There is a need for access to how they are generated (Barredo Arrieta et al. 2020). For Babuta and Oswald (2020, xii), human-interpretable features are essential to provide transparency. Smilkov et al. (2017) use interactive visualization to make the workings of machine learning systems more accessible.
While current attention is on machine intelligence, we propose that interrogability should be the goal of all digital systems, including their institutional contexts. International bodies increasingly expect accountability and transparency in relation to personal data (e.g. the EU's GDPR) and the same should be true of archival data (Goodman 2016). For the Ada Lovelace Institute (2020), audit systems should: aim to make data as trustworthy as possible; indicate where data or its representation should be treated with suspicion; and provide users with the means to interrogate data and algorithms. Academic researchers and third-party investigators should probe and test data and its representation. D'Ignazio and Klein (2016) propose design process questions – Can the team work backwards from given data to document provenance, and talk to the data owners? What are the roles and responsibilities of the team? – and design output questions: Can a metadata visualization be provided that shows the provenance of the data and those responsible at each step? D'Ignazio (2017) advocates ‘data biographies’: contextualizing metadata that captures the origins of datasets and their elements. Sacha et al. (2016) note how analysts themselves may be unaware of uncertainties in their data sources or of undeclared pre-processing, and may ignore the opacity of process within visual analytics systems. Kleinberg et al. (2018) point to a perhaps ironic advantage of algorithms: that they formally codify knowledge and practice, in place of the ambiguity of human decision-making, thus potentially facilitating transparency. Edwards and Veale (2017, 81) point out that machine learning explanations are conditioned by the type of user: any explanation needs to be usable by its audience. Developing a visualization for recommender-system results that reveals some of the origins of its decisions, Verbert et al. (2016) found that users, as one might hope, place greater trust in explained than in unexplained results.
An ethical commitment to eliciting scepticism
Our final recommendation expresses an important epistemological position. At many points above we have noted how questions of trust are dependent on both the use and user of a visualization. We discussed the increasing interest in representing uncertainty of many kinds, including subjectivity and conflict of opinion. Lengyel and Toulouse (2020, 50) emphasize the ethical responsibility to communicate uncertainty through visualization: not just particular uncertainties, but the fact that archaeology (in their case) is fundamentally uncertain. Importantly, they are committed to such honesty not only when serving experts: it is also their concern as scientists to convey uncertainty to the public as a fundamental part of their discipline. Elsewhere we have described (Boyd Davis and Kräutli 2015) the temptation, when designing a visualization for public exhibition, to tidy up the display, correct supposed errors and even omit data in order to create a clear picture that communicates a coherent history. Such tidiness, even for public consumption, risks presenting a deceptive view not only of historical events but also of the nature of historical knowledge itself. As we have already stated, the task is seldom to increase trust in the data and its transformation. On the contrary, the task is usually both to accept and to reveal the extent to which they may be untrustworthy.
Acknowledgements
We gratefully acknowledge the collaborations with our colleagues at our own institutions and at our partner organizations. We thank the reviewers of the draft of this article for their valuable suggestions.
Disclosure statement
No potential conflict of interest was reported by the author(s).
