Abstract
Existing research offers fearful conclusions on the use of online archival collections, finding that historians ignore and overlook the limitations of digital sources. However, an attitudinal case study at Newcastle University contradicts this consensus. This article discusses this study’s detailed findings, determining that historians and library professionals demonstrate abundant and nuanced awareness of issues relating to ‘digital selectivity’. Nevertheless, the interviewees suggested that this does not radically undermine their practice. The study also revealed compelling aspects of digital selectivity not currently observed in existing research, including the effects of career stage and the importance of cooperation with library professionals. While the existing literature provides appropriate and prudent concerns, it lacks tangible evidence pointing to a widespread phenomenon of poor digital historical practice. The article closes by recommending further research into historians’ digital information-seeking behaviour, but argues that standardized metadata practices must come first. Limitations, including the absence of rigorous quantitative data, must be resolved before a fuller appreciation for digital historical practice is developed.
Introduction
The proliferation of digital archival collections provokes appropriate concerns for historical research practice. The assorted limitations of digital resources, grouped under the heading ‘digital selectivity’, imply that collation, curation and research practice impair the quality of resulting scholarship. Numerous historical scholars exhibit such concerns. Hitchcock (2013: 9) identified ‘a series of substantial problems for historians’ who use digital materials, including ‘algorithm-driven discovery and misleading forms of search, poor OCR [optical character recognition], and all the selection biases’ of archival collation. Bingham (2010: 229) noted the inevitable methodological problems of digitized newspaper archives and warned of possible distortions in research from the ‘availability of certain titles and the absence of others’. Underpinning these concerns is the suspicion that researchers ignore or misjudge these limitations. Toon (2019: 93), for example, writes that digital selectivity is a ‘fact all historians should acknowledge more frequently’, but ‘users may not realize’ the full extent of the problem.
Attention to the use of digital resources is prudent, but there is a lack of evidence demonstrating widespread complacency among historians. Usage statistics might show how often scholars access particular digital archives, but without standardized metadata or consistent referencing formats, we cannot determine how effectively historians use their sources, or the quality of resulting research (Sinn and Soares, 2014). If poor research is occurring, there are any number of potential causes, including curatorial and collation processes, inadequate digitization, user experience design or historians’ naivety. The proclaimed limitations of digital archival collections may be no different than the limitations of physical archival sources (Sinn, 2012). Without a thorough attitudinal examination, it is difficult to determine that historical researchers do not realize the pitfalls of digital selectivity (Toon, 2019: 93).
These uncertainties motivated a limited attitudinal case study involving historians and library professionals at Newcastle University, resulting in a Master of Science dissertation (Coburn, 2019). Contrary to the existing consensus, the study revealed widespread awareness of digital selectivity, as academic faculty and library professionals expressed deep and broad appreciation for the selectivity of online digital archives. Every participant described how they altered their interaction with source material to overcome such drawbacks. The inclusion of library professionals, whose knowledge and understanding have so far been curiously underused in discussions of digital historical practice, also uncovered a significant site of collaboration that would alleviate concerns and improve historians’ practice. All of these findings complicate influential studies of digital historical practice.
This article discusses the detailed findings of this study and the implications for evaluating historians’ digital research practice. It begins by addressing common themes in the existing literature, noting that appropriate concerns lack data pointing to inattention and poor digital historical practice as a widespread phenomenon. Limitations, including the absence of rigorous quantitative data, must be resolved before we develop a fuller appreciation for digital historical practice. The article then explains the findings of the study at Newcastle University. Academic faculty expressed nuanced appreciation for the impact of digital archives but suggested that even these changes did not radically undermine their practice. The study also revealed compelling aspects of digital selectivity not currently observed in existing research, including the effects of career stage and the importance of cooperation with library professionals. The article closes with suggested recommendations to allay fears over digital historical practice and instil best practice throughout the discipline. Further dedicated research is necessary, but standardized metadata practices and a better understanding of historians’ information-seeking behaviour must come first.
‘Confronting the digital’ in the existing literature
Digitization has required historians to reassess the extent of their information literacy. Traditionally, historical archives contained ‘either the papers of some particular person or the papers or records of a particular organization’ (Owens, 2014). Now, digital archival collections collate a wider variety of materials packaged around historical themes. Materials may be ‘digital surrogates’ or ‘born digital’ (Theimer, 2014). They can be assembled ‘from holdings of many repositories’, and Castañeda (2013) demonstrates that digital platforms present a rich tapestry of archival material drawn from multiple research methods (see also Theimer, 2014).
The act of collation and curation is no longer the preserve of special collections librarians and archivists, but has become a ‘vast, decentralized and idiosyncratic’ exercise led by third-party content companies (Keeling and Sandlos, 2011: 423). Historians commonly suspect a lack of oversight involved in the collation, packaging and distribution of digital archival collections (Jones, 2017). This is especially important when materials have been taken from several repositories and packaged around a different theme. Moravec (2017) argues that digitization occurs with little accountability, raising problems of historical authority as a preference for ‘pop’ themes dictates priorities. Prescott (2014: 337) similarly argues that the vast digitization of The Times ignored ‘other eighteenth-century newspapers’. He challenges Google Books’ assumption that ‘what are considered in Silicon Valley to be the world’s greatest libraries contain all the world’s knowledge’ (p. 338). Ooghe and Moreels (2009) write that the practices underlying selection ‘appear most often based on ad hoc decisions or on available funds’ rather than on a standard set of guidelines. Ogilvie (2016) affirms that those ill-equipped to determine the value of sources often make decisions to digitize and share materials. It is the apparent obfuscation of these decisions that causes unease. As Fickers (2012: 22) asserts, historical researchers need to ‘see the records as the creating agency saw them’. The arbitrary rearrangement of archival materials otherwise breaks ‘the intertextual relationships of documents’ (p. 22) and disrupts the provenance essential to historical research.
Interfaces, platforms and digitization methods also profoundly affect a researcher’s task. Platforms no longer provide simple search–retrieve functions, but embed the presentation of their content with software tools that offer transformative analytical methods (Jordanova, 2015; Keeling and Sandlos, 2011). Yet Jarlbrink and Snickars’ (2017: 1233) study of digitized Swedish newspapers found that digital encoding and OCR generation transformed original source material beyond recognition, with ‘millions of misinterpreted words generated by OCR, and millions of texts re-edited by the auto-segmentation tool’. It made the resulting material impossible to study. Potter and Holley (2010: 13), meanwhile, caution against relying on digital reproductions, acknowledging that digitization cannot capture the intricate details of an original source: ‘Some material can really only be seen in person’. Martin (2007) explains that digitizing rare materials often loses details from cropping or distortion, or involves simply excluding material that cannot be adequately digitized. Moreover, Keeling and Sandlos (2011) warn that quality and practices vary widely.
Beyond this, there are concerns that digital archival collections simply cannot recreate the context provided by a physical archive. As Rimmer et al. (2008: 1390) found, ‘original documents will always have an authenticity and “magic” about them which cannot be replicated by digital surrogates’. This is especially important for historians. As Theimer (2014) argues, ‘understanding why and how an information resource was created – that is to say, its context – is more valid than ever in digital historiography’. Limited interoperability and an inconsistent use of metadata also prevent historians from finding all of the archival material available to them, and differing presentation and analytical tools between platforms give entirely different user experiences, even when packages host the same content (Sinn, 2012).
With these pitfalls in mind, it is prudent to ask whether historians are ‘doing good history’ (Gregory, 2014). Hitchcock’s (2013) influential article, ‘Confronting the digital’, charged that historians were not cognizant of ‘misleading forms of search, poor OCR, and all the selection biases’ (p. 9) embedded in digitized collections. ‘Academic historians’, he wrote, ‘have largely failed to respond effectively to these challenges’ (p. 9). A wealth of scholarship finds that historians lack the awareness to navigate digital archives appropriately (see, for example, Harter, 1998; Hobbs, 2013; Knights, 2015; Presnell, 2007). Toon (2019) implies that historians assume perfection in OCR, which breeds complacency. Huistra and Mellink (2016) explain that historians must rethink their queries and diversify their search terms to account for variable keyword coding. The suspicion that historians settle with their initial results and do not consider whether the material available to them is comprehensive, representative of the source material or sorted for value remains a concern (Knights, 2015; Potter and Holley, 2010). Historians are also charged with overusing easily available collections and avoiding others that are less accessible, which warps and distorts whole schools of subsequent research (Hobbs, 2013; Moravec, 2016a).
The consensus holds that extraction of material from a physical environment to a digital platform negatively affects historians’ research practice, but this claim is ambiguous. First, the absence of agreed and consistent standards for archive metadata and referencing formats makes it near impossible to determine how collections are being used. Koolen et al. (2018: 369) write that ‘currently, there is no established method of assessing the role of digital tools in the research trajectory of humanities scholars’. Sternfeld (2011) similarly argues for the need for an entirely new theory and methodology to frame digital history practices adequately.
Sinn and Soares (2014: 1794) note that statistical and quantitative data focusing on the ‘impact’ of digital collections ‘stops short of revealing the prior paths historians took to learn about each source or explaining why they decided to use a particular source for their specific projects’. Without standardized metadata, referencing formats or usage statistics, it is difficult to determine when particular archives have been used, which means that it is difficult to determine how they have been used. Moravec (2016b) writes persuasively that the consequence of digitization ‘may be good, it may be bad’; the only certainty is that ‘it is definitely different’.
Second, few studies have investigated whether historians are aware of the problems identified by surrounding scholarship. This is an amorphous and abstract site for study, but there are enlightening precedents. Deploying interviews and surveys to collect data on attitudes and behaviours, Sinn and Soares (2014: 1807) ‘shed light on the digital collection use, such as the reasons why historians considered using and not using them, while also revealing some benefits and barriers historians experienced when using digital materials’. Gracy (2013: 346–373) examined ‘professional attitudes toward digital distribution’ in order to provide a more complete picture of archival practices. Rudyk (2016) also demonstrates the benefits of looking at ruptures between ‘attitudes and behaviours’ in digital research practice. Awareness is important as it informs potential solutions. For example, if historians are aware of limitations but misuse digital archival collections anyway, it suggests a fault with interfaces; if they are unaware of limitations, there are problems with communication.
Understanding attitudes is, nevertheless, complicated by the commonly narrow pool of respondents to surveys. Many studies take a broad overview and depend on the willingness of participants to actively engage and respond to data collection. Typically, these participants hold positive views of their experience, which is why they are so willing to interact (Green et al., 2015). A recent Jisc-sponsored study recognized this limitation in its findings as it drew from users ‘selected for their experience with and creativity in using the resources’ (Meyer and Eccles, 2016: 33). The study acknowledged that ‘it is perhaps no surprise that the users interviewed here were overwhelmingly positive’ (p. 33).
Following this, a third complicating factor is the relatively light insight into the information-seeking behaviour of historians using digital archival collections. Information and archival scholars have provided illuminating examinations of historical research practice. However, many of these important findings are now nearing 20 years old (see, for example, Duff and Johnson, 2002; Tibbo, 2003). The digital archival landscape is profoundly different. There are suggestions that historians romanticize ‘traditional’ research methods over the new, which fosters an irrational fear of digital archival use (Rimmer et al., 2008). As Jamali and Asadi (2010: 283) observed, understandings of information-seeking behaviour are ‘partly based on anecdotal observations by librarians and academics rather than on robust research evidence’. A growing body of new studies grapples with the functional practices of historical research. Freund and Toms’ (2016) work in archival finding aids and Korkeamäki and Kumpulainen’s (2019) revealing study of information interaction ‘in digital environments’ are examples. Freund and Toms’ (2016) snapshot examines a limited aspect of historians’ archival work, which is not scalable to primary-source interaction. On the other hand, Korkeamäki and Kumpulainen (2019) seek to improve information retrieval interfaces by covering a range of historians’ tasks and source use, which is too broad for insight into historians’ digital archival practice. Further research focused on the specifics of digital archival collection use is much needed. Absent thorough investigation, it is difficult to determine how disruptive the rise of digitization has been for primary-source research.
Fourth, conversations concerning the use of digital archival collections have so far noticeably overlooked academic libraries, though digitization arguably disrupts their processes in greater ways. Ball (2016: 167) explains that the ‘unstable and unpredictable environment’ led to libraries ‘re-evaluating internal processes and structures, enhancing and developing the skills within their teams, and embracing new possibilities for strengthening and enhancing partnerships with publishers and the academic community’. Turner (2014: 45) reviewed libraries’ acquisition processes, ultimately finding that the ‘scholarly publishing paradigm is shifting’ with changing licensing restrictions for e-resources. Libraries are attempting to meet new demands, but Turner explains that ‘the complexity and volatility of the scholarly publishing marketplace, the strength of individual institutional interests, and financial constraints have created a potent brew’ (p. 46). If nothing else, as Jisc found in 2019, academic libraries are being squeezed by two contradictory demands: on the one hand, increasing demand for high-quality digital primary-source material; on the other, consistent limitations on library budgets that ‘make purchase of these often-expensive content resources difficult’.
The absence of librarians from digital historical debates is particularly curious given that they possess the knowledge and skills to resolve digital selectivity. Keeling and Sandlos (2011: 423) worried that the democratization of the digitization process would spark ‘practitioner document digitization’ and subvert library professionals’ authority. However, this does not appear to have occurred. Academic libraries retain their influence, and specialist indexing, cataloguing and classification remain a valued feature of information organization and access, especially for online digital resources. Vasileiou et al. (2012) suggest that the traditional role that academic libraries played – selecting books, organizing collections and making them available to learners, scholars and researchers – would alleviate concerns and instil trust if applied to the selection and acquisition of digital archival material. Green and Lampron (2017: 772) conclude that library professionals’ extensive knowledge and experience makes greater integrative partnerships between scholars and librarians ‘more important than ever’.
The Newcastle University study
All of this is to say that, while caution is sensible, appropriate and necessary, fears for digital historical practice are based on limited findings. The need for further investigation motivated a short case study to investigate attitudes and awareness among history faculty and library staff at Newcastle University (Coburn, 2019). The study determined how researchers and library professionals perceive their interactions with digital archival collections – predominantly those packaged by digital content companies. It adds to wider arguments concerning historians’ information-seeking behaviour in digital environments.
Three objectives guided the study. The first was to ascertain the attitudes, beliefs and practices of selected Newcastle University academic and library staff members in relation to the creation, acquisition and use of digital archival collections. The second was to separate the various factors perceived to encompass digital selectivity by the study’s participants and determine how awareness of these factors influences research and teaching practices. Finally, the third objective was to offer recommendations that could limit the impact of digital selectivity on future historical practice and alleviate the fears over digital archival research expressed by scholars, academics and library professionals.
Research design
The research took place between March and June 2019. It collated data from semi-structured interviews with eight participants: three library professionals from different areas working in the Philip Robinson Library and five members of the academic faculty in the history department (out of a total of approximately 50 regular academic staff). The library staff were sought based on their role, seniority and experience in managing digital collections for historians. Staff in academic liaison, special collections and acquisitions were approached directly with introductory emails requesting initial informal interviews. The historians were similarly selected to represent a cross section of experience (late-career, mid-career and early career researchers) with mixed digital archives, research specialisms and faculty roles to ensure that the selection represented the varied demographic make-up of the department (see Table 1).
Table 1. List of interviewees.
After the initial approaches and conversations with around 20 potential respondents, informal interviews were scheduled with eight participants based on their responsiveness and insight. The informal interviews narrowed specific areas of interest, which led to formal recorded interviews two to three weeks later. The interviews took place in staff offices and lasted between 30 minutes and 1 hour. Full transcripts of the interviews were written in the following month and the investigator conferred with the participants to ensure that the transcripts represented a true account of their interview and beliefs.
The data was collected and analysed using grounded theory and thematic analytical methods (Creswell and Creswell, 2018; Guest et al., 2014; Morse and Maddox, 2014). A number of studies informed this approach. Guest et al. (2014: 3–20), for example, described the four basic steps (familiarization, identification, review and construction). Braun and Clarke’s (2006) celebrated work on thematic analysis in psychology, although applied to a different academic discipline, also provided an integral framework for the data collection and analysis. The research was ‘underpinned by the interpretivist paradigm where the focus is on understanding the social realities through the interpretations and perceptions of these realities by its participants’ (Rudyk, 2016: 23; see also Blaxter et al., 2010; Bryman, 2016). Much of the analysis is reflective and interpretative (Pickard, 2013), and informed by Miles and Huberman’s description of the qualitative data analysis process as ‘data reduction; data display; conclusion drawing and verification’ (Gorman et al., 2005: 205).
Grounded theory informed the study inasmuch as the project induced its hypotheses from close data analysis (Silverman, 2014). A pre-existing hypothesis in the literature review (that digital content companies’ archival practices have a negative impact on historical practice) was laced with ambiguity. This study sought to produce a new theory from data, rather than to test this hypothesis (Chapman et al., 2015). Data analysis and data collection occur concurrently in grounded theory, which is itself a natural condition of interviewing participants. For example, in this case, the insights gleaned from earlier interviews necessarily shaped later interviews and follow-up conversations with the participants to pursue emerging themes that had not previously arisen. The data was ‘continuously categorised and compared across interviews’ (Chapman et al., 2015: 202). The researcher collected and analysed this data at the same time and then ‘integrated the information in the interpretation of the overall results’ (p. 202).
Initial analysis occurred during the interviews, with contemporaneous notes structuring the participants’ responses. Interviews offer multiple perspectives and holistic descriptions, and bridge intersubjectivities between multiple participants and the investigator. Essentially, dialogue and conversation with relevant participants allowed the investigation to home in on particular areas of interest. These formed initial codes that, after transcription was completed and confirmed with the participants, fed into the data analysis through NVivo. This software was selected as it was available to the researcher through the research institution and holds a reputation for effective assistance in this type of research project (Bazeley and Jackson, 2013; Hilal and Alabri, 2013).
Limitations
The study recognizes limitations. First, interviews include high subjectivity, difficulties in replicating outcomes for future research, generalizations in topic, a relative lack of transparency, and a large volume of generated data (Bryman, 2016; Rudyk, 2016). Owing to the differing roles and responsibilities of each member of staff, the questions differed across the interviews, while still addressing similar themes. This stymies direct comparisons of responses between the interviewees, but the semi-structured nature of the conversations was nevertheless productive.
Second, the study only sought to explain processes that occur within the limited confines of one university department. This presented a few benefits. Ironically, by narrowing its focus to one department in one institution, this study provided a broader cross section of academic and library opinion, as response was not dictated by a willingness to engage, but by proactive effort from the researcher (Green and Lampron, 2017). The study can offer an outline for determining how people perceive their relationship with digital information and inform information scientists and archivists in their attempts to tailor material to users in the future. Nevertheless, the focus on one history department limits the findings to this specific context. The university, department and library’s status and processes are ‘typical’ of the broader academic environment, which makes it ‘representative of a broader set of cases’ (Gerring with Seawright, 2007: 91), but generalizing findings must be done with care.
Similarly, the study interviewed a limited number of academic faculty. It achieved a representative cross section of research interests, career stages and professional roles within the Newcastle University history department, but the study’s findings are not scalable to the broader context of the discipline. Historical scholarship and research practice is, by its nature, dependent on the individual, so even securing an apparently representative selection of academic faculty may not be representative of those who share similar research interests, career stages or department roles, for example. Further study would expand the number of participants to allow for broader insights and more complex analysis into attitudes and awareness across the profession.
Findings
Abundant awareness of digital selectivity
The most significant finding is that the academic faculty are abundantly aware of digital selectivity and account for it in their research practice. This awareness was unanimous and nuanced. For example, all of the participants spoke in considerable depth about the improved availability of relevant materials, describing this as the ability to forestall travel to archive sites for research purposes and to cut ‘down a lot of time in having to travel to places’ (Interviewee 2). Yet Interviewee 5 showed that access does not just mean the immediate retrieval of sources, but also the ability to download and retain materials in perpetuity, thereby alleviating the limitations that can be placed on researchers, such as time constraints or technical disruption.
Those who had accessed physical archives explained that being able to utilize the same materials through a digital platform supplemented their initial analysis. Access also allows faculty with competing responsibilities, such as childcare or more restrictive employment contracts, to engage with academic work. In this sense, the researchers believed that the access afforded by digital archival collections fosters equality and participation in the academic profession. Interviewee 6 explained that their access to digital archives expanded their research, as sources became available that would previously be off limits: ‘I would probably never have thought about using [the types of sources made available to them]. I think that research would be impossible, actually’.
The academic faculty were also universally cognizant of the opportunities presented by digital interfaces and analytical tools. Interviewee 5 put the ‘richness’ of digital archives down to the ‘different ways that people can access’ them. Even offline, retrieving sources from a digital collection has allowed historians at Newcastle University to deploy their own methods of organizing and analysis, be that ‘rudimentary tagging systems’ or transcribing written records (Interviewee 5). Four out of the five participants explained, in depth, how their approach to source analysis had altered as a result of materials being presented in a digital format. One researcher exclaimed: ‘I don’t know how people did their research before. Like, I literally don’t know’ (Interviewee 8).
Researchers are, nevertheless, far from naive. The majority of the content across the interviews with the historians involved discussion of various drawbacks, broken down into digital archival creation, profit motive, transparency, the neglect of niche fields, digitization quality and licensing restrictions. In addition to this, when discussing researchers’ use of digital archives, the interviewees expressed concern for the development of research skills, searchability, ease of use, access limitations and the importance of serendipitous findings. Attention to selective digitization is supplemented by distrust in the profit motives of digital archival companies, with widespread concern for curatorial decision-making. Interviewee 5, otherwise expressing significant positivity, explained that their main frustration with digital archival collections was that ‘they don’t seem to have the information that I was hoping that they had’. Interviewee 6 stressed that ‘I do have some reservations, particularly for newspapers. I’m always a bit unsure as how complete those collections are in terms of what’s actually digitized’.
The spectre of profit motives affects academics in other ways. Interviewee 7 believed that the relationship between researcher and archivist is one of mutual cooperation. If the researcher needed access to further material, they could ‘get in contact’ with an archivist and ‘see if there’s a way you can access them’. However, in digital archival collections produced by private companies, ‘I doubt they would if it’s a paywall, because they’d want you to access theirs’. Even when help is provided, profit motives create distrust, which led Interviewee 7 to believe that ‘they’re very helpful, but I suppose they’re there to sell you a resource’. Whereas physical archives present research guides and accessions information to document their selection methods, digital archival companies lack such transparency, which led Interviewee 6 to claim: ‘I think they have to be a lot more upfront about what is in their collection and what isn’t in their collection’.
All of the academic faculty interviewed at Newcastle University are aware of potential limitations and have developed practical steps to overcome such problems. Every participant described their belief that keyword searching saves ‘an incredible amount of time’ and that their work could ‘be impossible to do’ if ‘it wasn’t digitized’ (Interviewee 6). These descriptions are not made uncritically, however. Every participant, while praising the influence of keyword searching, offered the caveat that OCR software is imperfect and must be used cautiously. Interviewee 7 said that ‘searchability can be tricky’, while Interviewee 8 explained how they performed a manual check of particular words on selected PDF images to ensure that their keyword searches functioned correctly.
Continuation of practice
These considerations, while necessary, are not transformative. The academic respondents consistently stated that their approach to digital archives reflects the same caution they apply to physical materials. Interviewee 7 described their previous encounters with censorship of archives performed by state governments: ‘I suppose it’s the same thing. It’s just a different way really’. Interviewee 2 declared: ‘There’s the argument, which isn’t digital or non-digital, of just getting a partial view from what you get from written documents, full stop. You know, which we’re all aware of. You know, you’re only getting one perspective, right. In any archive, there are archives and then there are other archives’.
Interviewee 8 addressed the same feature of curatorial selectivity, explaining: Even with physical archives, they’ve been catalogued in a certain way and it’s not necessarily the way you would like them to be catalogued . . . every archive has been curated and is the product of many, many years of curation. So, I don’t see online archives differently.
Digitization simply adds a ‘different layer of subjectivity’ to that found with physical archival collections (Interviewee 2).
The academic researchers also revealed that we are not moving towards an era of digital dependency. Interviewee 5 explained that they found the same ‘thrill’ of history in their perusal of digital materials, but others emphasized that they preferred visiting physical collections, observing additional benefits that digital archives cannot capture (Interviewee 5). Interviewee 7 explained that, if the costs were the same, they would rather take time to travel to a physical archival collection than simply click to access a digital archive from home. When asked whether they would rather have the convenience of a digital collection over travelling to a physical collection, Interviewee 2 similarly declared: ‘Definitely not. Absolutely not’.
In practice, some researchers use digital collections to get a sense of what an archive holds before travelling to immerse themselves fully in physical archives. Interviewee 6, having viewed digital reproductions, made a trip to see the original materials for more depth of understanding. They believed that ‘there’s probably never a substitute for actually seeing the originals in person’.
These responses complicate evaluations in the surrounding literature. The academic faculty do not rely on digital collections to perform their research and are sufficiently aware of the limitations to adjust their practice when necessary. In fact, many of the perceived problems are issues that historical researchers have always dealt with when using physical archival collections. These findings suggest that, while digitization affects historical practice, the consequences do not pose an especially transformative problem for the profession. Historians overcome these limitations.
Variations by career stage
All of the respondents exhibited awareness, but the Newcastle study found variations in concern depending on career stage and field. This became especially clear on the topic of serendipity. Because of their collation, curation and interfaces, digital archival collections prevent the general browsing that historians perform in physical archives. This, in turn, reduces the opportunities for chance finds. This was the highest-ranking concern by number of codes from the academic respondents, but it was only discussed by two participants, both of whom were in a later career stage than the other participants in the study. Interviewee 2, for example, anchored their discussion of serendipity by recounting a conversation between themselves and a colleague from earlier in their career. Averring that historical research requires the arduous and painstaking review of many materials before a discovery occurs serendipitously, they exclaimed: ‘What do you mean, a selection technique? You just got to go through the whole lot’.
This rebuff to the more surgical form of data retrieval that occurs with digital archives is grounded in the participant’s long experience of performing research prior to the widespread adoption of digital archival collections. They continued by explaining:
Quite often when you’re going through loads of stuff that you think ‘This is boring me rigid’, you know, ‘When am I going to come across something relevant?’, something relevant comes up and it’s not one of those keywords you used if you were in a digital archive situation. (Interviewee 2)
Once more, the participant made a distinct separation between the conduct in a physical archive and that in a digital archive. Early career researchers (ECRs), on the other hand, were more inclined to talk about the benefits of keyword searches and surgical discovery. Concerns over digital selectivity are, therefore, not consistent across the profession. Different academics prioritize different aspects.
The Newcastle study revealed another dimension of digital selectivity that often escapes observation. ECRs must now endure several years of casualized and precarious work post-PhD, frequently moving between temporary contracts at different higher education institutions. Every ECR in this study had experienced this, and each emphasized how their career status impeded their research, as they consistently lost access to a digital archival package provided by their university library when their contract expired (Interviewees 6, 7 and 8). This has urgent ramifications for the practice of historical research among those making their way in the profession. Interviewee 6 said that ‘it was just luck’ that they were able to access an important archival collection after moving to a new university. Interviewee 8 relied on colleagues at other institutions to access necessary archives on their behalf, having lost access from one job to the next.
Temporary affiliations make ECRs reluctant to request that their host library provide access to materials that they need. Interviewee 7 explained that their non-permanent position made them question ‘Would the library buy it anyway, just because I’m interested?’. Interviewee 6 concurred, explaining that their ‘level of career’ made it particularly difficult to overcome limitations in access. They believed that ‘it’s a little bit easier once you get into a permanent post, at lecturer level’.
Not all historians are afflicted by digital selectivity in the same way, and some must overcome more hurdles than others. ECRs are both more afflicted by this issue and more aware of it. Senior and permanent members of staff did not raise this problem in their interviews. There is a disparity in the use of digital archives and a disparity in the awareness of digital selectivity depending on a historian’s career status. Urgent investigation is required to determine how digital selectivity affects faculty members at different stages of their career, and how the resulting disparities can be resolved. This aspect of digital selectivity, and the erratic disparity of access to archival collections based on rapidly changing employment circumstances, has yet to be fully appreciated for its impact on research output. It has significant ramifications for understanding digital research practice.
Library professionals
The Newcastle study introduced the expertise and opinions of library professionals into the debate over digital historical practice. They expressed authority, knowledge and extensive experience managing the rigorous demands of digital archival provision. The participants widely agreed that digital archives have impacted the practice and expectations of library services in different ways. The academic liaison staff explained that their management of online digital archives requires them to take more responsibility for technical services. There is also a greater expectation for liaison staff to make more resources available and act in collaboration with representatives of private content companies. All of these notable changes have occurred only in the last 10 years (Interviewee 1). The special collections staff member was the keenest to explain how digital archives have affected their role. Their proximity to archival collections, and their expertise and experience with physical archives, allowed them to make interesting comparisons between online and offline resources, and speak knowledgeably about these differences. They had more concerns with digital archives as a result. Many of the changes in their role related to user expectations, as both academic and public researchers wished to utilize online resources (Interviewee 3). The impact on library practice was not, however, described as revolutionary, but as an extension and amendment of previous responsibilities.
The staff emphasized concerns over the creation and acquisition process, but were more inclined to recognize the benefits provided by digital archival collections. This was largely a result of their close coordination with, and awareness of, faculty practices and research outputs. The content provided by digital archive companies is not as transformative as the ‘overlayering’ of materials with unique analytical tools and organizational features (Interviewee 3). Interviewee 3 explained that ‘I’m still fairly agnostic about the content’ but that the provision of ‘digital scholarship tools like text-mining and visualizations, I thought that was very interesting’. Moving away from ‘a search and retrieve thing’ towards ‘trying to layer tools over’ source materials would provide ‘new interpretations’ (Interviewee 3). Interviewee 1 similarly noted: ‘You can search the archives or browse them in more sophisticated ways than if you’ll just kind of got all the materials laid out on a desk in front of you’.
Unlike academic faculty, library professionals exert a degree of control over the digitization process. Interviewee 1 spoke in detail about the cost incurred by purchasing access to digital archival collections, which was ‘certainly significant’ given that ‘budgets are under pressure’ at present. But the participant also emphasized their responsibility for the resources under their purview, to the extent that they could guide the form that the library’s collections would take over the next few years. Interviewee 3 explained that licensing restrictions resulting from collaborations with digital archival companies could hinder how special collections archivists go about their work, but explained that the decision to enter into such relationships was discretionary and something that they could determine themselves. This presents opportunities for collaboration that can overcome suspicions, distrust and concerns over the authority of particular digital archival collections.
The library professionals were forthcoming with suggestions and recommendations for improved practice in the future, and evident in all of their responses was a desire to ameliorate the problems experienced by historical researchers. They largely believed that further training in digital archival use is important. Indeed, Interviewee 3 believed that ‘the only solution comes from teaching’. The further content within this theme, however, reinforces just how complex the issue of digital selectivity is. The ‘concepts behind digital scholarship and digital humanities’ are so demanding that they would be difficult to comprehend fully without proper teaching (Interviewee 3). Identifying the problem within historical research is a start, but fixing the issues that cause concern to library staff and academic faculty requires long-term and thoughtful solutions.
The staff also encouraged further unification and collaboration across the various stages of digitization – between academic faculty, librarians and digital archival companies – while also ensuring that interfaces, platforms and metadata are designed to foster interoperability across multiple ‘silos’ (Interviewee 3). This would do more than any other recommendation to alleviate the selectivity of access to information. However, the most common theme of recommendation from both Interviewee 1 and Interviewee 3 was standardized referencing practices to facilitate better usage tracking and analysis. Even among the library staff, usage statistics were incomplete and vague. Interviewee 1 acknowledged that they ‘don’t know the granularity behind’ their statistics and that some of the library’s understanding of demand and impact was based less on quantitative data than on ‘the type of queries’ they got from users. However aware library staff are that digital archival collections are imperfect, ‘it’s near nigh on impossible for us to then track’ whether sources are being used effectively (Interviewee 1). Resolving this, in partnership with librarians, should be a priority of historical researchers.
Recommendations
The Newcastle study complicates existing understandings of digital selectivity and historians’ responses to it. Suspicions that academic researchers have failed to respond to the challenges of digital archival collections appear exaggerated, given the breadth and depth of awareness among the historians. The study also offers suggestions to allay fears, while ensuring that researchers share knowledge and best practice more frequently.
More dialogue
First, historians need to talk to each other. Existing studies generally agree that historians do not approach digital archives as rigorously as they should. In fact, two out of the five academic interviewees also ‘worry that historians misuse’ digital archival collections (Interviewee 6). But this in itself should prove that awareness of digital selectivity is widespread throughout the profession. All concerns for historical practice come from historians who have, independently, recognized limitations and suitably adjusted their practice. But they suspect that other historians are not as deft in studying digital sources as they are themselves. This suggests that the real problem is isolation and lack of conversation across the discipline, as scholars evidently do not share their concerns or best practice with others often enough. Interviewee 6 perfectly summarized the findings of the study, explaining: ‘I like to think that people are aware of [digital selectivity] but we just don’t talk about it enough’. While the academics each demonstrated a clear and deep awareness of digital selectivity and its impact on their profession, they often acted in isolation, rather than collaboratively.
We need greater collaboration across departments, disciplines and institutions. Digital selectivity is something that every interview participant had experienced personally, and they each demonstrated initiative and resilience by developing bespoke strategies to overcome its consequences. It is nevertheless evident that appropriate information literacy and research skills must be foregrounded throughout the historical profession to share best practice and, potentially, develop consistent solutions. Even if the perceived problems of digital selectivity are not as severe as existing research makes out, prominent knowledge-sharing initiatives will at least allay the fears and apprehensions expressed by many within the field. A model for information literacy instruction that foregrounds historical research skills would also promote awareness of information literacy and digital research skills, as well as much-needed discussion within and between academic circles (Coburn, forthcoming).
For this to succeed, historians should embrace the expertise and skills of library professionals. The rising importance of library skills – information literacy and digital research practice among them – provokes increasing demands for collaboration across campuses. Schonfeld (2010) argued that historians should recognize the value of library staff within their institution. In its ‘Framework for information literacy in higher education’, the Association of College and Research Libraries (2015) called for more cooperation across libraries and faculty departments. The Newcastle study shows that library staff’s concerns about digital selectivity reflect those of historians, and there is vast potential for cooperation to alleviate and mitigate the concerns felt by separate departments. The control and authority that library professionals exert in the acquisition of their archival packages can dramatically reduce fears.
Digital content companies’ public relations problem
All of the interviewees described positive relationships with digital content companies and their representatives, but there remained doubts over curation and distribution practices. Curators of physical collections provide an additional layer, preventing historical researchers from accessing archival collections in their pure form. But assiduous accession records chart the curation processes, and finding aids and rigorous archival assistance eliminate any obfuscation of material. While digital archives’ use of historical specialists demonstrates the quality of their products, their business models do favour collections that would receive most use, rather than niche areas of interest to individual historians engaged in a bespoke investigation. Digital selectivity intensifies this by introducing additional, subtle subjectivities. All of the participants perceived an opaque collation and curation process, which is exacerbated in instances where profit motives and business priorities influence curation, pricing and distribution. Greater transparency, collaboration and scholarly communication across stakeholders would alleviate some of the distrust directed towards content providers.
Regular primary-source analysis necessitates awareness of the context in which a source was produced, so all that is needed to alleviate digital selectivity is increased transparency and accurate metadata from content companies to allow historians to weigh up this context from the digital perspective. A positive step in this direction is Gale’s (2019) Digital Scholar Lab, which emphasizes the necessary interrelationship between content providers and academic researchers. This unified approach is desirable.
Expanded understanding of digital selectivity
The Newcastle study also elicits an expanded understanding of digital selectivity. We must do more to appreciate disparities in access based on career stage, as ECRs demonstrably face additional hurdles not experienced by senior colleagues. The impact of this disparity is not fully understood. Additionally, much of the secondary literature notes the selective curation of British historical newspapers, but this study found historians raising the same concerns for all forms of digital archival collection. By focusing on the limitations of particular forms of digital resource, especially newspapers, the existing research is too narrow.
Future research must compare and contrast how academic researchers encounter a wider pool of digital resources from across geographies and time periods in order to understand fully how the digitization and packaging of primary material affects historical practice. Digital selectivity should also move beyond historical practice to consider ethics, especially when the subject involves historically oppressed and marginalized people. Interviewee 8 described this clearly throughout their interview. The harvesting of primary materials from marginalized groups, which are then kept behind costly paywalls, means that ‘people whose history you’re studying, like, for example, enslaved people, actually no longer have access to these sources’. Calling this a ‘double exploitation’, Interviewee 8 affirmed that the problem of profit motives in archival collation is heightened by geographic disparities, as most scholars and academics are based in the global North and if you study the global South and you try to create these online resources about the global South, then you have to make them free – and available to everyone. (Interviewee 8)
Such comments show that we must consider not just historical practicalities, but also the ethics, morality and social justice of digitization.
Further data-led research
We also need more quantitative data-led findings to uncover how historians use digital collections. Further investigation of historians’ information-seeking behaviour could transform understanding of digital archival practice. The movement of archives from a physical location to a digital platform may affect historians’ interaction with source material. But there is a potential contradiction between historians’ view of information retrieval and the desired design of online platforms (Korkeamäki and Kumpulainen, 2019: 42). Studies on information-seeking behaviour find that researchers across multiple disciplines value accessibility and choose ‘the path of least resistance’ in their search and retrieval strategies (Anderson et al., 2001; Wellings, 2016). Usability is a priority for digital archival companies when designing platforms and interfaces, as they want users to be able to quickly, easily and directly retrieve the specific data that they require.
However, the desired research behaviour of the historians, as expressed in the interviews, suggests that ease of use could be a bad thing for rigorous historical research. The academic faculty thought that information retrieval should be more difficult in order to foster skills among their students, and Interviewee 5 explained that: ‘It’s important for them to start learning how to find the stuff themselves’. The difficulties involved in finding relevant materials for historical research not only develop skills among professionals, but necessitate the kind of analytical rigour that produces ‘good’ history. For example, as Interviewees 2, 7 and 8 all discussed, it is only by visiting the location of particular archives that historians can get ‘that sort of contextual experience to be able to really understand’ the lived experiences of the historical subjects under study, and to ‘retrace the steps of the people, you know, that you are studying’ (Interviewee 8). In this sense, the fact that digital archives make it easier for people to access relevant material could actually be a drawback for historians. Digital archival companies could potentially improve historians’ use of their platforms by providing an interface that makes a difficult information-retrieval process a feature of the user experience. But these are tentative findings that require further research.
The Newcastle study also encountered an immediate difficulty in its research design. Its initial model followed the Toolkit for the Impact of Digitised Scholarly Resources (TIDSR), a suite of tools developed in 2009 under the supervision of a number of specialists and with funding from Jisc (2018). The TIDSR suggests a mix of qualitative and quantitative measures, such as bibliometrics, focus groups, content analysis, resource surveys and impact assessments. Deploying bibliometrics and citation analysis to evaluate how history faculty use digital archival material in their work would have augmented the rather limited data on the number of ‘hits’ a digital resource receives, with insight into the subsequent output and productions of historical research arising from its use (O’Dwyer and Bernaeur, 2014). But the Newcastle study quickly found that librarians and digital content companies lack the ability to track the outputs that have drawn on their archives. This is, in large part, due to the absence of an agreed standard of academic referencing for digital archives and databases, which puts digital archive collections at odds with e-journals, whose content and standards of reference allow hosts to track the use of their resources, the quality of their material and how effective researchers are in their use of such resources (Sinn, 2012; Sternfeld, 2011). A lack of adequate citation analysis presents a severe methodological limitation to studies of digital historical practice.
Conclusion
This study found that there is not a significant crisis in historians’ use of digital archival collections. Contrary to existing works, academic faculty are fully cognizant of the extra steps needed to ensure appropriate use of digital archival collections. All of the participants in the Newcastle University case study recognized the various benefits and drawbacks that arise from using digital archival collections.
Many of these factors are not new to historians and library staff. Instead, they represent the transference of methodological and user hindrances from the world of physical archives to the digital realm. Difficulties of access have always hindered researchers wishing to draw on as wide a pool of resources as possible; archival curation practices have frequently imposed arbitrary censorship of materials. Digitization does mean that scholars must consider how user interfaces and platforms change the comprehensiveness of source analysis – in some cases, researchers have access to more information and more methods of analysis; in others, they cannot manipulate physical, tangible aspects of a source in the way they would like to. It is factors such as these that make digital selectivity appear to be something new. But the associated problems can be alleviated with heightened awareness of the pitfalls of digitization.
Nevertheless, academic historians and library professionals must reckon with the heightened disparity in access according to career status. ECRs on temporary contracts should receive greater acknowledgment and support for the barriers they face in accessing digital archival collections. Further research into this aspect of digital selectivity is urgent. We should also develop a more robust understanding of historians’ information-seeking behaviour.
Before that research takes place, stakeholders should endeavour to implement common standards, especially in metadata use and citation formats. It is evident that inadequate user statistics and a lack of tangible evidence of how effective historians are in their use of digital archives actively prevent a thorough understanding of how the field currently engages with digital source material. This suggests that many of the findings in the secondary literature are based on fears and hypotheticals – assumptions that other academic researchers do not conduct ‘good’ history – rather than secure findings. Idiosyncratic metadata practices and the absence of agreed referencing formats prevent library professionals from determining the value of their collections to the academic community they serve. They also prevent accurate measurement of how effectively historical researchers use such resources. There is an urgent need for agreed standards – not only for the sake of rigorous practice, but also to allow future research to form an accurate impression of digital resource use. Until this occurs, most discussion of digital archival collections, their packaging and their use by academic researchers will be partial and limited.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
