Digitizing the paper of record: Archiving digital newspapers at the New York Times

Abstract

This study uses three archiving efforts at the New York Times as a means to analyse the newspaper as an archival object. I study the traditional ‘morgue’ of physical clippings and photos, the Times’ joint project with Google Cloud to digitize its photo collection, and the TimesMachine interactive digital archive, which made scanned editions of printed issues from 1851 to 2002 publicly available online. Based on interviews with staff and analysis of documents describing past and present newspaper archiving practices, it is clear that the digital archive is not a comprehensive copy of an analogue original. A significant number of documents stored in physical archives have not been translated into digital form, and their loss would be detrimental to historians and media scholars alike. Moreover, even the documents that have been scanned and made available as digital objects do not perfectly mirror their analogue equivalents, meaning that information loss is inherent to the digitization process. As active producers of the past for contemporary purposes, these online news archives serve as cultural gatekeepers, actively shaping journalistic practice and reframing current events in reference to the past.

Keywords

Archive, digitization, materiality, mediation, memory, newspaper

Introduction

You know, I don’t know why this is, but reading old clips from the Times somehow is more informative than reading the digital version that you can find on the web, that somehow, you know, you’ve got the picture next to it, sometimes you’ve got an advertisement next to the story that places the time period in context. [. . .] You know, you open one of these folders with all of these clips in it and yellowed clippings from the 1950s or the 1960s, and suddenly you’re back there. In your own head, you’re back there, which doesn’t happen when you look up the stuff on the web.

Bruce Weber, NPR, April 27, 2017

As this New York Times (NYT) obituary writer suggests, online archives do not simply reflect existing physical depositories. Rather, they constitute a distinct but related enterprise, both materially and conceptually. A newspaper archive is a cultural resource, one that has long been available to journalists, scholars, and other researchers, but has only recently become accessible to the wider public via the web. As content producers increasingly repackage and resell old content for revenue, it becomes more pressing to investigate the practices involved in digitization and their cultural ramifications. Furthermore, not all historical news documents make their way to digital archives. Some remain preserved only in physical depositories that few are even aware of, let alone have access to. It is therefore necessary to analyse how digital archives differ from their brick-and-mortar counterparts, and what information and insights may be lost in the translation from the latter to the former.

In what follows, I examine the digitization of the NYT archives as a gatekeeping process that affects both journalistic practice and cultural memory. Whether they are digital or physical, archives are an important expression of memory. They are crucial to any study of journalism history, and to historical research more broadly. Moreover, the design and use of these archives are both components of journalistic practice and instruments by which that practice is understood. This paper uses the NYT’s ongoing archival projects to make three related arguments about newspaper archives: (1) There is valuable information preserved in physical depositories that is not being digitized. Despite techno-utopian fantasies of total digitization, these material documents are still serving journalists and remain an important part of current journalistic practice. (2) Even the materials that are being converted into digital objects and made accessible through an online archive do not perfectly mirror the contents of the physical depository. (3) The ways in which the digital objects are presented in online news archives further mediate these objects and may shape cultural memory. Drawing from a gatekeeping framework, an apparatus approach and the field of memory studies, the paper examines the print-to-digital transformation and explores the decision-making process that determines what will be included in the future digital canon.

The (digital) newspaper archive

The digitization process at the NYT and elsewhere raises the basic question of what defines the archive, and of what kind of materialities, conceptual assumptions, and discursive forms it contains. The idea of the archive was an object of interest for both modernist and postmodernist writers. Derrida (1998) coined the term ‘archivization’ to describe how social, political and technological forces shape the structures of the archive that define its relationship to history and memory. He pointed to the methods of information transmission that shape the nature and the parameters of knowledge. For Foucault (1972), the archive is ‘the system of discursivity’ (p. 130). In his view, it is much more than the aggregate of the dusty documents sitting on its shelves; it is a corpus, a product of discourse (Bate, 2007).

The Foucauldian notion of the ‘archive’ as the set of rules that determine the range of what can be expressed has inspired scholars to develop media archaeology as a new approach to the study of media history. The media-archaeological method proposes an alternative to conventional media-historical narratives, ‘a kind of epistemological reverse engineering, and an awareness of moments when media themselves, not exclusively humans anymore, become active ‘archaeologists’ of knowledge’ (Ernst, 2011: 239). This perspective focuses on the infrastructure of media-historical knowledge, on the construction of archives – both digital and physical – and on how these archives influence and mediate our historical understanding of the past. According to Parikka (2015), this archive ‘is not only about the statements and rules found in books and libraries. Instead, it is to be found in technological networks of machines and institutions, patterns of education and drilling: in the scientific engineering complex that practices such forms of power’ (p. 2).

To study the network of humans, machines and institutions that participate in and enable the construction of an archive of digital newspapers, this study employs an ‘apparatus approach’. Packer (2010) suggests examining the archive as an apparatus: a mechanism for linking things together and ‘a strategically organized network of discursive and nondiscursive elements brought together to address problems resulting from specific formations of knowledge’ (p. 89). Understanding the construction of an archive – any archive – requires answering a set of questions to determine the elements and connections that allowed the archive to establish itself as a legitimized, authorized mechanism for providing evidence. Bringing the apparatus approach to bear on the NYT’s archives illuminates the network of elements involved in digitization and the discourse that justifies the decisions of NYT staff.

Gatekeeping theory provides another prism through which to observe archive construction. Media scholars have long recognized the role of gatekeepers in selecting, filtering, mediating, interpreting, organizing and promoting the pieces of information that in turn shape public knowledge and the news’ representation of reality (Shoemaker et al., 2009). White (1950) examines the decision to include stories in a daily newspaper and notes ‘how highly subjective, how reliant upon value judgments based on the ‘gatekeeper’s’ own set of experiences, attitudes and expectations the communication of ‘news’ really is’ (White, 1950: 386). Breed (1955) continues this line of inquiry and shows how gatekeeping may occur at the upper echelons of a news organization, with publishers and editors playing a more central role than individual journalists.

If journalists rely on the news archive as a source of information, then it becomes important to scrutinize the considerations that impact decisions about what to preserve and the ways that the archive itself functions as a gatekeeper of information. Although the two roles differ in important ways, the work of the news archivist is similar to the work of journalists in that both involve the selection of items for inclusion in the public record. Archival scholars have noted the power inherent in archival work (Schwartz and Cook, 2002) and have acknowledged the role of archivists as gatekeepers of the past (Gauld, 2017). Therefore, the gatekeeping framework sheds light on the digital newspaper archive at two different levels. First, the news archivist serves as a gatekeeper who determines which clippings, notes, files, and other materials will be included in the physical and digital archives. The archive, in turn, functions as a gatekeeper that determines what information is available for inclusion in future journalistic products and other memory objects. Hence, this study defines the newspaper archive as a gatekeeper institution with the power to impact the news, and the archivist in charge of the archive as a gatekeeper of past journalistic products and future cultural memory.

To conceptualize the news archive as a journalistic object, I draw on studies that engage with the material aspects of journalism (Anderson and De Maeyer, 2015; Boczkowski, 2015; Usher, 2018). This ‘object-oriented approach’ examines the wider network of journalism-related products and actors and contends that by focusing on the objects of journalism, scholars can provide insights into its social, cultural and material context (Anderson and De Maeyer, 2015). Further, the material turn pushes researchers to look beyond the newsroom, which is often spotlighted in journalism studies (Boczkowski, 2015).

Usher (2018) foregrounds the objects of journalism and their role in producing audience trust, and examines news buildings, the ‘raw materials’ of news and the products of news. Following her argument that ‘the physical infrastructure that limits or enhances news distribution deserves attention’ (Usher, 2018: 571), my study treats the news archive both as a hard thing worthy of scholarly attention and as a soft thing whose documents serve as ‘raw materials’ of journalistic products. Hence, the paper that follows uses three ongoing archival projects at the NYT to answer important questions about the construction of digital newspaper archives: What factors are considered when deciding which raw journalistic objects to include in the archive? What are the strategies for digitizing journalistic objects and how are they made accessible through the digital archival interface? And finally, how are the answers to these questions involved in the construction of future memory?

News archives and memory work

To examine the ways that the digitization of newspapers and the construction of digital newspaper archives are involved in shaping future memory, this study draws on the insights of memory studies. The scholarship on journalism and memory has deepened significantly over the last three decades (Edy, 1999; Kitch, 2003; Meyers, 2007; Schudson, 1992; Zelizer, 1998). However, according to Zelizer (2008), these investigations did not find their way into the mainstream of collective memory studies. In their introduction to the edited volume Journalism and Memory, Zelizer and Tenenboim-Weinblatt (2014) elaborate on ‘the complicated role that journalism plays in keeping the past alive’ (p. 1). In doing so, they remind us of the necessary collaboration between memory studies and journalism scholarship. Indeed, whether they are revisiting past accomplishments in an obituary or framing current events in terms of historical analogies, journalists are constantly engaging with the past – using it, representing it, and forgetting it. From the perspective of memory studies, journalism serves as a powerful memory agent, reflecting on society in a specific time, place, and context. According to Olick (2014), journalism archives are ‘a constitutive feature of collective memory’. They reveal how a society perceived and discussed issues at different historical moments, and they enable comparisons of journalistic practice over time.

Thus, the practices involved in archival work have major ramifications for what will be remembered and what will be forgotten. Assmann (2008) distinguishes between active forgetting, which is an intentional act of memory destruction, and passive forgetting, which is unintentional and may be the result of neglect or poor preservation practices. In a similar vein, remembrance has active and passive manifestations. According to Assmann (2008), ‘the institutions of active memory preserve the past as present while the institutions of passive memory preserve the past as past’ (p. 98). She classifies active preservation as ‘canon’ and passive preservation as ‘archive’. Borrowing Assmann’s distinction between canon and archive, this study investigates what kinds of items are still used by journalists to explain contemporary events and which only act as records of the past.

This paper studies the digitization process at one of the most important news organizations in history. Since its founding in 1851, the NYT has shaped and continues to shape public opinion and political discourse. The NYT was the first newspaper to index its subjects, which is the origin of its reputation as the ‘newspaper of record’ (Martin and Hansen, 1998). Somewhat unusually, it maintains both an extensive digital archive and a physical clipping library – the so-called ‘morgue’. Through its TimesMachine project, it is also one of the few news organizations that offer scanned editions in an online archive, allowing its subscribers a fuller view of old editions. In what follows, I analyse these three ongoing archival efforts as a case study that sheds light on news archives more broadly as apparatus, as gatekeepers of news production and as agents of cultural memory.

Methodology

To understand the decision-making processes involved in digitization and the construction of a digital newspaper archive, I interviewed seven staff members who have worked on the NYT’s archiving initiatives. Each interview lasted between 1 and 3 hours and was conducted in person, except for one, which was conducted via video chat. One interview included a visit to the NYT morgue, where an archivist provided detailed information about its history, materials and practices, and where I was afforded the opportunity to observe daily work routines. During the time I was conducting the interviews (between March 2018 and January 2019), the NYT announced its partnership with Google Cloud. Interviewees in this study were not allowed to discuss the details of that partnership, but the news coverage of this development has provided some context.

All the interviews were recorded and transcribed, and the analysis is based on the main themes and issues that emerged in our conversations. My investigation was enriched by texts that NYT archival staff wrote or contributed to, including ‘The Future of the Past: Modernizing the New York Times Archive’, published on the NYT ‘Open’ blog (Sandhaus et al., 2016), and by videos of presentations delivered by staff members at conferences and panels, such as the 2016 panel also titled ‘The Future of the Past: Modernizing the New York Times Archive’ (Valkenburg and Sandhaus, 2016). For the purposes of this study, I examined 32 texts (including video talks) from the NYT and other outlets that discussed relevant issues regarding the digitization of historical newspapers and archiving practices at the NYT. I then analysed and coded these materials based on major themes that surfaced during the interviews, such as physicality, digitization criteria, search, the digital archive interface, and the archival collections. In this paper, I organize my findings around three ongoing archival initiatives at the NYT: the clipping archive known as the morgue, the digitization of the NYT photo collection by Google Cloud and the TimesMachine online archive for historical newspapers.

The morgue: Forgotten history

In the slang of the newsroom, the ‘morgue’ is a space (in many cases, located in the basement) where the news librarian or archivist stores the institution’s collection of clips. According to Saunders (2015), morgues have historically acted as an analogue database. Clippings are generally categorized thematically, breaking with the chronological order of the newspaper’s issues. Prior to the internet, the morgue was the only practical way to access old stories. Most morgues, however, did not survive the economic crisis faced by newspapers in the 21st century (Hansen and Paul, 2015; Saunders, 2015). Some newsrooms donated their contents to historical societies or libraries, but many of these surviving collections are likely incomplete, as clips have been stolen or mishandled (Ringel and Woodall, 2019). The NYT is one of the few news organizations to still maintain its morgue, albeit with significant changes. An article from the early 1920s describes the important role of the morgue and the news librarian who maintains it:

The success of a newspaper morgue depends a great deal on the librarian in charge. He must be a man who is wide awake to his position and responsibilities. He must be a student of events and happenings so as to select the material to be filed. He must possess a mind for detail and organization and be able to reduce lost motion to a minimum. He must act as the connecting link between the past, the present and the future. He must be historian of contemporary events, and is supposed to have everything that has happened in the last fifty years at his fingertips. [. . .] When a big news story ‘breaks’, the order runs: ‘Get the morgue on the phone quick’, and the morgue man is expected to be ‘on the job’ like a fire horse. [. . .] The morgue is a big factor in modern journalism (Kwapil, 1922: 443–444).

The current NYT archivist possesses all these qualities. When he first started at the morgue, there were 20 other archivists and news librarians working alongside him. Today, he is the only one remaining. The NYT morgue is located three levels below ground, in the basement of a building beside the NYT headquarters on Eighth Avenue in Midtown Manhattan. In 2017, while the NYT relocated to its new office tower, the morgue was moved to the basement of the former New York Herald Tribune headquarters on West 41st Street. It is open only to NYT staff and journalists, and closed to the public. In the hours I spent there, I saw only a few visitors. The morgue is not easily accessible, and its materials are preserved in steel filing cabinets, cardboard boxes and the bookshelves that line the walls. The archivist is the only person with the knowledge to navigate the incredible amount of material, though he insists that the alphanumeric filing system is actually quite straightforward. Today his job responsibilities include maintaining the collection and locating clippings and photographs for researchers, reporters and obituary writers.

While all published versions of the NYT have long been available in microfilm and microfiche, the materials preserved in the morgue include clippings and photos that did not always make their way into the published paper. It also preserves early versions of stories that were killed, generally due to space considerations. The morgue is likely the only depository where one can find internal publications for customers and employees, including magazines with names like TimesTalk and SkyScraper. The archivist notes: ‘Unless we sent them to the Library of Congress, which we did not, there are no available copies of these magazines nowhere except for in here’.

Since the archivist began working solo a decade ago, the choice of which materials to save has been entirely at his discretion: ‘Just like one scholar sees something from one side and the other scholar sees something from a different way, what I keep someone else might not keep, but that is just because that’s me controlling it. Someone else controlling it can say, “Why do you need that for?”’ Even though he insists that there are no ‘preservation policies’ and that the filing process is fairly straightforward, he was influenced by the criteria passed on to him in his early days on the job by his colleagues, former archivists at the NYT. Here he describes the weeding process:

Usually, it would be two criteria. Some guy got engaged or some woman got engaged in 1935 and it’s 1970 and there’s never been another clip in that folder. So, you tossed it to – or super famous people, like the early clip files, might be thrown out. [. . .] The problem was that it was subjective to who was doing something that might have been interesting to you, might not be interesting to the other person, or someone might have weeded a newspaper or something that would be impossible to find online or in the New York Public Library.

Therefore, the materials that remain in the morgue are there because someone once decided they were worth preserving, a process not dissimilar from the decision of what gets published in the newspaper or, more recently, on the website. One theoretical lens through which to consider this selection process is the notion of newsworthiness (Galtung and Ruge, 1965). The study of what qualifies as news, how events become news, and what factors and values determine newsworthiness remains a major focus of communication research (Harcup and O’Neill, 2017). This is mirrored by questions relating to the conditions in which journalistic content becomes worthy of future preservation and digitization. According to Schudson (1989), the gatekeeper metaphor – which is still used by journalism scholars to describe the relations between news organizations and their products – is problematic in the sense that it ‘individualizes a bureaucratic phenomenon and implicitly transforms organizational bias into individual subjectivity’ (Schudson, 1989: 265). In the case of the morgue, however, the decisions to include documents in the archive are literally up to a single gatekeeper. Just as White (1950) observes that the decisions about which stories to publish are highly subjective and contingent on the judgment of the gatekeeper, the decisions regarding what will be preserved in the archive are determined by the values and attitudes of one lone archivist.

As the NYT archivist explained, a significant number of documents stored in the morgue have not been translated into digital form. He estimates that only 1 percent of the morgue’s collections have been digitized, and the decision of which documents are worthy of digitization rests solely on his shoulders. Some of the documents preserved in the morgue can be found nowhere else, yet many researchers – including journalism scholars – may not even know they exist. Their significance, however, is not only scholarly or historical. The morgue’s contents are still used to aid in the writing of stories:

We were doing a big story, and here we have all this background information about the history of the housing authority, so for example, if you asked the housing authority for a history of their projects, they don’t have it. But here we have it, a chronology of major events. So, what I did was I scanned it and sent them to the reporter and the reporter used this doing the story. Would he have still done the story without it? Yeah. But again, it made it better. If you ask them, they wouldn’t want you to know it anyway. I mean it’s invaluable to a certain extent, but again, the story would get done whether I had that or not. Is it going to get done better? Yeah, probably.

For both scholars and journalists, the morgue still plays a significant backstage role in the construction of memory (Kitch, 2008; Zelizer, 2008). As the housing authority example illustrates, the morgue serves not only as a passive archive for preserving the past but also as an active memory agent in the present day. Circulated actively and kept alive in contemporary news stories, this information establishes its place in Assmann’s (2008) canon.

Such instances show that even when a story does not strictly require the morgue, its contents can still serve as important puzzle pieces that help present a fuller picture of the past. Further, the archives of news organizations can serve as ancillary archives for other institutions, supplementing and sometimes even replacing collections that have not been adequately maintained by these institutions themselves. This illustrates how the public service aspect of news outlets extends beyond the articles they choose to publish. Hence, the news archive functions as a gatekeeper of valuable information for the public as well as for journalists.

From the morgue to the cloud: The NYT’s partnership with Google

In September 2018, the NYT announced a partnership with Google Cloud to scan, tag, classify and preserve the paper’s entire photo archive. According to Brian Stevens, Google Cloud’s chief technology officer, ‘Google Cloud technologies [. . .] are helping to preserve this priceless history and give journalists a new way to search, access, and analyse millions of historic photos and give them new life’ (NYT, 2018). According to Nick Rockwell, chief technology officer at the NYT, each photo has a story of its own. Like a passport, it contains stamps, marks, and handwritten notes, and preserving the story of each photograph is as important as preserving the photo itself. ‘But, critically’, Rockwell (2018) says, ‘this project is not first and foremost about preservation; it’s about storytelling. By making the archive accessible, we make it possible to reach back into the past and tell stories that were left untold’.

This quote not only illustrates the role of archives in contemporary journalism but is also a reminder that the manner in which these materials are preserved may impact the stories that get told. Significant choices are made at each step of the process, each of which can shape interpretation. When you scan, do you include the handwritten marks on the back of the photograph? How do the archivists determine what aspects of a photo warrant tagging, and who devises the classification scheme? How will the scanned items be made available, who will be privy to them, and what interface will they access them through? The answers to all of these questions have the potential to shape the way these materials are understood and the narratives that emerge from their interpretation. They are the elements that comprise the archive as apparatus (Packer, 2010).

Furthermore, while scholars have examined the complex relationship between publishers like the NYT and tech companies like Google in terms of advertising revenue, content distribution and search engine optimization (Bell et al., 2017; Helmond, 2015; Nechushtai, 2018; Nieborg and Poell, 2018), these studies have not addressed the ways in which newsrooms are themselves increasingly relying on the tools and services provided by digital platforms, such as back-up and storage. According to Kleis Nielsen and Ganter (2018), the relationship between news organizations and digital intermediaries like Google and Facebook is characterized by a tension between short-term opportunities – particularly the chance to reach large audiences – and long-term strategies, like the desire of papers to maintain control over how their content is distributed. Kleis Nielsen and Ganter (2018) demonstrate how this asymmetrical relationship is driven by the news organizations’ fear of missing out and by a difficulty in evaluating the risk and long-term implications of cooperating with digital intermediaries.

A similar dynamic has appeared in the relationship between the NYT and Google, with the former becoming ever more dependent on the latter. Sam Greenfield, Google’s technical director, posted the following on Google Cloud’s blog:

Helping The New York Times transform its photo archive fits perfectly with Google’s mission to organize the world’s information and make it universally accessible and useful. We hope that by sharing what we did, we can inspire more organizations – not just publishers – to look to the cloud, and tools like Cloud Vision API, Cloud Storage, Cloud Pub/Sub, and Cloud SQL, to preserve and share their rich history (Greenfield, 2018).

In this framing, while the morgue had preserved important historical documents, they were neglected until Google arrived to make them accessible for future journalists and the public. This narrative raises important questions: First, does cloud storage actually secure the long-term preservation of scanned objects? Studies have shown that preservation requires much more than scanning and uploading digital objects to cloud storage. Such measures come closer to backing up the morgue than ensuring access to its contents over the next 10, 20 or 30 years (Corrado and Sandy, 2017), an interval over which there is no guarantee Google will survive.

The second issue concerns access. Google inserts itself as a mediator between journalists and the morgue’s contents. Where previously reporters would request a particular item from the morgue’s archivists, now the scanned photos will be searchable and accessible to photo editors and reporters at the NYT newsroom. Selected photos will then, in turn, appear before the public when they are embedded in a particular story. One such example is the NYT project ‘Past Tense’: ‘As we digitize some six million photo prints in our files – dating back more than 100 years – we are using those images to bring vivid narratives and compelling characters of the past to life’ (Past Tense [NYT], 2020). This archival project is another illustration of the role that archives play in the complex relationship between memory and journalism, and in bringing the past into the present to form a canon (Assmann, 2008).

The TimesMachine: Copy of a copy

The TimesMachine was relaunched in 2014 as a set of ‘interactive digital archives’ that include scanned copies of printed Times issues published between 1851 and 2002. The NYT concurrently launched the NYT Archives Twitter account, which links to posts within the TimesMachine. This iteration rebuilt an earlier version of the TimesMachine, which relied on clunky PDF documents and was plagued by technical issues. As described by one of the interviewees, the new team sought to create a web-based archive for a better user experience.

A leading team member explained that the TimesMachine includes every issue that meets the following criteria: 1) The issue was previously microfilmed by the NYT or another source, 2) that microfilm was in turn digitized by the NYT or others, and 3) this digitized microfilm survived the transformation into the TimesMachine format. There are several issues missing from that collection, including the Sunday issues from the NYT’s first few years of existence and issues impacted by the newspaper strikes during the 1960s and 1970s. A few others are missing due to technical challenges in the migration from microfilm to digital.

Since the TimesMachine is the product of a conversion process from microfilm, and microfilm is itself a migration from print, the interviewees acknowledge that some articles got lost in translation. Further, archivists had to make decisions about what content would be deemed canonical. For example, the version of the paper recorded on microfilm was historically the Late City Edition, the last edition that went to print on a given day. In other words, content that was published only in earlier editions was not preserved on microfilm and therefore did not migrate to the TimesMachine.

According to one of the principal members of the TimesMachine team, the project’s mission is to ‘give people an experience of the archive that really conveyed the historicity of the newspaper itself’. This mission informs the way they construct their archive. As that team member put it:

Isolating an article from the surrounding content removes the context in which it was published. A modern reader might discover that on July 20, 1969, a man named John Fairfax became the first man to row across the Atlantic Ocean. However, a reader absorbed in The New York Times that morning might have been considerably more impressed by the front-page news that Apollo 11, whose crew contained Neil Armstrong, had just swung into orbit around the moon in preparation for the first moon landing. Knowing where that John Fairfax article was published in the paper (bottom left of the front page) as well as what else was going on that day is much more interesting and valuable to a historian than an article on its own without the context of other news of the day. We wanted to present the archive in all its glory as it was meant to be consumed on the day it was printed – one issue at a time. Our goal was to create a fluid viewing experience, not to force users to slowly download high-resolution images (Cotler and Sandhaus, 2016).

In other words, the TimesMachine is also a context machine (Lafrance, 2014). It allows users not only to access published content but also to see exactly where in the paper it was published. This context is valuable for journalists, scholars, commercial investigators, students, researchers working for public institutions and other memory agents. Viewing stories in isolation would resemble the situation in the morgue, which holds decontextualized clippings rather than entire editions. The TimesMachine, by contrast, was built to let users see the bigger picture, including the ads, pictures and layout of each Late City Edition of the NYT.

Further, the way the TimesMachine search engine operates may reflect changes in journalistic practice over time. One of the TimesMachine team members explained in an interview that the autocomplete options offered in its search boxes are based on the NYT Index. Between 1913 and 2016, an indexing team categorized every story published in the NYT by relevant people, places, organizations, subjects and titles. Owner and publisher Adolph S. Ochs promoted this index to libraries, colleges and other newspapers, contending it was a way to preserve ‘files future historians would refer [to] when it became necessary to settle some obscure point’ (Tifft and Jones, 2001).

This index forms the basis of the autocompleted search results: a search for ‘Kansas’ will retrieve articles that had been manually indexed under ‘Kansas’, not just those that mentioned the word. This mechanism is not visible to archive users, because the team intended the search tool to feel intuitive. Further, when users sort their search by ‘relevance’ in the TimesMachine, they rely on a technology similar to the Elasticsearch-based search function on the broader NYT website. Much like the traditional NYT Index, this function places constraints on search results that steer users toward certain articles. Such a system necessarily prioritizes some topics of inquiry over others, thus guiding users toward certain historical conclusions.
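The difference between retrieval through a manually assigned index and retrieval by word occurrence can be illustrated with a toy example. In the Python sketch below, the articles, their index headings and both search functions are hypothetical and are not drawn from the TimesMachine’s implementation.

```python
from collections import defaultdict

# Hypothetical articles: text plus headings assigned by human indexers.
articles = {
    "a1": {"text": "Wheat harvest in Topeka breaks records", "headings": {"Kansas", "Agriculture"}},
    "a2": {"text": "Kansas City jazz club reopens", "headings": {"Missouri", "Music"}},
    "a3": {"text": "Kansas legislature debates school funding", "headings": {"Kansas", "Education"}},
}

# Inverted index from heading to article ids, analogous to a subject index.
heading_index = defaultdict(set)
for article_id, article in articles.items():
    for heading in article["headings"]:
        heading_index[heading].add(article_id)


def search_by_heading(term):
    """Return articles an indexer filed under the term, whether or not the word appears."""
    return sorted(heading_index.get(term, set()))


def search_full_text(term):
    """Return articles whose text contains the term as a whole word."""
    return sorted(
        article_id
        for article_id, article in articles.items()
        if term.lower() in article["text"].lower().split()
    )


print(search_by_heading("Kansas"))  # ['a1', 'a3']
print(search_full_text("Kansas"))   # ['a2', 'a3']
```

In this toy case, the heading search returns an article about Topeka that never uses the word ‘Kansas’ and omits one about Kansas City that an indexer filed elsewhere; which result a user sees depends on decisions made by indexers, not on the text alone.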

Another way the digital archive guides its users is found on the TimesMachine homepage, which each day displays an NYT front page from the same calendar date in a year between 1851 and 2002. For example, on May 15, 2018, the TimesMachine homepage featured the front page from May 15, 1948, which reported that Israel had declared itself an independent state. The choice was particularly striking because, on May 14, 2018, the US embassy in Israel had relocated from Tel Aviv to Jerusalem. A front-page story in the May 15, 2018 edition of the NYT (Figure 1) reported on the consequences of the relocation.

Figure 1.

NYT front page and the TimesMachine homepage May 15, 2018.

This selection raises the question of whether the choice reflects an editorial decision aimed at providing historical context or an algorithm selecting random years. One of the TimesMachine staff members explained in an interview that the team had built a spreadsheet listing ‘important events that we here at the NYT covered’, and the homepage selection changes automatically every day. The Twitter account, by contrast, is updated manually and, as the interviewee explained, consciously responds to the news:

When there’s a debate about same-sex marriage before it was passed, we would pull up this is the old debate for interracial marriage, or an article about gay rights. If Trump brings up an issue of something that happened in the past, we can pull up the past coverage of it, if it fits within our historical era, which for TimesMachine is 1851 to 2002 at the moment. Sometimes we will tweet things a little newer but only if they’re super significant. We try to stay within the whole idea of promoting the tool.
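The interviewee’s description of the homepage suggests a simple ‘on this day’ lookup against a curated list. The Python sketch below assumes a spreadsheet with one row per featured issue; the column names, the tie-breaking rule (the first listed row wins) and every row except the May 15, 1948 example are hypothetical.

```python
import csv
import io
from datetime import date

# Hypothetical export of the staff spreadsheet; column names are assumptions.
# Only the 1948 row reflects an example discussed in the text.
SPREADSHEET = """issue_date,event
1948-05-15,Israel's declaration of independence
1969-07-21,Hypothetical entry A
1931-05-16,Hypothetical entry B
"""


def load_events(text):
    """Parse the spreadsheet into (issue date, event label) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(date.fromisoformat(row["issue_date"]), row["event"]) for row in reader]


def front_page_for(today, events):
    """Return the first listed issue sharing today's month and day, or None."""
    for issue_date, event in events:
        if (issue_date.month, issue_date.day) == (today.month, today.day):
            return issue_date, event
    return None


events = load_events(SPREADSHEET)
print(front_page_for(date(2018, 5, 15), events))
```

Even if the daily selection runs automatically, as in this sketch, the editorial judgement lies upstream, in which rows were entered into the list in the first place.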

The TimesMachine, its Twitter account and the Instagram account for the NYT photo collection all reframe past events in light of current ones. They link readers to the past while also contributing to the NYT’s framing of current news. In this way, the archive serves as a living, breathing part of the NYT’s contemporary journalism.

Concluding discussion

This study brings the insights of archival studies into journalism research by examining three archival efforts at the NYT. Firstly, I examined the news archive as a journalistic object. As the ‘paper of record’, the NYT preserves its long history and repackages it for digital and social media audiences. Its physical and virtual archives continue to serve as important resources for scholars and journalists, both for their historical value and for how they are used to reframe current events.

As the above analysis demonstrates, the morgue archivist functions as a gatekeeper in ways that mirror the journalist’s role. Because there is just one archivist in the morgue, decisions about what will be preserved (and thus remembered) and what will be thrown away (and thus forgotten) all fall under the purview of a single caretaker. Further, the materials preserved in the morgue continue to serve as sources of information for journalists. Borrowing Assmann’s (2008) distinction between the archive and the canon, I argued that these documents form part of a canon that sets the limits of future memory, since their depictions of the past are employed to interpret the present and will continue to be employed this way in the future.

The three archiving efforts at the NYT demonstrate that the digital newspaper archive is not isolated or separated from journalistic practice. Rather, the news archive is involved in the production of journalism. It is an apparatus that connects journalists, archivists, computer scientists, researchers, technologies and platforms. It comprises plans, practices, archival approaches and journalistic theories (Packer, 2010). Moreover, as this paper demonstrates, the organization of news archival documents, the structure and maintenance of the morgue, and the human and machine factors involved in the archival process have not been well documented. This oversight needs to be rectified, as archives are important mediators of information to journalists, scholars and the public. They have the potential to shape our understanding of the past, the present and the future, and therefore need to be understood.

It is not surprising that the staff involved in these archiving efforts – and therefore most of the people I interviewed for this study – are not formally trained, professional archivists. Digital archiving efforts are managed mostly by computer scientists, and the tagging, categorizing and metadata components are the responsibility of the NYT taxonomy team. Thus, the long-term preservation of news materials – the printed newspapers, clipping files and digital publications – is no longer managed according to traditional archival conventions and standards. These findings align with Hansen and Paul’s (2015, 2017) studies, which document the shrinking of the news librarian workforce and point to this development’s troubling implications for news preservation in the digital age. Additionally, while this study focused on the digitization and preservation of printed copies, previous studies have shown the dire state of born-digital news preservation (Broussard, 2015; Carner et al., 2014; Ringel and Woodall, 2019).

The partnership with Google Cloud to digitize the morgue’s photo collection illustrates the platformization of cultural production. Other scholars have demonstrated the growing dependence of news organizations on advertising revenue, data, tools and governance standards from platforms (Nechushtai, 2018; Nieborg and Poell, 2018). The digitization of the NYT morgue’s photo collection by Google is another example of this trend. This dependence raises questions concerning access – who, for example, will have access and rights to these photos? Further, while other forms of technological and economic dependency – such as data collection and distribution – have immediate implications, archival dependency has implications for future access. How can we ensure that the NYT historical photo collection digitized by Google will still be available 20 or even 10 years from now, whether or not Google continues to exist? The media outlets announcing such partnerships portray Google as a saviour of the historical record without interrogating whether Google is equipped or entitled to manage such an important cache of American history. This positive and often heroic discourse, which presents Google as the ultimate technological access solution, echoes other digitization efforts such as Google Books and Google Arts & Culture. Such techno-utopian discourse is rooted in academic, journalistic and trade publications that promote new technologies to archival professional communities (Woodall and Ringel, 2019).

In my analysis of the TimesMachine search interface, I explored how digital archiving practices must balance the need to optimize user experience against the imperative to maintain context. Each of these practices affects the availability and organization of information. Important as they are, the factors and decision-making processes behind these archival projects remain invisible to archive users; TimesMachine users, for instance, are not aware that the archive’s autocomplete feature is based on the historical NYT Index. The TimesMachine search interface – much like the traditional NYT Index – shapes contextual knowledge by setting the limits and constraints on information access. An interface can prioritize certain data or records, highlighting some topics of inquiry over others and thus guiding users toward particular historical conclusions (Ringel, 2020). Yet how these options are determined, and what human and cultural forces are at play, remain obscure to the vast majority of users, even though such choices deeply affect how archival materials are presented and hence perceived, interpreted and remembered. The people I interviewed for this research, and others involved in similar archiving efforts, are invisible actors. Despite their ability to influence journalism products and shape future memory, these professionals are rarely examined by journalism scholars. My work here is a first step towards understanding the role of these archivists in journalistic practice.


Acknowledgements

I would like to thank Emily Bell and Angela Woodall from the Tow Center for Digital Journalism at Columbia University’s Graduate School of Journalism; Roei Davidson and Rivka Ribak from the Department of Communication, University of Haifa; and the two anonymous reviewers for their help.

Funding

The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project was made possible with financial support from the Tow Center for Digital Journalism at Columbia University’s Graduate School of Journalism.

ORCID iD

Sharon Ringel

Author biography

Sharon Ringel, PhD, is a lecturer in the Department of Communication, University of Haifa, Israel. Her studies focus on the social aspects of digitization, archives, digital preservation, human-machine communication, and cultural memory.

References

Anderson and De Maeyer (2015) Objects of journalism and the news. Journalism 16(1): 3–9.

Assmann (2008) Canon and archive. In: Erll et al. (eds) Cultural Memory Studies: An International Interdisciplinary Handbook. Berlin; New York, NY: Walter de Gruyter, pp.97–108.

Bate (2007) The archaeology of photography: Rereading Michel Foucault and the archaeology of knowledge. Afterimage 35(3): 3.

Bell, Owen, Brown, et al. (2017) The Platform Press: How Silicon Valley Reengineered Journalism. Available at: https://doi.org/10.7916/D8R216ZZ (accessed 27 February 2020).

Boczkowski (2015) The material turn in the study of journalism: Some hopeful and cautionary remarks from an early explorer. Journalism 16(1): 65–68.

Breed (1955) Social control in the newsroom: A functional analysis. Social Forces 33(4): 326–335.

Broussard (2015) Preserving news apps present huge challenges. Newspaper Research Journal 36(3): 299–313.

Carner, McCain and Zarndt (2014) Missing links: The digital news preservation discontinuity. IFLA 2014 LYON. Available at: https://www.ifla.org/files/assets/newspapers/Geneva_2014/s6-carner-en.pdf (accessed 6 February 2021).

Corrado and Sandy (2017) Digital Preservation for Libraries, Archives, and Museums. Lanham, MD: Rowman & Littlefield.

Cotler and Sandhaus (2016) How to build a TimesMachine. In: Open Blog. Available at: https://open.blogs.nytimes.com/2016/02/01/how-to-build-a-timesmachine/ (accessed 20 September 2020).

Derrida (1998) Archive Fever: A Freudian Impression (trans. Prenowitz), 1st edn. Chicago, IL: University of Chicago Press.

Edy (1999) Journalistic uses of collective memory. Journal of Communication 49(2): 71–85.

Ernst (2011) Media archaeography: Method and machine versus history and narrative of media. In: Huhtamo and Parikka (eds) Media Archaeology: Approaches, Applications, and Implications. Berkeley, Los Angeles, London: University of California Press, pp.239–255.

Foucault (1972) The Archaeology of Knowledge. World of Man. London: Tavistock Publications.

Galtung and Ruge (1965) The structure of foreign news: The presentation of the Congo, Cuba and Cyprus crises in four Norwegian newspapers. Journal of Peace Research 2(1): 64–90.

Gauld (2017) Democratising or privileging: The democratisation of knowledge and the role of the archivist. Archival Science 17(3): 227–245.

Greenfield (2018) Picture what the cloud can do: How the New York Times is using Google Cloud to find untold stories in millions of archived photos. Available at: https://cloud.google.com/blog/products/ai-machine-learning/how-the-new-york-times-is-using-google-cloud-to-find-untold-stories-in-millions-of-archived-photos/ (accessed 20 September 2020).

Hansen and Paul (2015) Newspaper archives reveal major gaps in digital age. Newspaper Research Journal 36(3): 290–298.

Hansen and Paul (2017) Future-Proofing the News: Preserving the First Draft of History. Lanham, MD: Rowman & Littlefield.

Harcup and O’Neill (2017) What is news? News values revisited (again). Journalism Studies 18(12): 1470–1488.

Helmond (2015) The platformization of the web: Making web data platform ready. Social Media + Society 1(2): 2056305115603080.

Kitch (2003) Generational identity and memory in American newsmagazines. Journalism 4(2): 185–202.

Kitch (2008) Placing journalism inside memory – and memory studies. Memory Studies 1(3): 311–320.

Kleis Nielsen and Ganter (2018) Dealing with digital intermediaries: A case study of the relations between publishers and platforms. New Media & Society 20(4): 1600–1617.

Kwapil (1922) The “Morgue” as a factor in journalism. The Library Journal 46: 443–446.

Lafrance (2014) What one little button reveals about The New York Times ‘brain’. The Atlantic, 6 August. Available at: https://www.theatlantic.com/technology/archive/2014/08/what-one-little-button-reveals-about-the-brain-of-the-new-york-times/375631/ (accessed 2 June 2021).

Martin and Hansen (1998) Newspapers of Record in a Digital Age: From Hot Type to Hot Link. Westport, CT: Greenwood Publishing Group.

Meyers (2007) Memory in journalism and the memory of journalism: Israeli journalists and the constructed legacy of Haolam Hazeh. Journal of Communication 57(4): 719–738.

Nechushtai (2018) Could digital platforms capture the media through infrastructure? Journalism 19(8): 1043–1058.

Nieborg and Poell (2018) The platformization of cultural production: Theorizing the contingent cultural commodity. New Media & Society 20(11): 4275–4292.

NYT (2018) The New York Times digitizes millions of historical photos using Google Cloud technology. Available at: https://www.nytco.com/press/new-york-times-google-cloud/ (accessed 20 September 2020).

Olick (2014) Reflections on the underdeveloped relations between journalism and memory studies. In: Zelizer and Tenenboim-Weinblatt (eds) Journalism and Memory. New York, NY: Palgrave Macmillan, pp.17–31.

Packer (2010) What is an archive? An apparatus model for communications and media history. The Communication Review 13(1): 88–104.

Parikka (2015) A Geology of Media. Minneapolis: University of Minnesota Press.

Past Tense (NYT) (2020) Past tense. Available at: https://www.nytimes.com/spotlight/past-tense (accessed 20 September 2020).

Ringel (2020) Interfacing with the past: Archival digitization and the construction of digital depository. Convergence. Epub ahead of print 9 December 2020. DOI: 10.1177/1354856520972997.

Ringel and Woodall (2019) A Public Record at Risk: The Dire State of News Archiving in the Digital Age. New York, NY: Tow Center for Digital Journalism, Columbia Journalism Review. Available at: https://www.cjr.org/tow_center_reports/the-dire-state-of-news-archiving-in-the-digital-age.php

Rockwell (2018) New York Times CTO Announces Partnership with Google Cloud. Available at: https://www.youtube.com/watch?v=ICcSYYC4KGQ (accessed 20 September 2020).

Sandhaus, Cotler and van Valkenburg (2016) The future of the past: Modernizing The New York Times archive. In: Dodging the memory hole 2016: Saving online news, Charles E. Young Research Library, UCLA, 14 October 2016. Available at: https://www.rjionline.org/stories/panel-the-future-of-the-past-modernizing-the-new-york-times-archive (accessed 2 June 2021).

Saunders (2015) Too late now: Libraries’ intertwined challenges of newspaper morgues, microfilm, and digitization. RBM: A Journal of Rare Books, Manuscripts, and Cultural Heritage 16(2): 127–140.

Schudson (1989) The sociology of news production. Media, Culture & Society 11(3): 263–282.

Schudson (1992) Watergate in American Memory: How We Remember, Forget, and Reconstruct the Past. New York, NY: BasicBooks.

Schwartz and Cook (2002) Archives, records, and power: The making of modern memory. Archival Science 2(1): 1–19.

Shoemaker, Vos and Reese (2009) Journalists as gatekeepers. In: Wahl-Jorgensen and Hanitzsch (eds) The Handbook of Journalism Studies. New York, NY: Routledge, pp.73–87.

Tifft and Jones (2001) Opinion. Dusting off the search engine. The New York Times, 17 November. Available at: https://www.nytimes.com/2001/11/17/opinion/dusting-off-the-search-engine.html (accessed 20 September 2020).

Usher (2018) Re-thinking trust in the news. Journalism Studies 19(4): 564–578.

Valkenburg and Sandhaus (2016) The future of the past: Modernizing the New York Times archive. In: Open Blog. Available at: https://open.blogs.nytimes.com/2016/07/26/the-future-of-the-past-modernizing-the-new-york-times-archive/ (accessed 20 September 2020).

White (1950) The “gate keeper”: A case study in the selection of news. Journalism Quarterly 27(4): 383–390.

Woodall and Ringel (2019) Blockchain archival discourse: Trust and the imaginaries of digital preservation. New Media & Society 22(12): 2200–2217.

Zelizer (1998) Remembering to Forget: Holocaust Memory through the Camera’s Eye. Chicago: University of Chicago Press.

Zelizer (2008) Why memory’s work on journalism does not reflect journalism’s work on memory. Memory Studies 1(1): 79–87.

Zelizer and Tenenboim-Weinblatt (2014) Journalism’s memory work. In: Zelizer and Tenenboim-Weinblatt (eds) Journalism and Memory. Palgrave Macmillan Memory Studies. London: Palgrave Macmillan, pp.1–14.