Abstract
The COVID-19 pandemic had immediate and potentially long-lasting impacts on cities. Yet, the ability to assess, monitor, and analyze the wide-ranging effects of the pandemic has been stymied by data challenges. The pandemic elevated the need for, and reliance on, a wide range of data sources. We discuss four data challenges related to understanding the impact of the pandemic on cities. First, we explore how shifts in public policy and the decisions of private companies altered data collection priorities, availability, and reliability. Second, we discuss temporal dimensions, including the speed of data retrieval and frequency of data collection. Third, we identify the growing use of unexpected sources, which often feature a lack of rigor and consistency. Fourth, we explore the spatial scale of study and highlight questions about the interpretation of boundaries constituting the city. We use examples from the City of Toronto to ground our observations while also pointing to broader issues. We note that the tension between rapid, novel data and slow, consistent data continues to evolve and argue that a deeper appreciation and analysis of, and access to, myriad sources of data are necessary to understand the immediate and long-term impacts of COVID-19 on cities. Beyond the pandemic, our essay contributes to ongoing and emerging debates regarding the use of big data to understand the challenges facing cities and society.
Introduction
The COVID-19 pandemic has elevated demand for a wide range of data to monitor and study the immediate and long-term impacts on cities. Indeed, data has featured prominently in analyses of pandemic-induced urban change, and there remain open questions about the future of cities and urban life in the wake of the pandemic. At the outset, the focus on the distribution of COVID-19 outbreaks and fatalities highlighted concentration in dense, global, and networked cities (Stier et al., 2021). Publicly collected data, however, was not up to the task of revealing rapid urban changes, including work-from-home patterns and their correlation to office occupancy, variations in mobility patterns, shifts in residential preferences, and the uneven and unequal geographies of disease transmission (Brail and Kleinman, 2022; Coleman et al., 2022). The potential of big data to address these vexing and critical questions loomed large (Poom et al., 2020).
Since March 2020, a range of big data challenges and opportunities related to the COVID-19 pandemic have been raised in this journal and elsewhere (Dong et al., 2022; Leszczynski and Zook, 2020; Walsh, 2023). Scholars have examined the reliance on big data, highlighting questions related to mission creep, surveillance, and intentional obfuscation by public and private entities (Newlands et al., 2020; Shelton, 2020). As we continue to trust, experiment with, and benefit from data to understand urban recovery, including the vibrancy of city life, the return to office (RTO), and transit use, it is imperative to maintain a critical eye.
While scholars, policymakers, industry leaders, civic organizations, and others have become accustomed to depending on novel data, there is a need to delve more deeply into understanding exactly what we are looking at, where our data comes from, how it is organized, what its flaws are, and what value these data might hold.
In this commentary, we discuss four data challenges related to understanding the immediate and long-term impacts of COVID-19 on cities, using examples from Toronto to ground our discussion. First, we explore how public policy shifts and private sector decisions alter data collection priorities, availability, and reliability. Second, we discuss temporal issues, including the speed of data retrieval and frequency of data collection. Third, we examine the rising use of unexpected data sources, which often feature a lack of rigor and consistency yet are frequently relied upon. Fourth, we highlight the elasticity in how cities are defined.
We conclude by noting the tension between rapid, novel data and slow, consistent data. We argue that a deeper appreciation and analysis of, and access to, myriad sources of data are necessary to understand the impact of COVID-19 on cities. These lessons extend beyond the pandemic, contributing to debates regarding the use of big data to understand the challenges facing cities and society.
Moving baselines, disappearing data
Since the onset of the pandemic, it has been exceedingly difficult to identify consistent, reliable urban indicators to assess the impact of COVID-19. Baseline measures and metrics continue to change due to shifts in public policy and the pandemic itself. Early on, analysis focused on understanding where disease was taking hold, who was most vulnerable, and actions to address inequities. Case counts and polymerase chain reaction (PCR) test positivity rates quickly became accepted measures for understanding the spread of the pandemic. These metrics were likely underestimates since they relied on reporting to local, regional, or national authorities. Individuals with COVID-19 did not necessarily encounter consistent guidelines or a formal system to record their case (Klasa et al., 2021). In Toronto, the spread of COVID-19 was highest among immigrants, racialized populations, and other marginalized communities, as well as concentrated in lower-income neighborhoods. Hyper-local strategies were introduced to provide testing in the city's most vulnerable neighborhoods (Yuen, 2020) to limit the spread of disease and provide improved local health surveillance data.
As Dong et al. (2022) suggest, inconsistencies in data standards and guidance complicated data reporting. Over time, it became increasingly difficult to understand rates of infection across cities as local, regional, and national policies and their associated metrics kept shifting. For example, PCR testing rules continued to evolve, including eligibility criteria. Diminished access to and use of PCR tests over time reduced the utility of case counts and test positivity rates. Hospitalizations or measures of hospital or intensive care unit (ICU) capacity were also used to assess the impact of the pandemic across jurisdictions. However, the introduction and uptake of vaccines meant that hospitalization and ICU-related metrics became less useful over time.
Shifts in pandemic-related data reporting abound in other realms. Public agencies and private firms often altered their metrics and indices, confounding efforts to effectively monitor and assess change. For example, the Financial Times’ Stringency Index, created in collaboration with Oxford researchers, evaluated government measures intended to limit viral spread (Klasa et al., 2021). On several occasions, methodological changes were implemented, and the index is not available below the national level, even though these policies were often executed by local governments.
Elsewhere, the commercial real estate firm Avison Young developed data products to understand RTO activity. The firm's Vitality Index, an interactive public dashboard, provided real-time estimates of foot traffic in major North American cities, allowing comparisons to pre-pandemic levels of activity (Avison Young, 2021). In Fall 2022, the firm published a revised dashboard with new metrics to assess RTO and downtown vibrancy, making comparisons to earlier data nearly impossible. Avison Young removed the dashboard from its website in early 2023, indicating plans to release yet another new tool using a different data provider. The disappearance of privately owned and controlled data has been observed elsewhere; for example, Apple and Google have removed or restricted access to mobility data that had previously been available (Finazzi, 2023).
Public agencies and research institutions have also discontinued collecting and/or publishing data. For example, a globally comprehensive and widely trusted COVID-19 surveillance dashboard developed by Johns Hopkins University ceased operations in March 2023, leaving its public data repositories available for analysts but with no further updates. At the city scale, the Toronto Transit Commission (TTC), the City of Toronto's public transit agency, provided bi-weekly ridership updates during Fall 2021, when ridership was increasing. However, the TTC stopped its regular reporting abruptly around the time that ridership increases slowed or reversed. Public agencies and private firms often have their own agendas and motivations.
Timely data
The second data issue related to understanding the impacts of the COVID-19 pandemic on cities is temporality, including accessing timely information and the frequency of data collection. The COVID-19 landscape evolved quickly during the initial waves of the pandemic, and the lag time in the release of data from public sources, such as monthly labor force statistics, meant they were often not up to the task of providing timely information about immediate urban policy issues like transit use, office occupancy, or working from home. Yet, these more traditional sources of data, often generated by national statistical agencies, are characterized by rigor, consistency, and a history of trustworthiness.
In contrast to public data, data from private sources tends to be available faster and collected more often. Private sources of data are more likely to be available in real time or reported daily or weekly rather than monthly or less frequently. For example, a significant pandemic challenge for localities was the regulation of indoor dining to limit disease transmission. The City of Toronto, like many cities, relaxed zoning and introduced an outdoor patio dining program to support local businesses and re-animate the city. Yet, measuring the impact of these temporary, ad hoc local policy experiments is difficult. One data source that provides immediate insights comes from OpenTable, a digital restaurant reservation platform. OpenTable provides daily reservation information in selected metropolitan regions around the world. OpenTable's data is freely available on its website, but little documentation accompanies it. Metropolitan boundary definitions are unknown, and only a calculated index (rather than raw counts) is available, which compares reservations on a given day to the same day in 2019. While the index allows comparison to pre-pandemic conditions, it is difficult to integrate with other data, which limits its utility. Analysts face trade-offs between quality and speed of data access, leading them to substitute or enhance data from known public sources with data from private data providers.
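To illustrate why an index published without raw counts is hard to integrate with other data, the following minimal sketch computes a baseline-relative index and shows that indices for two regions cannot be combined correctly unless the underlying counts are known. It assumes the index is a simple percent change against the 2019 baseline; the exact formula OpenTable uses is not documented here. The function name `baseline_index` and all reservation counts are hypothetical and invented for illustration only.

```python
def baseline_index(current: float, baseline: float) -> float:
    """Percent change of `current` relative to `baseline` (assumed index form)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (current - baseline) / baseline


# Hypothetical reservation counts for the same day in 2019 and in the
# current year. These numbers are invented, not real OpenTable data.
metro_a = {"baseline_2019": 10_000, "current": 8_000}
metro_b = {"baseline_2019": 500, "current": 600}

index_a = baseline_index(metro_a["current"], metro_a["baseline_2019"])
index_b = baseline_index(metro_b["current"], metro_b["baseline_2019"])

# An analyst who only sees the two published indices might average them.
naive_average = (index_a + index_b) / 2

# The combined index requires the raw counts, which are not published.
combined_index = baseline_index(
    metro_a["current"] + metro_b["current"],
    metro_a["baseline_2019"] + metro_b["baseline_2019"],
)

print(f"Metro A index: {index_a:.1f}%")
print(f"Metro B index: {index_b:.1f}%")
print(f"Naive average of indices: {naive_average:.1f}%")
print(f"Index from pooled raw counts: {combined_index:.1f}%")
```

In this invented example, the naive average suggests no net change, while the pooled counts show a decline of roughly 18 percent, because the larger region dominates the total. Without raw counts or documented boundaries, an analyst cannot tell which situation applies.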
Unexpected sources
A lack of public data, such as data collected by established and trusted public agencies, has led to data innovation in which unexpected sources become proxies for understanding urban change. The digitization of almost everything, from transit use and office access to app-based coffee purchasing, enables the collection and analysis of data that help us understand the pandemic and urban recovery.
For instance, office occupancy data demonstrates the lingering, nuanced, and ongoing impacts of the pandemic as office workers have become accustomed to working from home. A range of data sources have become ‘go-to’ sources. But where does RTO data come from? In the USA, data from Kastle Systems, a firm that specializes in office security systems, is used regularly by the media to report office occupancy and weekly shifts across the largest office markets. Their data is derived from the use of building access controls, such as key fobs and digital keys, in 2600 buildings located in 138 cities (Kastle Systems, n.d.). It is not a statistical sample.
Similar RTO data is not systematically available for Canadian cities. Absent such data, the City of Toronto, along with six local Business Improvement Area organizations, supported a private firm to develop an index gauging Toronto's experience with RTO and understand urban recovery. The Strategic Regional Research Alliance's (SRRA) Occupancy Index is derived from interviews with office landlords and tenants, as well as data provided by property managers regarding building access via security cards (Strategic Regional Research Alliance, 2023). This approach does not allow for replication. Notably, the firm took an unexpected (and opportunistic) pivot from transit and congestion consulting to fill a data gap prompted by an immediate urban decision-making need.
RTO data can be inferred from a range of other sources, including big mobility data derived from cell phones. Chapple et al. (2023) analyze cell phone location data collected via apps to measure pandemic recovery in 62 North American cities. In June 2022, the project switched data providers from SafeGraph to Spectus. Changing sources introduced methodological problems due to smaller sample sizes and more limited locational information while also making comparisons between the researchers’ initial and subsequent analyses difficult. Issues around privacy protection, surveillance, and legal risks have meant that cell phone data collected via apps is increasingly viewed as problematic (Time, 2022; Walsh, 2023). Greater awareness regarding privacy and legal risks, alongside changes in the opt-in/out clauses used by cell phone apps and providers motivated by politics and regulatory changes, influences whose data is collected, where, and when; biases introduced by these changes are unknown.
Elsewhere, researchers have purchased cell phone data derived directly from the telecommunications network. Mariotti et al. (2022) utilize data from Telecom Italia Mobile (TIM), Italy's largest telecommunications firm, to assess the growth of remote work in the Lombardy region during the pandemic. While this approach is expensive and labor intensive as the data is received in raw form, concerns about sample size, accuracy, bias, and comparability can be alleviated by acquiring the data directly from telecommunications service providers rather than third parties.
Sandwich and snack purchases became another unexpected data source. Bloomberg News developed the Pret Index, a measure of Pret a Manger's sandwich sales provided exclusively to Bloomberg (Buckley and Diamond, 2022). Bloomberg reported sandwich consumption to measure urban recovery at specific types of locations (e.g. train stations, airports) in New York, London, Paris, and Hong Kong. Remarkably, the Pret Index is used by the UK Office for National Statistics to track pandemic recovery in real time. What does it tell us when a national public agency is reproducing privately collected data?
Spatial scale
Finally, understanding the implications of COVID-19 across cities requires place-based data in which the spatial boundaries are known and clearly defined. Working with spatial data from private entities poses a set of challenges for rigorous, comparative analysis. For example, in examining RTO data, it becomes apparent that the approach taken by many data analytics firms could best be described as a kind of “do-it-yourself” geography.
Looking deeper into the black box of urban geographies, we find many examples where geographies are misleading. For example, Kastle Systems’ data on RTO, regularly reported as a leading, reliable indicator, mixes geographies. Kastle's estimates for New York are based on a sample of 269 Manhattan office buildings, as well as a few buildings outside of the city's political boundaries (Boyle, 2022). This is not unusual. In speaking with another firm that provides RTO data for North American cities, we learned that the number of buildings used to derive estimates varies by geography. In Toronto, RTO figures are calculated based on data from only 70 buildings, both within and outside the city's political boundaries. Furthermore, estimates of downtown RTO include buildings from the central business district and midtown locations. Not only do we not know how the city is defined, but we also do not know how many buildings are used or how they are chosen. This lack of information makes it difficult to compare data from multiple sources or assess its quality, potentially resulting in flawed analyses or poor decision-making.
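If a data provider disclosed building locations, an analyst could audit how many sampled buildings actually fall inside a stated boundary. The sketch below shows one way to do this with a standard ray-casting point-in-polygon test. The boundary coordinates, building names, and locations are hypothetical placeholders on an arbitrary planar grid, not real Toronto or New York data, and the helper `point_in_polygon` is written for illustration rather than taken from any provider's method.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical city boundary (a simple square) and building sample.
city_boundary = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
buildings = {
    "Building A": (2.0, 3.0),
    "Building B": (9.0, 9.0),
    "Building C": (12.0, 5.0),
}

inside_names = [
    name for name, (bx, by) in buildings.items()
    if point_in_polygon(bx, by, city_boundary)
]
outside_names = [name for name in buildings if name not in inside_names]

print(f"Inside boundary: {len(inside_names)} of {len(buildings)}")
print(f"Outside boundary: {outside_names}")
```

A real audit would use projected geographic coordinates and an official boundary file, and it depends on the provider releasing building locations, which is exactly the information that is typically unavailable.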
Another example can be found in examining return-to-transit data. This is even more curious than RTO data because transit agencies are public agencies. Yet, COVID-19 demonstrated that many transit agencies lack transparent, accessible data reporting protocols. For example, multiple municipal agencies provide transit services across the Toronto region, each with its own reporting practices. The only public source for comparing return-to-transit data across Canada's largest metropolitan area is found in the reports of one of the smaller municipalities, plausibly in part because that municipality's transit ridership (bus service only) has fully recovered (City of Brampton, 2022). Canada's national statistical agency also collects and reports monthly urban public transit data (Statistics Canada, 2023). However, the data is not available at the municipal or metropolitan level. Instead, urban public transit data is available at the national scale, as well as for three supra-provincial regions, which is decidedly unhelpful if one is interested in understanding transit recovery in cities.
Counting on data
To understand the myriad impacts of the COVID-19 pandemic on cities, we must be able to count on data. Our discussion reveals the challenges related to rapid shifts in the indicators used to understand the evolution of the pandemic, access to timely and high-quality data, and the definition of spatial boundaries, as well as highlighting the increased use of novel and unexpected data sources to inform our understanding of urban recovery.
Overall, analysts face several conundrums in the pursuit of understanding the impact of COVID-19 on cities. Access to data remains an issue. In many of our examples, data is only visualized via dashboards and not available for download in its raw form, limiting its use for further analysis. Efforts, especially at the local level, are often ad hoc and fragmented. And given the knowledge that inequalities have been exacerbated by COVID-19 and data practices (Coleman et al., 2022; Walsh, 2023), it is highly problematic that data is frequently not available with sufficient detail to fully assess the impacts on particular neighborhoods or specific demographic and socio-economic groups. Often, access to data comes at a (high) cost and is governed by end-user licenses or agreements. Such contracts protect commercial interests; allow data access for scholars, policymakers, and communities; and may address privacy concerns. But they also create barriers to access, and the broader issues remain. Public agencies have not been particularly forthcoming in publishing or releasing data in a reliable or accessible way. This may have the unintended consequence of further incentivizing those with means to seek out data from private sources regardless of its quality or reliability. There is little transparency in how data from private sources is gathered, processed, and cleaned. Issues related to representativeness are unaddressed or assumed to be minimal due to the size of datasets. And data are often accepted at face value, with little regard to underlying issues. Despite these shortcomings, policymakers and researchers appear to have learned to work with the data available by triangulating a range of sources in the hopes of building a better understanding of the impacts of the pandemic on cities.
Finally, the broader legislative, regulatory, and political climate in which data is collected shapes what data is available; this is most apparent in the case of cell phone data, where data quality is being affected by broader societal politics and a patchwork of legislation and regulation. In other words, data collection itself is not a neutral proposition.
While devastating in its toll on human life, the COVID-19 pandemic spurred innovation across many realms, including the collection and use of data to understand urban change. Unquestionably, scholars, policymakers, and businesses alike found creative ways to measure and visualize the effects of the pandemic on cities, as well as evaluate the extent to which cities continue to evolve and recover. Notably, recent analyses of urban recovery use more traditional public data sources, which are known to be stable, reliable, and more readily replicated (Bay Area Council Economic Institute, 2023; Irving et al., 2023).
As scholars interested in developing a deep and rigorous understanding of pandemic-induced urban change, we advocate for proceeding with caution and with our eyes wide open to the many issues and limitations associated with the big data being used to analyze and evaluate the impact of the pandemic on cities, as well as other urban challenges. To be responsible urban scholars, greater critical engagement with our data and its sources is necessary.
Acknowledgments
The authors wish to acknowledge the constructive feedback received when we presented an earlier version of this work at a January 2023 workshop titled “The Black Box of Mobility Data: Cities, COVID-19 and Counting” as part of the series Responsible Data Science, DSI@UTM: Data Digests, hosted by the University of Toronto Mississauga Data Sciences Institute. We are also grateful to Matthew Zook and two anonymous reviewers for their careful reading and valuable suggestions.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Social Sciences and Humanities Research Council of Canada and the University of Toronto Mississauga Mobility Network.
