Abstract
This article adapts the ethnographic medium of the diary to develop a method for studying data and related data practices. The article focuses on the creation of one data diary, developed iteratively over three years in the context of a national centre for monitoring disasters and natural hazards in Brazil (Cemaden). We describe four points of focus involved in the creation of a data diary – spaces, interfaces, types and situations – before reflecting on the value of this method. We suggest data diaries (1) are able to capture the
Introduction
This article makes a methodological contribution to the social study of data and specifically to what has been called ‘critical data studies’ (Dalton et al., 2016; Metcalf and Crawford, 2016). Early accounts of the rise, promise and perils of Big Data (Kitchin, 2014a, 2014b; Mayer-Schönberger and Cukier, 2013) have given way to a large number of nuanced discussions around what can broadly be described as the datafication of society (van Dijck, 2014; van Es and Schäfer, 2017; Mejias and Couldry, 2019; Sumartojo et al., 2016). Data and their related practices are now roundly understood to shape subjectivities (Cheney-Lippold, 2017; Koopman, 2019) and underpin political economies (Srnicek, 2016; Zuboff, 2019); to have their own political and geo-political dynamics (Arora, 2016; Couldry and Mejias, 2019; Ruppert et al., 2017) and unique cultures of practice (Lupton, 2019; Pink et al., 2017). Countering the early focus on data’s (and Big Data’s) epistemological qualities, critical data research has stressed data’s everyday or ‘mundane’ status (Pink et al., 2017), their materiality (Bates et al., 2016; Gray et al., 2018), situatedness (Loukissas, 2019), historical contingency (Rosenberg, 2013) and affective qualities (Kennedy and Hill, 2017; Lupton, 2017; Smith, 2018; Sumartojo et al., 2016), among other things.
Within this nascent field, a smaller number of articles have explicitly focused on questions of methodology, implicitly suggesting that critical and socially focused accounts of data require a degree of methodological inventiveness (Lury and Wakeford, 2012). These are not methods for producing data but methods that aim to make available new ways of knowing data. Alison Powell, for example, has developed the method of the ‘data walkshop’, which she describes as:
A radically bottom up process of exploring and defining data, ‘big data’ and data politics from the perspectives of groups of citizens, who walk, observe, discuss and record connections between data, processes of datafication, and the places that they live in (2018: 2).
A growing need for methodological approaches that are able to capture detailed empirical understanding about ‘Big Data’ in practice, including how socio-material factors influence the constitution of data objects and shape how they move through space and time connecting different sites of practice across vast data infrastructures. (2016: 2)
In what follows, we do not present a diary in completed form, but rather provide a reflexive overview of how we went about creating a data diary. Our goal is to offer a general guide for others who wish to study data empirically in a predefined space or organisation, and to make a case for the value of such a method. We begin with an introduction to the site within which the data diary was created – a natural hazard monitoring centre in Brazil. We then offer a more detailed account of the diary as method before narrowing to the data diary. We present four points of focus for the creation of a data diary – data spaces, data interfaces, data types and data situations – giving examples of how each was documented in the diary. By attending to space, interface, type and situation, the idea is that the researcher builds an understanding of what we call ‘data-intensive situations’ gradually, beginning with a general understanding of what the space is for and eventually being able to grasp data’s role in the unfolding of situations (Suchman, 2006). We conclude by offering three ways the production of a data diary contributes to understandings of data: (1) by making visible the
The case: Cemaden
The Brazilian National Centre for Monitoring and Early Warning of Natural Disasters (Cemaden in Portuguese) is located in a technology park on the outskirts of São José dos Campos, a medium-sized city approximately 80 km north-east of São Paulo city. Cemaden was established in December 2011 as a direct response to the landslides and flooding that occurred in the state of Rio de Janeiro in 2010, which resulted in the death of 916 people and left a further 35,000 displaced. Since its establishment, researchers at Cemaden and affiliated organisations have identified 43,000 areas in almost 1000 different municipalities across Brazil which host communities that are vulnerable to severe landslides, flooding, flash-flooding, or ponding. Cemaden uses more than 4750 rainfall gauges, around 550 humidity and rainfall sensors, 9 weather radars, and almost 300 hydrological stations to monitor weather-related events that may affect these vulnerable communities (Horita et al., 2017).
Within the larger Cemaden building sits a fully enclosed situation room, and it is in this room that the actual monitoring of weather to identify possible natural hazards that could lead to disasters occurs. The situation room is staffed 24/7 by teams of four to seven specialists, working 6-hour shifts without a break. Each team comprises different specialisations, including at least one meteorologist, hydrologist, geo-specialist and disaster specialist. The specialists’ roles are designed such that they work semi-independently with clearly differentiated tasks, although we observed a strong tendency to work collaboratively, especially in times of pressure (as discussed below).
The specialists in the situation room have two main tasks. First, they decide whether or not a warning report is needed and, if so, issue it. Second, they produce a daily geo-hydrological risk map (see Figure 1). While both activities rely heavily on data, our focus is on the issuing of warning reports.

Figure 1. Geo-hydrological risk prediction map for 20 March 2020.
Within the team, it is often the meteorologist who draws the group’s attention to an area of concern. Changing atmospheric conditions such as the movement of a storm front are the first indicator that a risk of disaster (or ‘situation’) may be present. Once an area is identified, the rest of the specialists turn their attention to it and monitor it more closely, using numerous visualisations, images and other representations of data from a variety of sources to decide whether or not a warning report needs to be issued. During normal operation, only the hydrologists and geologists issue warning reports.
We consider Cemaden an ideal case to study data within a sociotechnical context. Previous studies have indicated that Cemaden is a notable example of decision-making within a ‘big data’ context (Horita et al., 2017) and also suggest that more participatory observations should be conducted in the situation room in order to better understand quotidian data practices (Horita et al., 2018). Finally, without wishing to revisit debates about what constitutes Big Data, in Kitchin’s influential writing on the topic he notes that ‘Big Data has existed in some domains, such as remote sensing, weather prediction, and financial markets, for some time…’ (2014a: 2). As a weather monitoring centre with the majority of data generated through remote sensors, the situation room in Cemaden would appear an ideal site to investigate what Kitchin refers to as the ‘new epistemologies’ enabled by Big Data. While we are less interested in whether or not the epistemological character of activity in the situation room is uniquely attributable to Big Data, we strongly agree with Kitchin’s call for a ‘situated and reflexive’ approach (2014a: 10). Our data diary of Cemaden is an attempt to present data in such a way that their situatedness is prioritised so as to enable more reflexive data epistemologies.
Data diaries
Ethnographic research has at its disposal many forms of writing. Roger Sanjek has observed that the term ethnography refers to both a process and a product (2013: 59), and one can see this reflected in corresponding forms of writing, or what Karen O’Reilly once broadly categorised as the difference between ‘writing down’ and ‘writing up’ (2005: 175). Methods such as ‘scratch notes’ or observational ‘jottings’ are rough attempts to capture the empirical as it unfolds, while field notes, journals and diaries are typically more reflexive, written post-experience and organised in different ways (Emerson et al., 2011; O’Reilly, 2011; Sanjek, 1990, 2013). The forms are many and varied and can be brought into different types of relation with each other and the wider media ecology of ethnographic research.
The diary has its own historical trajectory and position within the spaces of ethnographic writing. While diary and diary-like forms of writing (personal, informal, narrated, etc.) are found across a number of disciplines as well as beyond academia, the diary’s ethnographic significance came to the fore with the posthumous publication of Bronislaw Malinowski’s
Our data diary draws selectively from and repurposes this tradition of ethnographic writing. At the most general methodological level, a data diary is a strategy of notation. Its purpose is to produce an account of data and related data practices within a sociotechnical setting. It is, then, a method uniquely interested in providing an ethnographic account of data, but one in which the role of humans recedes such that data can come into focus. While there may be different types of data diaries, we consider this method to be necessarily informed by critical perspectives on data and related topics (software, interfaces, visualisation, infrastructure, and so on). A data diary aims to operationalise critical insights for methodological ends.
We use the term ‘data diary’ to refer to this general methodological orientation (‘an account of data and data practices’) but also to its final output – in our case, a data diary of Cemaden’s situation room. We do not clearly distinguish between process or method and end product because the steps involved in moving from process to product, of deciding what goes into the ‘product’ and how it is to be arranged and presented, are themselves still part of the process or ‘method’.
When it comes to actually making a data diary, this general methodological orientation can be realised through a variety of related techniques and indeed, other methods. That is to say, a data diary is composite. For example, our diary comprised scratch notes, jottings and more lengthy written reflections, but we also included photographs, excerpts from technical manuals, diagrams, illustrations, wall-posters and a number of slide presentations that were shared with us by Cemaden staff. While a data diary makes use of any number of other techniques and methods, these are all brought under the organising logic and methodological orientation of the data diary.
Our diary was in equal parts written, captured and compiled. It was a collaborative affair, iterated on by a number of researchers over a period of three years (2018–2020). Specialists within the situation room in Cemaden, as well as other researchers and managers in the centre added elements to its construction and provided feedback as it was refined. The diary weaves a path through the disparate knowledges of the researcher, the IT system manager, the situation room manager, the respective specialists, and Cemaden’s researchers who work outside the situation room. As a collaboration, then, at times it blurs the distinction between researcher, participant and their respective domains of expertise. What enables this collaborative blurring is the subject (matter) of the diary.
Diverging from the tendency of the diary genre, a
The process of diary creation is ultimately geared toward a final product, in which a curated selection of the material gathered and developed is included. The criteria upon which these decisions are made will vary depending on the specific goals of the diary. In our diary, we wondered about how best to visually represent and include human-data interactions, for example. We also wondered whether or not and how to include slides from the IT manager that diagram certain infrastructural elements underpinning the situation room. These are the types of questions we found ourselves discussing when moving from process to product.
Before detailing the process of creation of the Cemaden diary, we wish to add a final point regarding what we see as the onto-epistemological stakes of diary creation. Perhaps more so than other forms, diaries raise the question of the authority of inscription. Historically, diaries may be contrasted to more official documents, such as reports and other bureaucratic documents, and this contrast is equally one of style. The diary may become an important historical document (part of the ‘historical record’, so to speak), but it is typically produced with different authorial intentions. While there are no strict conventions, it may be characterised by more intimate and reflexive passages. A diary may blend the observation and description of events with a sense of how these events were experienced at the level of subjectivity. Without wishing to revisit whether or not a diary ‘reveals some more primordial truth’ (Hutnyk, 1998: 350), we consider data diaries as spaces of epistemological encounter; spaces which enable the creation of emergent truth-values in distinction to more established ways of presenting data. Specifically, a diary can operate across different epistemological registers, bringing together, for example, a formal diagram of decision-making, a sketch of a desktop display, and an off-the-cuff remark by a specialist under duress.
A data diary
In what follows, we offer a number of points of focus for the construction of a data diary. We arrived at these iteratively, through reflecting on the material gathered and discussions held during our first few visits to Cemaden. Our four focus areas – data spaces, data interfaces, data types, and data situations – therefore emerged through the process of diary production itself and in relation to the specificity of the case. Other diaries based on other cases could look quite different, though it is hoped that documenting our points of focus provides a useful point of reference. What we offer is a narration of how the diary was constructed, what each point of focus adds conceptually, and examples of what we included.
Data space
Our data diary was produced in a concrete and relatively bounded site. Data, of course, do not respect these boundaries, but our aim was not to follow the data but to better understand what data did in a given context or situation over time. The diary began with our first visit to the room and an initial walkthrough offered by a scientist and former situation room manager, who was also a collaborator on the larger research project of which the creation of the data diary was a part. Upon entering the room, we walk up a broad entrance hallway with a slight incline. After entering, we emerge at what is the rear of the room in terms of orientation. As an initial exercise, we mapped the space of the situation room. The room was sketched and later diagrammed, focusing on the position of sources of visible data. Similar to earlier accounts of how the presence of code is constitutive of the spaces in which it is present – creating an emergent code/space – our method is also interested in understanding how data and space intermingle to create new spatialities (Kitchin and Dodge, 2011). The first task is to sketch the room layout paying attention to the presence of data (see Figure 2).

Figure 2. Sketch of the Cemaden situation room. Source: Authors.
The first thing to note about the room is how it is dominated by the wall-sized screen, or

Figure 3. Photograph of the situation room. Source: Authors.
The room itself is filled with four rows of work benches, with access ways on the far left and far right of the room (to access the front). Each row has five workstations and each of these two monitors, a keyboard and mouse. Besides the
Spatially, the situation room is organised to encourage a specific kind of ‘data gaze’ (Beer, 2018). The whole room is oriented towards the
Focusing on data and space, on where data are, how they move (or not), as well as on the material arrangement and interaction of things and people in relation to data carries on the approach proposed by Kitchin and Dodge in
Taken for granted within this spatial, transductive approach is an understanding of data as material as opposed to somehow purely items of knowledge. Data are entirely dependent on material infrastructures (buildings, electricity, screens, information systems, radars, etc.) but also only exist through specific material instantiations, as things sensed through sensors, passing through many mediations before being displayed through a bundle of screen technologies, hardware and software. In this way, data cannot clearly be separated from their infrastructures. This reality (of data/space) was made bluntly clear during one visit when, during a storm, the power went out in the situation room. Without the
Beginning with a narration of data and space involves a consideration of where data are, but also how they act to constitute space itself. Sketching and diagramming the room is thus only a starting point – a first step in a more processual understanding of spatial dynamics. What such diagrams do make visible are the broad strategies of spatial configuration of the situation room as a data space. We found that sketching the space daily (noting any changes) and complementing this with other more specific observations (such as changes in the configuration of the
Data interfaces
The interface has long been a privileged object for considering the social aspects of human-machine interactions, from the rise of ergonomics and ‘human factors’ to more recent user experience design and the emerging area of interface criticism (Andersen and Pold, 2011, 2018; Harrison et al., 2007; Hookway, 2014; Suchman, 2006). As early designers of computational systems that support decision making observed, ‘The system, as seen by the users,
Telão.
During our first visit to Cemaden, we were given a brief overview of the

Figure 4. Initial sketch of the
A key observation is that the configuration of the

Figure 5. A diagrammatic comparison of two
A walkthrough of the
Workstations.
While the
Figure 6, for example, is a rough sketch of a typical workstation display. The screen on the left has a web browser window with a cloud-based email client open and other browser tabs. Because it is not a busy day, the screen is used for a range of non-monitoring-related activities. The screen on the right is set up for monitoring. This display also has a browser window visible with several tabs open. Each of the tabs provides access to different sources of data, with some duplicating the data displayed on the

Figure 6. Sketch of a workstation display. Source: Authors.
Attention to data interfaces enables a colouring of how data space is configured. Through documenting the various elements on each display, this aspect of the data diary introduces the main systems and types of data in use in the situation room. Focusing on the display shows how data are represented in the room (largely through maps, charts and tables) and how the use of data varies across roles and between individuals. While we limit our discussion to an introduction to the elements on display and how staff configure their own interfaces, one could go much further with an interface-led inquiry, through analysing the formal qualities of the visualisations, or further unpacking the software or display technologies in use, for example.
Data types
From the interface, we narrow to the data. There are many ways data may feature directly in a data diary and after exploring a number of methods we came to focus on two. The first method we used involved an extension of the walkthrough-like approach used for interfaces, where we invited a specialist to tell us everything they could about a given data type, data point, or data entry (in practice these are not always clearly differentiated). Here, the point is for the walkthrough to be open-ended, such that the specialist (here in the position of the narrator) gives their unique take on the data under consideration. For example, one manager gave us a walkthrough of one data entry on the SIADEN window on the
A second technique we explored was creating survey-like questionnaires for different types of data, which we filled in ourselves through observing and asking questions directly. Figure 7 shows the questions we asked of the same SIADEN data. These questionnaires may be suitable if comparing data types or responses is of interest.

Figure 7. Data Questionnaire for SIADEN warning data. Source: Authors.
All in all, attention to space, interface and type can all be seen as laying the groundwork for understanding data in motion. This is not to say that these things are static or unchanging. Rather, attention to these first three points of interest prepares the diary maker(s); it equips them with the necessary orientations, data literacies and perceptual foci to give an explicitly processual account of data and data practices. We explore data’s processuality through their constitutive role in situations.
Data situations
The situation room is a data space, which means its spatial character, the
Given the methodological focus of this article, our interest here is limited to recognising and documenting the eventive nature of the room. What methods can make visible how data are put to work, how they are practised, and specifically how they constitute and help resolve or alternatively trouble the unfolding of a situation? Although constituted through data, a data situation requires a human. Data fill the screens and structure the room, but they cannot speak and they cannot act alone – at least not in Cemaden. To document and narrate data situations, then, one must actively pay attention to the specialists.
The majority of our time in the situation room was dedicated to documenting the unfolding of data practices during a situation. We did this through a combination of observing the dynamics of the room in general and through shadowing individual specialists. Shadowing involved sitting behind or next to a specialist for short spells (typically no more than 30 minutes at a time) and occasionally asking them to verbalise or explain their actions.
While the specialists have individual roles, they generally work in a collaborative manner. It is the meteorologist’s role to identify weather patterns in need of close monitoring, and the hydrologist’s and geologist’s role to make the final decision about issuing a warning alert; however, in between these moments a lot of (collaborative) activity may occur. To capture some of this activity, we adopted the ‘sequence’ or ‘event diagram’ method, which is designed to show how actors interact within a given period of time. Figure 8 shows one of many such diagrams we made during the creation of the diary.

Figure 8. Sequence Diagram of a Warning. Source: Authors.
While sometimes situations emerge quickly and clearly, often this was not the case and instead the specialists would monitor data in an ongoing way, piecing things together from different sources and as part of a team discussion. In this case (Figure 8), the situation is already happening. The team of specialists are monitoring a municipality in the state of Minas Gerais. The meteorologist (Met1) has been monitoring atmospheric conditions in this region and has indicated that it has begun raining. We are shadowing the hydrologist (Hid1), who is trying to get a closer look at the region on their workstation using SALVAR (by zooming in to the municipality). While doing this, the meteorologist indicates that pluviometric data (rainfall gauges) has passed 60 mm. The hydrologist is trying to consult rain radar data, but the area of concern falls between two radars, meaning there is no radar data available. This is communicated to the meteorologist, who then indicates that there is also no pre-existing threshold in the area for rainfall (which would help determine if a warning needed to be issued). The meteorologist acknowledges that 60 mm is high, but it falls within a difficult range. This is because recent rainfall is only one relevant measure to indicate a possible risk. Accumulated rainfall, soil type, terrain, population and building density, previous disasters and other things may also need to be factored in. In this case, the hydrologist abandons the radar data and looks at accumulated rainfall (pluviometric data within the previous 24 hours), which is high. She communicates this to the meteorologist and then issues a warning without further discussion.
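For readers who keep their diary digitally, the sequence narrated above can also be held as a simple event log. The following is a minimal sketch only, not the notation used in our diary: the `Event` record, its field names and the helper functions are hypothetical, and the ordering of steps is one reading of the narrative rather than a transcription of Figure 8.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """One step in a data situation (hypothetical structure, for illustration)."""
    step: int
    actor: str                          # e.g. "Met1" (meteorologist), "Hid1" (hydrologist)
    action: str                         # what was said or done
    data_source: Optional[str] = None   # data involved in the step, if any
    addressed_to: Optional[str] = None  # who the actor speaks to, if anyone


# The warning sequence as narrated in the text (one reading of the ordering).
EVENTS: List[Event] = [
    Event(1, "Met1", "indicates that it has begun raining", "atmospheric conditions", "team"),
    Event(2, "Hid1", "zooms in to the municipality", "SALVAR"),
    Event(3, "Met1", "reports that rainfall has passed 60 mm", "rainfall gauges", "Hid1"),
    Event(4, "Hid1", "finds the area falls between two radars", "rain radar"),
    Event(5, "Hid1", "communicates that no radar data is available", None, "Met1"),
    Event(6, "Met1", "notes there is no pre-existing rainfall threshold", None, "Hid1"),
    Event(7, "Hid1", "checks accumulated rainfall, which is high", "accumulated rainfall (24 h)"),
    Event(8, "Hid1", "communicates the accumulated rainfall", None, "Met1"),
    Event(9, "Hid1", "issues a warning"),
]


def render(events: List[Event]) -> str:
    """Render the events as a plain-text sequence, one line per step."""
    lines = []
    for e in sorted(events, key=lambda ev: ev.step):
        who = f"{e.actor} -> {e.addressed_to}" if e.addressed_to else e.actor
        source = f" [{e.data_source}]" if e.data_source else ""
        lines.append(f"{e.step:>2}. {who}: {e.action}{source}")
    return "\n".join(lines)


def data_sources(events: List[Event]) -> List[str]:
    """List the distinct data sources in order of first appearance."""
    seen: List[str] = []
    for e in sorted(events, key=lambda ev: ev.step):
        if e.data_source and e.data_source not in seen:
            seen.append(e.data_source)
    return seen


if __name__ == "__main__":
    print(render(EVENTS))
    print("Data sources consulted:", ", ".join(data_sources(EVENTS)))
```

A log of this kind does not replace the hand-drawn diagram, but it may make it easier to compare situations, for example by listing which data sources were consulted before a warning was issued.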
In this case, the situation emerges slowly, with both the meteorologist and hydrologist monitoring different sources of data in relation to a specific municipality. The 60 mm rain level prompts the meteorologist to speak and the situation intensifies. Importantly, the rainfall gauge data suggests a decision will need to be made but does not provide any certainty about the decision itself. The lack of radar data adds to the uncertainty, while the accumulated rainfall data clarifies the situation. In other cases we observed, uncertainty would persist for extended periods, with different types of data not aligning towards a clear course of action. (There are many forms of uncertainty stemming from data, from delays in data refresh rates, to competing measures from similar data types, contrasting measures across data types, gaps in data, and measures that hover around thresholds.)
Through this sequence (Figure 8), we can see how data shapes the unfolding of the room’s situation
Through attention to data situations, we also see the everyday and ordinary contexts of data practices. We see how specialists informally discuss their data dilemmas, and how formal processes and technical systems are themselves transduced into conversations and processes of collaborative decision making. We observed that while data do not speak, they nevertheless have a language and set of gestures through which they achieve their interventions in specific situations. In the example above, data are spoken by the meteorologist and are embodied through the gesture of the hydrologist, who points to the accumulated pluviometric data. Sometimes the specialists are precise with their data-driven utterances, but just as often they are not, relying instead on their shared expertise and situational awareness to allow communicative abbreviations. Often, ‘look’, ‘it’s high’, or ‘nothing’ is all that is needed for the language of data to make an intervention. The production of a data diary is what allows us to contextualise and document such interventions in their everyday unfolding – to give a situated account of this space of data situations.
Concluding discussion: The methodological affordances of a data diary
In this article, we have suggested that the diary method can be adapted to enrich understandings of data and related data practices. We have made a case for the suitability of the diary method and detailed four points of focus – spaces, interfaces, types and situations – that informed the creation of our own diary. To be clear, our points of focus emerged in relation to the research site itself and should not be taken as necessary or inevitable. Indeed, our own study exceeded them in a number of ways, particularly in relation to questions of infrastructure and the information technology in Cemaden (but outside the situation room), which were also important in our study but were not practical to include. In addition, we have produced complementary research in related organisational settings connected to Cemaden. This research also informed the study and could be used as part of a larger multi-sited diary (and perhaps as part of a ‘data journey’), but was left aside. This is to say a data diary is more a methodological orientation than a prescription. Attention to space, interface, type and situation enabled our research team to build a general understanding of data and then gradually nuance this understanding through an account of data’s situational unfolding. These points of focus were, however, not arbitrary and indeed were impressed upon us through our time in the room.
In general, the value of a method rests upon the contributions to knowledge it is able to facilitate. However, since the focus of this piece has been on the method (and not directly on knowledge), we wish to finish by outlining three ways that the data diary has helped our own research, which may also be useful for others. First, data diaries make
Second, despite a growing literature on the influence of data, there is a real lack of approaches that empirically document how data actually
Third, and finally, it is well-recognised that social science and humanities-derived ways of knowing can make valuable additions to how we understand, use and govern data. However, specifically how these ways of knowing can meaningfully contribute is, in some cases, less clear, especially since such ways of knowing often enter the scene after the fact, like a late guest at a dinner party when everything is already in motion. While the data diary does not pretend divides do not exist, its creation is a collaborative affair based on a mutual affirmation of expertise. The production of the diary is an inter- and transdisciplinary knowledge co-production endeavour (Coaffee, de Albuquerque and Pitidis, 2021), which was conducted alongside other components of a larger research project. The point was not to force anyone into seeing data in a specific way, but to produce a document that reproduces the different ways of seeing and knowing data in the situation room. As a process, the diary offers a space of cross-border knowledge creation, through ongoing dialogue and also through the presentation of the diary at various stages as a work in progress. As a product, data’s informalities and the specificity of its interventions sit next to its more formal modes of representation, and these ways of knowing are able to circulate to different stakeholders beyond the situation room. It is, perhaps, a small thing. But if the humanities and social sciences are to have a voice in spaces where data and more formal knowledges rule, we need ways of making our distinct ways of knowing visible to others. More than this, though, we need ways of making our ways of knowing familiar and at home with other ways of knowing. A data diary offers a step in that direction.
Acknowledgements
We would like to express our special thanks to members of the situation room of Cemaden and to Cemaden’s staff and leadership for their generous sharing of expertise and knowledge. Dr. Mário Henrique da Mata Martins acknowledges the post-doc fellowship, process number 2019/06595-2, São Paulo Research Foundation (FAPESP).
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article is part of the project T2S Waterproofing Data which is financially supported by the Belmont Forum and NORFACE Joint Research Programme on Transformations to Sustainability, co-funded by DLR/BMBF, ESRC/Global Challenges Research Fund (ES/S006982/1), FAPESP and the European Commission through Horizon 2020.
