Abstract
The era of ‘big data’ studies and computational social science has recently given rise to a number of realignments within and beyond the social sciences, where otherwise distinct data formats – digital, numerical, ethnographic, visual, etc. – rub off and emerge from one another in new ways. This article chronicles the collaboration between a team of anthropologists and sociologists, who worked together for one week in an experimental attempt to combine ‘big’ transactional and ‘small’ ethnographic data formats. Our collaboration is part of a larger cross-disciplinary project carried out at the Danish Technical University (DTU), where high-resolution transactional data from smartphones allows for recordings of social networks amongst a freshman class (N = 800). With a parallel deployment of ethnographic fieldwork among the DTU students, this research set-up raises a number of questions concerning how to assemble disparate ‘data-worlds’ and to what epistemological and political effects? To address these questions, a specific social event – a lively student party – was singled out from the broader DTU dataset. Our experimental collaboration used recordings of Bluetooth signals between students’ phones to visualize the ebb and flow of social intensities at the DTU party, juxtaposing these with ethnographic field-notes on shifting party atmospheres. Tracing and reflecting on the process of combining heterogeneous data, the article offers a concrete case of how a ‘stitching together’ of digital and ethnographic data-worlds might take place.
Introduction: Stitching up heterogeneous data-worlds?
Over the last decade, researchers in anthropology, sociology, human geography, and science and technology studies have engaged with increasing frequency in collaborative and cross-disciplinary endeavours to experiment on new forms of complex data generation, data analysis and data visualization. Such emergence of digital, computational, transactional and otherwise ‘big’ social data has given rise to new realignments, but also new fissures and bifurcations, within, across and indeed beyond the social sciences (boyd and Crawford, 2012). Here, otherwise disparate data formats – transactional, digital, ethnographic, numerical, visual, etc. – have come to rub off and emerge from one another. This situation has resulted in new challenges and opportunities for doing reflexive as well as socially and ethically grounded research in an age of ‘big and broad’ social data (Housley et al., 2014).
Within the field of anthropology in particular, new debates have emerged on the mutual hostility or possible interdependence of ‘small’ ethnographic and ‘big’ transactional data formats and worlds (e.g., Abramson, 2016; Knox and Walford, 2016; Kockelman, 2013; Nafus and Sherman, 2013; Stoller, 2013). Importantly, a range of innovative concrete examples and reflections on collaborative data practices, not least set in joint industry–academia contexts, inform such debates (e.g., Boellstorff et al., 2015; Curran, 2013; Ford, 2014; Taylor and Horst, 2013). Among other things, these debates and endeavours may be said to raise the following pertinent questions, at once methodological, epistemological and ethical, and still very much open-ended and unresolved: how to combine or mix otherwise disparate datasets, formats and worlds? Can and should different bodies of data originating from apparently incommensurable research endeavours and arenas be ‘added up’ into one another? At issue here, fundamentally, is the question of how much is shared and how much is different across ‘big’ transactional and ‘small’ ethnographic datasets, and under what conditions they can be made to interfere productively with each other?
In this article, we follow up on a programmatic intervention previously published in this journal (Blok and Pedersen, 2014) to explore in more detail this space of ethnographic-cum-digital data as one of mutual dependency or ‘complementarity’. We do so in our role as participants in the Copenhagen Social Networks Study, a cross-disciplinary ‘computational social science’ (Lazer et al., 2009) experiment in which computer scientists, physicists, psychologists, economists, philosophers, sociologists and anthropologists collaborate to study social interactions among a cohort of freshman students (N = 800) at the DTU (Sekara et al., 2016; Stopczynski et al., 2014). In this project, smartphones distributed to students as measurement devices record ‘big’ digital (meta-)data on social transactions from call and SMS logs, Bluetooth, GPS geo-location and other channels. 1 Simultaneously, but in parallel to this digital infrastructure, an anthropologist (My, co-author of this article) has conducted long-term fieldwork among the same DTU students, producing ‘thick data’ (Wang, 2013) on friendship and other social relations.
Over the following pages, we chronicle a weeklong collaborative exploration undertaken by six anthropologists and sociologists (the authors of this article), forming the core of what we call the Critical Algorithms Lab (CALL). 2 For the collaboration, we singled out a specific social event amongst the DTU cohort – a so-called after ski party – as a germane test site for exploring new ways of interrelating radically different data formats, in ways that might leverage new analytic capacities. More specifically, we sought to mix ‘big’ digital and ‘small’ ethnographic data in the form of those Bluetooth recordings and field-notes, respectively, that are otherwise unrelated traces of DTU social life in our overall study set-up. 3 In doing so, ideas of exploratory data analysis (Tukey, 1977), ethno-mining (Anderson et al., 2009), quali-quantitative methods (Latour et al., 2012) and ‘high-resolution’ (or ‘deep’) data experiments (Sekara et al., 2016) all inform our attempt to fertilize new questions – and new modes of criticism – to ask to and from an otherwise often more restricted computational social science. In short, the ambition was and is to explore the steps needed to ‘stitch together’ the heterogeneous social data-worlds of ethnography and computation in new, complementary and hence non-hierarchically productive ways.
From a computational point of view, a single student party may seem like a ‘frivolous’ object of inquiry; yet, the importance of parties as ritual occasions of collective life is deeply ingrained in sociological and anthropological traditions, from Durkheim to Goffman and beyond (Ronen, 2010; Wynn, 2016). Moreover, as we intend to show, the methodological and ethical ramifications of our approach are equally serious. How are we to stimulate (and simulate) ethnographic experience when working with digital databases, and vice versa, in ways that allow us to stitch together – that is, to jointly deploy, navigate, visualize and interfere with – different scales and dimensions of data? To address this question, we take the ebb and flow of social intensities and atmospheres at the DTU party as an object of inquiry equally amenable to, but also equally challenging for, the deployment of ethnographic field-notes and Bluetooth signals, respectively. By chronicling in some detail the steps of the resulting explorative process, our hope is to provide a description of how a stitching together of digital (meta-)data and ethnographic data concretely works. 4 As such, the article is intended to work like an annotated experimental protocol – a quali-quantitative dossier that recounts and reflects upon the generation of digital-cum-ethnographic data and insights.
By investigating the different perspectives that granular behavioural trace data and algorithms afford us, we seek in what follows to accord due attention to the practical challenges and excitement of attempts to work across, put together, and realign radically different methods within a data-dense collaborative research setting. Also, by describing the practical tasks of giving visual (and other) shape to disparate data formats, we hope to suggest that these tasks are indeed integral to and at the core of debates on the contemporary performance of description, analysis, interpretation and critique in the social sciences writ large. So, instead of merely adding pieces of data from one arena to another in accordance with an ‘add and stir’ logic – arguably still a dominant approach to much interdisciplinary work – we here explore practices of stitching as a possibility of mutual fertilization across apparently incommensurable fields. As a premise, therefore, we recognize that data (no matter how presumptively ‘raw’) is always marked by the histories, paradigms and ethical concerns of specific research arenas (Gitelman, 2013). Stitching heterogeneous data together is then more than just trying to convert the formats of data to work with each other. Rather, it is a matter of making different data-worlds.
The article proceeds as follows: heeding Latour’s (2005: 27) call to always start one’s investigation ‘in the middle of things’, the next section sets the stage for our heterogeneous ‘data-party’ and clarifies its status as a methodological testing ground. From this, we narrate and re-enact three analytical moves through which we explored the stitching together of data-worlds and reflect on their significance. First, our initial attempt at ‘locating the party’ speaks to issues of (re-)contextualization in ‘big’ social data analysis. Second, we relate how our big data-cum-ethnographic exploration ended up co-constituting ‘the social’ in specific terms of relations, quantities and intensities. And third, we show how iterative processes of data visualization and interpretation, as integral methodological steps, came to lend themselves to analytical processes of de- and re-composing distinct ‘temporal motifs’ of party socialities. We conclude by reflecting on how our collaboration may suggest wider opportunities for ‘experimentalizing’ (Marres, 2013) the current big social data moment.
The heterogeneous party: Computation, ethnography and complementarity
Figure 1 shows an anthropologist departing from a party in a taxi following one of many such late night events, a characteristic feature of her field site. The site in question is the DTU, located 10 miles outside of Copenhagen and home to thousands of engineering students of various hues. The anthropologist is My, co-author of this text (but also a big social data subject who has agreed to share this data fragment with the rest of us). In technical terms, the numbers represent digital traces from the Bluetooth scan of My’s tailor-made mobile phone, giving us the number of fellow DTU students, over 5 minute intervals, who are within 10 m of her phone’s reach at any given time. By extrapolation, the present numbers tell us that My is leaving the party by cab with two fellow students, going back to Copenhagen. We know this for sure, because My confirms this interpretation as correct. Taking the bus, she says, can be a pain at this hour.
Figure 1. The anthropologist as numbers.
While the example may be trivial, what follows explores the indeterminate space thus already hinted at in Figure 1, the space between digital–numerical inscriptions and ethnographic experiences of the social relations manifested in a DTU party. In standard everyday life, of course, ordinary Danish citizens or indeed researchers like us would not have access to such digital (meta-)data, since they would be the private property of some mobile phone company. As noted, however, My is partaking, along with approximately 800 freshmen DTU students, in a large cross- and multidisciplinary computational social science experiment, the Copenhagen Social Networks Study (see Acknowledgements section). Since 2012, a vast social data infrastructure or, in the hyperbolic terms of Duncan Watts (2013), a ‘social super-collider’, has been built up at DTU, pumping digital communication traces from students’ mobile phones into a secure, anonymized database. Meanwhile, My has been conducting her ethnographic work among the same students, a position that has brought her into collaborative relations with the data scientists responsible for calibrating the database (Madsen et al., forthcoming) but seldom into close contact with the digital data as such.
In a previous publication, we have argued (Blok and Pedersen, 2014) for the relevance of viewing the relationship between ‘small’ ethnographic and ‘big’ computational data-worlds as ‘complementary’. Famously, complementarity was the term used by Danish physicist Niels Bohr to describe the inability – expressed most poignantly in the well-known particle/wave duality of light – of providing, from within any single frame of reference, a full account of the sum total of possible experiences of a given object of study. This line of thinking, we argued, allows us to engage in more open-ended and cross-cutting explorations at emerging social data science frontiers, in ways that avoid both the scientistic presumptions of computational social science proclamations (Lazer et al., 2009) and too-quick assertions of smooth navigation through digital datascapes (Latour et al., 2012), without relapsing into pure critiques of ‘big data ideology’ or mere reassertions of the value of ‘small’ ethnographic data (see Blok and Pedersen, 2014).
The present article draws on this programmatic argument, as noted, by way of setting up a practical experiment in ‘big data-ethnography’, where the hyphen suggests an initial, inherent and deliberate uncertainty as to which of the two terms (big data, ethnography) is here context for the other; which is figure and which is ground. Attempting to explore the dynamic intensities and atmospheres of a specific DTU student party facilitates this experiment, we argue, because these collective properties of a party as a ‘meso-social’ (Wynn, 2016) occasion of bodily co-presence among many participants, with many different forms of engagement, are both real yet elusive to get at via either digital-transactional or ethnographic means. In setting up this space of methodological experimentation across standard quantitative–qualitative divides, we thus seek to recast and conceptually challenge a number of more-or-less established conventions concerning not only digital transactional data and their different social affordances (e.g., Ruppert et al., 2013) but also, and importantly, about the epistemic and critical capacities of ethnography itself (e.g., boyd and Crawford, 2012).
Many social data researchers and, more generally, quantitative social scientists are inclined to see the value of ‘big and broad’ social data as one of enabling a focus on large-scale populations over long time spans (e.g., Housley et al., 2014). Indeed, one dominant assumption in the emerging field of social data science often seems to be that ‘invariance over time’ equals objectivity. Yet, what we attempt to do in our own big data-ethnography experiment is just the reverse: namely, to zoom in on a very limited spatio-temporal data point, namely ‘the party’, with the ambition to extract as much ethnographic and digital detail as possible. After all, a key analytical affordance of ‘big’ social data is exactly its unprecedented granularity; the density or thickness of its observations across space and time (Ruppert et al., 2013). What we wish to do is to explore what such granularity consists of; what kind of granularity it is, relative to ethnographic observations; and, most importantly, what novel kinds of granularities or data densities we might get when attempting to stitch together different digital-transactional and qualitative-ethnographic observations.
This approach of ours shares its overall orientation, as noted, with like-minded attempts to work across the ethnography–big data divide (e.g., Curran, 2013; Ford, 2014; Taylor and Horst, 2013) – while also, we believe, adding notions of complementarity, stitching and granularity as novel methodological orientations for what that might entail in practice. Moreover, the particularities of our data test site mean that we inscribe this exploration into the wider research field of college parties (e.g., Ronen, 2010; Sweeney, 2014), understood here in the Goffmanian sense of a relatively dense, evanescent yet semi-coordinated social occasion of bodily co-presence (Goffman, 1963; Wynn, 2016). While our approach gives us little to say on such otherwise important topics as gender and sexual relations (e.g., Tye and Powers, 1999) or alcohol consumption (e.g., Workman, 2001), we thus conceive of our joint big data-ethnography focus on party intensities as aligning closely with Goffman’s Durkheim-inspired interest in the social forms and affective-normative patterns of interaction rituals.
In short, the experiment on which we report in the following deploys the scale of ‘the granular’ as a promising and challenging meeting ground for big social data analytics and ethnographic observation, where the two ‘observation devices’ have to meet in an understanding of specific social practices and a specific social occasion (the party), as opposed to general tendencies. To get started, however, we first need to delineate the overarching context of this experimental test site of ours; a process that already puts us on the trail of some rather vexed issues of data-worlding.
Locating the party: Searching through and re-contextualizing data
My-the-ethnographer is the only person in the group conducting the experiment who attended the original party at DTU in person. She knows when the party took place, roughly how many people attended, what the atmosphere was like, when it was ‘dead’ and when it was ‘happening’, that is, its shifting social intensities. Her phone, to some extent, ‘knows’ this as well.
Locating My-the-number from the digital trace left by her mobile phone’s sensor, however, turns out to be more difficult than anticipated. Not only are we, of course, treading on territory of potential ethical concern: for sound research ethical reasons, the whole data infrastructure has been set up to discourage and disable this kind of personal (re-)identification within the sea of numbers. By implication, it takes a great deal of work (and convincing, on our part) before our DTU colleagues succeed in ‘re-personifying’ My’s de-personified digital footprint. Indeed, it turns out that My has been partly erased from the database – out of concern, on the part of our more ‘hard-nosed’ colleagues in the health sciences, that her digital presence in the student population as an embedded ethnographer would ‘pollute’ the data and reduce or even destroy its validity. Old tropes from the philosophy of science about objectivity as de-subjectivation (Daston, 1992) thus come to be re-enacted within our computational context, awash as it is with disciplinary asymmetries and biases, but also with highly important ethical concerns about the nature of privacy and researchers’ accountability in the age of big social data (see boyd and Crawford, 2012: 671ff).
By collecting My-the-number’s traces from every other participant in the overall DTU experiment, we do manage to recreate My-the-number in the database. From this moment onwards, our task becomes a matter of trying out different network mathematical algorithms in order to digitally locate the party (see Appendix 1). By using Bluetooth signals to compute network components – corresponding to ‘gatherings’ in Goffman’s terminology (Wynn, 2016: 278) – in the entire DTU population for the relevant time span of the party, and by calibrating this against digital My as our one known partygoer, we end up with a party which at its peak had 57 ‘dedicated’ participants staying for at least some hours. One hundred and twenty-four students in total attended at some point during the night, coming and going; or, more precisely, our data traces 124 student-phone hybrids, with the ‘real’ number of participants subject to uncertainties that are hard to gauge precisely. Still, in digital data terms, this seems as close as we can get to delineating our test site. Sitting around our computer screens (see Figure 2), we experience a moment of exaltation: we have found the party!
Figure 2. Locating the party in and via code.
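The component-based search for the party can be illustrated with a minimal sketch. It is not the procedure documented in Appendix 1: the input format (pairs of phone identifiers that registered each other within one 5 minute interval), the identifiers and the function names are all assumptions made for illustration. The sketch computes connected components of the Bluetooth sighting graph and returns the one that contains a known partygoer, here labelled "my".

```python
from collections import defaultdict


def components(edges):
    """Return the connected components (sets of nodes) of an undirected graph."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, result = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        result.append(group)
    return result


def gathering_of(anchor, edges):
    """Component containing the known partygoer, or an empty set if unseen."""
    for group in components(edges):
        if anchor in group:
            return group
    return set()


# Hypothetical sightings within a single 5 minute interval (not real data).
sightings = [("my", "s1"), ("s1", "s2"), ("s3", "s4")]
party = gathering_of("my", sightings)
```

In this toy interval, the phones s3 and s4 form a separate gathering elsewhere on campus, so only the component anchored on the known participant counts as the party.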
How to best theorize what took place during this initial attempt to locate the party? According to Richard Rogers (2013), ‘search’ has become a core trope in the current digital age – even as he also cautions us to realize that what one finds depends entirely on the quality of one’s queries. In a related vein, Latour et al. (2012) argue that key questions of social ontology depend on the manner in which we move around in digital datasets. However, valuable as they are, none of these arguments quite address the difficult question of what ‘search’ might mean once the analysis begins to move across very heterogeneous data formats and worlds, which go far beyond the remit of the digital per se. This is precisely the challenge we faced in trying to locate ‘the party’ in the DTU dataset – the question of how to combine, even at this rudimentary level of social inquiry, seemingly radically different sets of quantitative digital and qualitative ethnographic data? To maintain control of such shifts in data visions involves great precision at all points. This is why it was crucial for us to begin our experiment by locating My’s data double: only such a procedure would enable us to digitally reconstruct, using her phone data as calibration device, the social topography of the party as what we might call an ‘interpretable occasion’ in the database.
Taking this experience as cue, and after some deliberation and a good deal of frustration, we decide to make this very oscillation between two radically different datasets used to ground and contextualize one another a key principle of further exploration. We also decide to base our attempts to ‘stitch’ data formats together (My’s field-notes, Bluetooth signals) on a similar sensitivity to the uncertainties and oscillations thereby temporarily frozen into shifting ‘data collages’.
With such aims in view, we initially start to simply ‘look around’ in the digital party data. How close are people to one another, how many are co-present at a given time, what seem to be the key temporal dynamics and where at any given time is the ‘core’ of the party, that is, its particular and shifting pattern of socio-spatial centrality and distribution as an occasion (see Wynn, 2016: 280)? Specifically, this means writing programs that transform some chunk of data into tables, graphs and other quantified outputs. In the process, we specify what to count, who to count (i.e. which sub-population), and how to count it, that is, how to weigh, normalize and aggregate data. Changing specifications iteratively makes us ‘see’ very different parties, different socialities. Key here is the practical realization, shared with the so-called ethno-miners (Anderson et al., 2009), that we have to build and rebuild ‘context(s)’ in order for our various digital data abstractions to be even minimally interpretable – including, as we saw, the very context of our study as a whole (‘the party’).
It should now be clear how and why our attempt to locate and delineate the party in the database turned out to be a defining task that spoke directly to the core of our concern with stitching together heterogeneous data. Not only did we need My-the-ethnographer as calibration device to even get us near ‘the ground’ of our experimental dataset – or what our physicist peers sometimes like to call ‘ground truthing’. More to the point, this seemingly simple and even banal task turned out to involve multiple questions of inference and interpretation. Such questions, in turn, point to the way in which our digital data allow for great plasticity in the way one might assemble and reassemble them. Indeed, as we shall now see by returning to the account of our data party experiment, this ‘re-composability’ of data simultaneously works as a constant challenge and affords new insights at many levels.
Densities and intensities: Reading the social flow via data/ethnography
We are still waiting for Ulli and now bored with beer pong. Talking. Then Ulli calls: he is here but he can’t see us. He is inside Diagonalen. We meet and go stand by the bar. More people have arrived. There is a whole pile of jackets on top of ours now. People are standing in clusters or squeezed together in the sofas. There are many bottles.
The room is packed now and very noisy. Everything seems strange and unreal. We dance around in our little corner. I throw myself on the bar to reach the computer and change the music. From this position I look back into the room. It’s packed to its limits. […] The coloured lights are flicking, fluttering about on faces and limbs as people and rays are moving.
5
Gradually, three strategies for how to get at such social intensities crystallize from our probing of the digital data: one largely ‘inductive’, another ‘deductive’ and a third ‘alienating’. Let us consider these in turn. Since our starting point is a personalized ethnographic story (My’s), it initially seems obvious to us to begin our simulation by ‘inductively’ tracking other individual partygoers through time and to visualize their ‘intensity profiles’ with a view to generating a model for what the party would look like if ‘seen’ from the point of view of different such student data-doubles. In turn, we wonder, what might be learned about the party’s varying degrees of intensity by utilizing survey-based individual attributes such as age, gender, personality, etc. to build up types of individual experiences? In Goffmanian terms, this is akin to starting from the individual participants and then working towards assembling various patterns of their co-present partying activities (Wynn, 2016: 282).
As a result of this inductive approach, we eventually come to identify different ‘characters’ and ‘stories’, understood as individual-level patterns that all partake in the making of ‘the party’ as a collective event. However, this approach then raises the question, well known from other contexts, as to which of these characters we are to focus on to best gauge the intensity or atmosphere at the party in general? Should we ‘cast’ students based on known characteristics (e.g., from the survey data), or should we rather do it on the basis of post hoc back-projection (e.g., that they turned out to be ‘popular’ at the party)? We realize that our task is akin to determining the narrative principles of a book, film or indeed any social plot (cf. Latour, 2005). Every time a new piece of information is added, we need to rethink the story as a whole.
Sensing a potential dead end, we decide to shift to a ‘deductive’ register. Here, we work first towards an aggregate picture of party intensity – concretely, a measure of the total number of people and their physical proximity over time (to be discussed over the next pages) – and then later turn towards ‘zooming in’ on instances and relations that seem particularly interesting from this aggregate viewpoint (next section). As such, we imagine the process as a gradual un- and refolding of aggregate data-points, whereby a more reliable interpretive ground may come to be established on which to base our reading of the atmosphere of the party. We begin this work by counting and calibrating how many phones ‘see’ My’s phone in the course of the party, giving us a rough index of its ‘proximity degrees’ by showing how many students she is near at any given point in time (Figure 3; see also Appendix 3).
Figure 3. The anthropologist at work, partying.
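The ‘proximity degree’ behind this ego-centred view amounts to a simple count, which the following sketch illustrates. It assumes a hypothetical record format of (time bin, observing phone, observed phone); neither the field layout nor the identifiers come from the study database, and the calibration steps mentioned in the text are not reproduced.

```python
from collections import defaultdict


def proximity_degree(ego, scans):
    """Count the distinct phones that register `ego` in each time bin."""
    observers = defaultdict(set)
    for time_bin, observer, observed in scans:
        if observed == ego and observer != ego:
            observers[time_bin].add(observer)
    return {t: len(seen_by) for t, seen_by in sorted(observers.items())}


# Hypothetical scan records; a repeated sighting within a bin counts once.
scans = [
    ("22:00", "s1", "my"), ("22:00", "s2", "my"), ("22:00", "s1", "my"),
    ("22:05", "s3", "my"), ("22:05", "s1", "s2"),
]
profile = proximity_degree("my", scans)
```

Using a set per time bin is one possible design choice: it makes the index count people-phones rather than raw scan events, so a phone that scans more often does not inflate the degree.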
Whereas Figure 3 is a quantified depiction of a personalized view of party relations, namely those of My – the ‘ego-network’, in social network analysis parlance – Figure 4 shows a numerical depiction of the aggregate dynamics of the party, calculated as the size of the biggest gathering of people (or: people-phones) at the party over time (see Appendix 3 for details). This latter depiction, we decide, is a possible indicator of the party’s degree of ‘party-ness’ or intensity – if, that is, we accept for a moment to measure a group’s social intensity by simply counting how many people comprise its co-present, bodily gathering at any given time. Put crudely, this was indeed how Durkheim ([1915]1995) suggested to think of the ‘collective effervescence’ of religious and other rituals. Following Goffman, we may well think of active commitments to multi-body co-presence as a similarly important if also rather crude property of student parties as social occasions (see Wynn, 2016: 281).
Figure 4. Size of the largest gathering over time – evolution of ‘party-ness’.
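A time series of this kind could be computed roughly as follows. This is a sketch under stated assumptions rather than the calculation described in Appendix 3: sightings are assumed to arrive as (time bin, phone, phone) triples, and the largest gathering per bin is taken to be the largest connected component, found here with a small union-find structure. The function name `partyness` is ours, chosen for illustration.

```python
from collections import Counter, defaultdict


def largest_component_size(edges):
    """Size of the largest connected component, via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)  # merge the two groups
    sizes = Counter(find(node) for node in list(parent))
    return max(sizes.values(), default=0)


def partyness(scans):
    """Map each time bin to the size of its largest gathering."""
    by_bin = defaultdict(list)
    for time_bin, a, b in scans:
        by_bin[time_bin].append((a, b))
    return {t: largest_component_size(e) for t, e in sorted(by_bin.items())}


# Hypothetical sightings: one group of three and one pair, then a pair only.
scans = [
    ("22:00", "a", "b"), ("22:00", "b", "c"), ("22:00", "d", "e"),
    ("22:05", "a", "b"),
]
series = partyness(scans)
```

A split into sub-parties would show up in such a series as a drop in the largest component even when the total number of phones present stays constant.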
Specifically, Figure 4 suggests that, sometime around 7 pm, the party split into two or more sub-parties only to reunite and eventually peak at around 11 pm – corresponding, not coincidentally, to the ‘social height’ of 57 dedicated participants from which we demarcated the party in the first place. And indeed, checking her field-notes, My confirms that at some point between 10 and 11 pm she did ask herself (and jotted down into her pocket notebook): ‘where is everybody?’ Merging Figures 3 and 4 thus affirms that My, as is to be expected from a devoted ethnographer, at all times of the party did remain within close proximity to, or at the heart of, the largest gathering or core party component! Until, at least, around 1 am, when she decided to go smoking and then leave the party, taking the taxi back home.
Still, as hinted, it is of course questionable just how well the purely numerical size of a given ritual occasion correlates with its intensity or indeed atmosphere (after all, who has not attended a big party that nonetheless never really took off?). We therefore decide to explore further how the two variables (size, intensity) relate and how we might better trace and visualize this correlation over time. To do this, we take advantage of the fact that the Bluetooth readings in the database also provide us with a signal strength (known as RSSI); this signal strength in turn provides us with a (fairly) reliable measure of bodily proximity (Sekara and Lehmann, 2014). However, calculating a simple RSSI average across all Bluetooth connections will not work: since the sensor reads all phones within a 10 m radius, as more people gather in the same room, the average proximity index goes down. We need to counter this technical artefact for the measure to tell us anything interesting.
After some calibration work, we come up with a new possible specification of intensity: for each student, we keep only the RSSI rating that specifies her or his relationship vis-à-vis the closest ‘partner’ – i.e. the one other person (person-phone) physically nearest to her or him – into the aggregate pool of proximity data points. In other words, we construct an index of the average bilateral proximity for the party as a whole: how close are the students, on average, to their ‘nearest other’ within 5 minute intervals during the party? Figure 5 depicts this measure of average bilateral proximity (or ‘intimacy’) over time along with corresponding variations in the size of the largest party component (i.e. Figure 4), equalized to the same scale.
Figure 5. Largest party component size and average bilateral proximity over time.
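The difference between a plain RSSI average and the ‘nearest other’ index can be made concrete with a small sketch. It assumes readings of the form (observing phone, observed phone, RSSI) for a single 5 minute interval, with RSSI expressed in dBm so that values closer to zero indicate a stronger signal and hence closer proximity. The numbers are invented, and the rescaling to a common axis used in Figure 5 is not reproduced.

```python
from statistics import mean


def naive_average(readings):
    """Mean RSSI over every Bluetooth reading in the interval."""
    return mean(rssi for _, _, rssi in readings)


def nearest_other_average(readings):
    """Mean of each phone's strongest reading, i.e. its nearest other."""
    strongest = {}
    for observer, _, rssi in readings:
        if observer not in strongest or rssi > strongest[observer]:
            strongest[observer] = rssi
    return mean(strongest.values())


# Two people standing close together (hypothetical values).
tight = [("a", "b", -50), ("b", "a", -50)]
# A second close pair joins the room; cross-pair readings are weak.
crowded = tight + [
    ("c", "d", -50), ("d", "c", -50),
    ("a", "c", -85), ("c", "a", -85),
    ("b", "d", -85), ("d", "b", -85),
]
```

With these invented values the plain average falls from -50 to -67.5 once the room fills up, although nobody has moved further from their partner, while the nearest-other index stays at -50. This is the artefact that the bilateral specification is meant to counter.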
On the face of it, judging from simple visual clues, the figure confirms our initial ‘Durkheimian’ hunch (as filtered through the more general Goffmanian attention to occasions such as college parties). The two graphs indeed fluctuate in tandem, suggesting that the average bodily proximity of partygoers increases with the total number of co-present party-people (and vice versa), thus possibly indicating something like collective effervescence or at least a certain intensification of the many smaller encounters that make up the party. This correlation is most visible from 11 pm onwards, confirming how My’s field-notes – as discussed above – indicated this as the ‘peak’ of the party, its zenith of proximity, noise and ‘party-ness’. This time around, the ethnographic data grounds our sense of computational advance: the digital data has started ‘behaving’ in ways that strike us as credible, in light of the field-notes.
Yet, Figure 5 also exhibits a strange outlier pattern: the peak in average bilateral proximity that suddenly appears just after 2 am initially makes little sense to us. In particular, if we read the last peak as a moment of hugs and goodbyes, then what about the marked drop in average proximity readings just prior to this? We start speculating: are we witnessing here an expression of flawed data? Alternatively, how can we imagine the unfolding of the party in such a way as to make this outlier pattern meaningful? We try the following narrative: upon leaving the party venue, the students walked down a wide and long corridor, with little proximity to each other, until reaching the exit and briefly regrouping. Knowing the modernist architectural layout of DTU, this hypothesis seems plausible enough. Yet, it is not until we share it with My, the ethnographer, that it gains full traction: indeed, My confirms to the rest of us, there was a wide and long corridor at the end of which people realigned, hugging goodbye! We spontaneously applaud, sensing the joys of data stitching.
In contrast to the ‘randomness’ that critics see as a risk in purely exploratory approaches to quantitative data (e.g., Gelman and Loken, 2013; see also Tukey, 1977), the inquiry into party intensities just recounted is thus iterative and explicitly interpretative in style. Like others working across the ethnography-digital data divide (e.g., Anderson et al., 2009; Ford, 2014), our attempt to discover, stabilize and interpret emerging patterns of sociality in the digital-transactional data is centrally informed at key junctures by ethnographically imbued insights. Indeed, in the above example, we might be said to ‘asymmetrically’ trust the ethnographic description over the behavioural traces, in part because My’s first-hand experience gives us access to conditioning factors of the occasion – such as the importance of the physical, built space – which are not easily available to us from the Bluetooth traces. 6 The picture of ‘party-ness’ gradually stitched together, however, arguably enjoys new interpretive qualities, reducible neither to the ethnographic nor the digital as such.
De-aggregating the party: Visualizing data, recomposing socialities
People are standing in the couches dancing. There is a bear with a big head and a skinny human body jumping up and down in a couch. Next to him are two animal bodies with human heads poking out of the throat. They are dancing. On the floor some people are trying to dance a choreography that goes with the song playing. Other people are stumbling around between them corrupting their dance.
This marks a new and decisive moment in our experiment: with a focus on spatio-temporal shifts in party forms and intensities, we find ourselves having to narrate the data anew to create a meaningful picture and sequence of micro-events. By combining My’s sharing of her ethnographic knowledge with multiple re-iterated data visualizations, we train ourselves as a ‘quali-quantitative’ (Latour et al., 2012) interpretive device. Through repeated attempts to forge links between our qualitative and our quantitative data registers, we are gradually beginning to get (and to feel that we have) a sense of the party that enables us to ‘see’ sensible patterns of social activity in the numerical distributions at still more granular levels.
Crucially, this ‘sense of the party’ not only relies on becoming familiar with partying as concrete social practice (the ‘behaviour’ of the students); it also, and as importantly, involves our becoming familiar with the ‘behaviour’ of the ‘party as data’. Here, echoing a key theme of digital data analysis (e.g., Ruppert et al., 2013), working in a visual mode becomes central to us. Indeed, visualization techniques play a decisive role in how we query our data. In particular, they allow us to explore social–morphological patterns below the level of the aggregate party group that has been our focus so far. What happens if we imagine our ‘point of view’ not as that of individual students, nor as an aggregate ‘view from nowhere’, but as hovering closely above the party, almost like a camera, recording not single traits but shifting relational patterns? This, of course, is the starting point of network analysis, by which the party maps out as a relational topography of encounters with varying degrees of connectivity.
Figure 6 aggregates all physical proximities of less than approximately 2 m (as measured via Bluetooth signal strength, RSSI) between the 124 party participants over the time span from 8 pm until 2 am. This provides the first in a series of shifting perspectives on the concrete morphology of the party, taking advantage of the fact that digital transactional data is above all traces of social relations (Housley et al., 2014). We use a network visualization tool (Gephi) to generate a force-directed graph which ‘pulls’ more closely connected students – that is, students who have had more close bodily meetings during the party – towards each other, while non-connected students are pushed to the periphery (Jacomy et al., 2014). Students with many close encounters (higher degree) show up as bigger nodes, while colours indicate (modularity-based) clusters of higher-than-average connected students (see Appendix 4 for details).
Figure 6. The party as a relational topography.
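The data preparation behind such a graph can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of the Gephi workflow: the RSSI cut-off of -75 is a hypothetical stand-in for ‘closer than approximately 2 m’, the reading format (timestamp, student a, student b, RSSI) is assumed, and instead of detecting clusters the sketch only scores a given partition with the standard (Newman) modularity measure on which modularity-based clustering rests.

```python
from collections import Counter, defaultdict

RSSI_THRESHOLD = -75  # hypothetical cut-off standing in for "closer than ~2 m"


def build_proximity_network(readings, threshold=RSSI_THRESHOLD):
    """Count close readings per unordered pair: {(a, b): number of close readings}."""
    edges = Counter()
    for _timestamp, a, b, rssi in readings:
        if a != b and rssi >= threshold:
            edges[tuple(sorted((a, b)))] += 1
    return edges


def degrees(edges):
    """Number of distinct close partners per student (node size in the graph)."""
    partners = defaultdict(set)
    for a, b in edges:
        partners[a].add(b)
        partners[b].add(a)
    return {student: len(others) for student, others in partners.items()}


def modularity(edges, membership):
    """Weighted modularity Q of a given partition {student: cluster label}."""
    total = sum(edges.values())
    intra = defaultdict(float)     # edge weight falling inside each cluster
    strength = defaultdict(float)  # summed weighted degree of each cluster
    for (a, b), weight in edges.items():
        strength[membership[a]] += weight
        strength[membership[b]] += weight
        if membership[a] == membership[b]:
            intra[membership[a]] += weight
    return sum(intra[c] / total - (strength[c] / (2 * total)) ** 2
               for c in strength)


# Hypothetical toy party: two tight triangles joined by a single close encounter,
# plus one distant reading (A-F) that falls below the threshold.
pairs = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
example_readings = [(0, a, b, -60) for a, b in pairs] + [(0, "A", "F", -95)]

network = build_proximity_network(example_readings)
clusters = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
print(degrees(network))
print(modularity(network, clusters))
```

A positive Q indicates that the chosen clusters contain more close encounters than would be expected if the same encounters were distributed at random, which is the sense in which the coloured clusters are ‘higher-than-average connected’.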
A pattern begins to emerge: our network visualization turns out to corroborate what we had already sensed from My’s field-notes, namely that the party should perhaps not be described as one coherent object, but rather as a conglomerate of smaller, interconnected parties. Some of these sub-parties, presumably set in adjacent venues across campus, seem small and short-lived; students likely entered each of these parties within the party, stayed a bit, and then departed again quickly, leaving only limited data traces (whether ethnographic or digital). However, note that the more inter-connected (middle) part of the graph is divided into discrete subgroupings, and that two larger sub-gatherings (at each end) stand out clearly. This suggests that a split party morphology had temporal endurance as well and indeed seems to have shaped the overall pattern of the party as an occasion (Wynn, 2016: 280).
At this level of temporal aggregation, it is hard to know what to make of this: is the party under investigation actually one of several sub-parties that never fully merged into a single occasion (or collective ritual); or did groups come and go during the evening, only briefly ‘scanning’ each other? As we de- and recompose data, we are at the same time de- and recomposing our view of party socialities.
Figure 7 adds study-line information to the sub-party network, as an important source of local group identification. Among other things, it suggests that the biotechnology students (in turquoise) – one of the few study-lines at the DTU with an equal share of men and women – were basically having their own party, presumably in one area of the party venue (itself, we know from My, a rather large space allowing for socially negotiated boundaries or zones to emerge). Another noteworthy sub-party is recognizable at the other end of the graph (in dark pink and green), where physics/nanotech students have teamed up with the geo- and space-technology students. In-between these two subgroupings, the largest gathering is composed of (primarily male) students from across study-lines – preoccupied, we assume, with those intense forms of bodily proximity between relative strangers (dancing, drinking, smoking, etc.) so characteristic of how parties may serve to reinforce collective identifications at universities across the world (e.g., Sweeney, 2014).
Figure 7. The sub-party morphology according to study-lines.
Further zooming in and re-aggregating our data, we pick out a sub-party and start exploring how its participating students position themselves vis-à-vis each other at ever-finer levels of spatial and temporal granularity. In other words, moving away from the abstract relationality of nodes and edges depicted in our social networks, we now try instead to ‘mimic’ the actual forms of the social micro-gatherings themselves. This seems both intuitively right and in accordance with My’s observations: after all, standing close to one another at the bar is potentially very different from being in close physical proximity with a dancing partner on the dance floor, in terms of the kinds of social intensities involved and their interpersonal meanings. What emerges from heat-graph visualizations of 5 min Bluetooth intervals is a definite sense of ‘temporal motifs’ (Kovanen et al., 2013) that make up shifting party socialities on the ground: students forming recognizable if abstract shapes and forms of togetherness, as they bundle and re-bundle with the unfolding of the party (Figure 8).
Figure 8. Temporal motifs of party sociality.
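One way to approach such interval-by-interval shapes is to cut the readings into 5-minute windows and list, for each window, the groups of students linked by chains of close proximity. The sketch below does this with connected components. It does not reproduce the heat-graph visualizations or the temporal-motif analysis cited above; the reading format and the RSSI threshold are the same hypothetical assumptions as before.

```python
from collections import defaultdict


def connected_groups(adjacency):
    """Return groups of students linked by chains of close proximity, largest first."""
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk through close neighbours
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(sorted(group))
    return sorted(groups, key=lambda g: (-len(g), g))


def snapshots(readings, bin_seconds=300, threshold=-75):
    """Map each 5-minute interval start to its list of proximity groups."""
    by_bin = defaultdict(lambda: defaultdict(set))
    for timestamp, a, b, rssi in readings:
        if a != b and rssi >= threshold:  # threshold is a hypothetical cut-off
            adjacency = by_bin[(timestamp // bin_seconds) * bin_seconds]
            adjacency[a].add(b)
            adjacency[b].add(a)
    return {t: connected_groups(dict(by_bin[t])) for t in sorted(by_bin)}


# Hypothetical toy data: a chain A-B-C and a pair D-E, which later re-bundle.
example_readings = [
    (0, "A", "B", -60), (30, "B", "C", -65), (60, "D", "E", -55),
    (300, "A", "B", -60), (330, "C", "D", -70),
]

for interval_start, groups in snapshots(example_readings).items():
    print(interval_start, groups)
```

Students without any close reading in a given interval do not appear in that interval's groups, so a ‘dotted’ constellation of isolated individuals would have to be recovered from the full participant list rather than from this output alone.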
These temporalized shapes, we speculate, may well be about as close as our computational data allows us to come to the ‘elementary forms’ of party sociality; the very stuff, that is to say, out of which social atmospheres are made and remade during this kind of interaction ritual (Goffman, 1963; Wynn, 2016). Yet, it remains hard to know what meaning to ascribe to these specific shapes: what is the party atmosphere like from inside a ‘stick’ pattern (first picture), compared to when the party reconfigures into a ‘dotted’ form (the last picture)? How to translate between such abstract motifs of sociality derived from the digital data and My’s rich and experience-near yet also somewhat ‘shape-less’ observations? Here, we seem to be facing an inherently speculative horizon of research where the basic social forms emerging from the digital data are yet to have any recognizable ethnographic name – a limit point at once the most novel and the shakiest moment in our quali-quantitative experiment.
These reflections lead us to adopt what we previously dubbed an ‘alienating’ strategy of analysis, namely to explore whether it is at all possible to transcend the apparent incommensurability between digital and ethnographic data-worlds. In a final gesture to the spirit of abiding experimentation, we decide to seek ways of getting around incommensurability by transposing the totality of our stitched-together ‘collage’ of heterogeneous digital-cum-ethnographic datasets into a third medium of analysis equally unfamiliar to both of these data-worlds, that of sound. After all, much as with the experience of an atmosphere, music is an inherently emergent, affective and ‘durational’ phenomenon, different from a mere temporal aggregation. Perhaps, we ask ourselves, it might be possible to reveal certain otherwise ‘invisible’ temporal motifs in our dataset by converting key data correlations rendered visible in the above figures and graphs from the medium of the visible to the medium of the audible?
Going with this idea, we decide to push our experimental research into something resembling perhaps conceptual art, by converting the above density graph (Figure 5) into elementary musical forms and by playing this alongside animated versions of the temporal motifs (Figure 8). 7 As any listener to this peculiar ‘sound of the data party’ will presumably agree, this data enactment is not in any sense ‘like’ attending a party in person as either an ethnographer or a student. And yet, the ebb and flow of tonal pitches arguably still conjures a sense of ‘collective atmosphere’ not easily graspable from either the digital or the ethnographic as such. Hence, we leave it standing here to symbolize the mutual destabilization of these two data-worlds and the prospect of further exploring their mutual interferences in the future.
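How a time series might be turned into ‘elementary musical forms’ can be sketched in a few lines. The mapping below is an assumption made for illustration and is not the procedure used for the recording discussed above: each interval's value is rescaled linearly onto a range of MIDI note numbers (here 48 to 84, a hypothetical choice), so that closer average proximity sounds as higher pitch, and each note is then converted to a frequency using the standard equal-temperament formula with A4 = 440 Hz.

```python
def to_midi_notes(values, low=48, high=84):
    """Linearly rescale a series onto MIDI note numbers between low and high."""
    smallest, largest = min(values), max(values)
    if largest == smallest:  # flat series: play one note in the middle of the range
        return [(low + high) // 2] * len(values)
    span = largest - smallest
    return [round(low + (v - smallest) / span * (high - low)) for v in values]


def midi_to_hz(note):
    """Equal-temperament frequency of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


# Hypothetical proximity index (mean RSSI per 5-minute interval).
example_series = [-80, -70, -60, -65, -75]

notes = to_midi_notes(example_series)
print(notes)
print([round(midi_to_hz(n), 1) for n in notes])
```

A linear pitch mapping is only one of many possible design choices; mapping proximity to tempo, loudness or timbre instead would produce a very different ‘sound of the data party’ from the same numbers.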
Concluding remarks: Experimentalizing the big social data moment?
Big social data and ethnography, Curran (2013) has argued, have a great deal in common. Both are interested in everyday life, in the patterns of sociality, in bodies’ interaction in space, and in a holistic approach to analyses of the present world (p. 70). In this article, a student party at the DTU – a university host to both a computational social science data infrastructure and longer term ethnographic fieldwork – has served as our testing ground for extending Curran’s observation. Reworking insights from existing attempts to explore this potentially converging methodological space, including those of ethno-mining (Anderson et al., 2009) and quali-quantitative approaches (Latour et al., 2012), we have attempted to show how ethnographic and digital-transactional data might in practice be stitched together in complementary (Blok and Pedersen, 2014), non-hierarchical ways that jointly leverage analytic capacities for studying social occasions (Wynn, 2016).
In doing so, our approach both corroborates and extends existing attempts to practically explore the interface of big data and ethnography (e.g., Ford, 2014; Taylor and Horst, 2013), adding to them a tangible account and a methodological language for what complementarity might entail when working across these data-worlds. In particular, in ways resonant with more speculative accounts of big social data (e.g., Ruppert et al., 2013), we seek to affirm and experiment with the way digital-transactional data create new affordances of ‘extensive processuality’ (Edwards et al., 2013), by allowing us to combine the processual focus of ethnography with the spatially extensive focus of quantitative research in new ways. Indeed, twisting this promise by way of analysing a bounded social occasion (a party) in continuous time via combined digital-ethnographic data, we have sought to show how this enables one to make inferences about collective life at various spatio-temporal scales, from a brief encounter to the party as a whole, allowing dynamic relations and patterns to emerge.
Theoretically, we argue, a party may be described in Goffmanian terms as a social occasion of temporary multi-body co-presence likely to bring about an unusual social intensity or collective ‘effervescence’ (Durkheim [1915]1995) amongst its participants (Goffman, 1963; Wynn, 2016). This raises the question, however, of whether something as ephemeral and experiential as a social ‘atmosphere’ – a stimmung (Heidegger [1927]1962), in the phenomenological sense – can ever be ‘measured’ through computational analytics. Working as an experimental protocol, our article documents a process of bringing transactional and ethnographic data into close conversation in ways that push at the reconstruction of such atmospheres via shifting data stitchings. This process, as we have shown, is inherently collaborative, in that it activates and requires different data-handling skills and approaches (see Marres and Weltevrede, 2013). More generally, we have shown how a search for spatio-temporal granularity represents a promising meeting point between the ethnographic and the digital, as part of a wider establishment of common grounding and analytical complementarity in social data science (Blok and Pedersen, 2014). Here, ‘the granular’ works to both jointly destabilize and allow for iterative mutual support between ethnographic and digital ways of reconstructing the collective flow of a party.
Ultimately, by thus ‘cutting out’ and homing in on a very specific social event within a vast pool of digital, ethnographic as well as other data (the Copenhagen Social Networks Study), we have wished to forge a space for experimental exploration and critical reflection on data practices. To start with, as noted, our move towards the specific and the granular cuts against the grain of many computational social science approaches, which leverage digital-transactional data exactly for the promise of quantitative and presumably objective insight into ‘big and broad’ (Housley et al., 2014) patterns of what is often understood reductively as uniform social ‘behaviour’ (see Marres, 2017). Indeed, we seek to induce a productive space of hesitation with respect to such fast-emerging and near-hegemonic uses of high-resolution digital data in social science research. Beyond ‘big and broad’, we contend, digital social data can also be ‘deep’ and ‘thick’ and granular in ways that extend and augment ethnographic methods and imagination, and deserve exploration in their own right.
In order to stay close to experience, our account is organized, as noted, as an experimental protocol that chronicles our attempt to stitch together radically heterogeneous data formats and worlds. Throughout, we home in on diverse data practices – of searching, scripting, aggregating, correlating, visualizing, interpreting – that we see as all needed for the development of new modes of description, analysis and critique in this incipient methodological, epistemological and ethical space. Such a spirit of experimentation (Marres, 2013, 2017) is timely, we believe, in the present intellectual and political moment of mushrooming collaborative and cross-disciplinary endeavours and constellations of social data, big and small. Indeed, as a concrete case of social science complementarity (Blok and Pedersen, 2014), we have sought to make space for novel registers of analysis while eschewing too-easy epistemic closures. A key lesson to be heeded from our experiment is the very open-endedness with which transactional and ethnographic data may be made to coagulate into new patterns of sociality, each suggestive of noteworthy activity as well as theoretical description.
But what of critique? What has happened here to the injunction, so prevalent and indeed necessary among anthropologists and sociologists, to forge a critical data science (cf. Abramson, 2016; boyd and Crawford, 2012)? There is in fact a clear sense, we affirm, in which our study works as a critical intervention or, better put, an experiment in different ways of performing critique in, of and with digital data. Hence, by way of iterative descriptions, we have attempted to problematize what may ‘lie behind’ the seeming self-evidence of aggregate numbers (graphs, curves, networks, etc.). Apparently homogeneous groups, we have shown, can turn into strange and mutable conglomerates simply by tweaking a few algorithmic codes or changing the parameters of visualization. Using ethnographic observations as a mutual calibration device thus exerts, we argue, its own sobering effects on the ‘bigness’ of data, not unlike what may be achieved via more theoretically informed critiques (Boellstorff et al., 2015).
More generally, our experiment lends support to the emerging realization that, with big social data, you always get ‘too much’ information, with no easily specifiable social context or ground – no easy ‘ground truth’, as our physicist collaborators say. Hence, various data exploration strategies are simultaneously ways of de- and recomposing what a social aggregate may be – both in the data-derived and the socio-ontological sense. Essentially, ours is thus an experiment in the merging and morphing of fluid data aggregates, corresponding to changing perspectives on what constitutes ‘the social whole’ of the party (and beyond).
Which leads us to a final point: at first sight, within the quali-quantitative experimental set-up documented in this article, the old naturalist dream of ‘the fly on the wall’ to some extent has seemed to come true. My-as-ethnographer has been cast as observing her data subjects from a detached distance – not unlike what the rest of us, as neophyte data scientists, did in front of computer screens in the CALL lab, aggregating and visualizing digital data. Here, the role and authority of the anthropologist would seem to be very much akin to that of a seasoned field-naturalist who, as the only person in the lab, ‘was there’ and who in this capacity could verify the ‘behaviour’ which her colleagues back home can only infer and extrapolate from the database through digitally informed speculation. Yet, despite appearances, the anthropologist’s embodied immersion in the social as it happens is clearly no high road to a behaviourist all-knowingness. Conversely, as we have seen, the ‘rawest’ of digital data turns out to lead to an interpretative abyss, constantly in need of being filled in by new contextualizations. Even under ‘total observation’, then, the social remains elusive, evolving, and re-composable. The party must go on.
Acknowledgements
The authors gratefully acknowledge the University of Copenhagen (UCPH) centre of excellence grant Social Fabric (socialfabric.ku.dk), headed by economist David Dreyer-Lassen; as well as the Villum Foundation Grant Sensible DTU (sensible.dtu.dk), headed by physicist and computer scientist Sune Lehmann Jørgensen. Together, these two projects make up the Copenhagen Social Networks Study that made possible the experiment reported in this paper. They also form the backbone of the recently inaugurated Copenhagen Centre for Social Data Science (SODAS), to which we (the CALL collective) belong. We gratefully acknowledge the Faculty of Social Sciences at UCPH for sponsoring and facilitating SODAS.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding for the research reported in this article was provided by a University of Copenhagen (UCPH) Center of Excellence grant Social Fabric.
