Abstract
Bold approaches to data collection and large-scale quantitative advances have long been a preoccupation for social science researchers. In this commentary we further the debate over the use of large-scale survey data and official statistics with ‘Big Data’ methodologists, and emphasise the ability of these resources to incorporate the essential social and cultural heredity that is intrinsic to the human sciences. In doing so, we introduce a series of new data-sets that integrate approximately 30 years of survey data on victimisation, fear of crime and disorder and social attitudes with indicators of socio-economic conditions and policy outcomes in Britain. The data-sets that we outline below do not conform to typical conceptions of ‘Big Data’. But, we would contend, they are ‘big’ in terms of the volume, variety and complexity of data which has been collated (and to which additional data can be linked) and ‘big’ also in that they allow us to explore key questions pertaining to how social and economic policy change at the national level alters the attitudes and experiences of citizens. Importantly, they are also ‘small’ in the sense that the task of rendering the data usable, linking it and decoding it, required both manual processing and tacit knowledge of the context of the data and intentions of its creators.
Big questions
The shift towards the use of ‘Big Data’ made a seemingly ‘explosive’ entrance into the social sciences in 2011 (Burrows and Savage, 2014: 1). While there is no agreed definition of the term, it is typically used to denote data from online sources (e.g. web usage), public records (e.g. geocoded reports of crime incidents, Ordnance Survey data) or transactional data (e.g. phone calls to public services, financial expenditure or insurance claims) from commercial enterprises that is continuously updated in vast quantities (Manovich, 2011; Savage and Burrows, 2007). Beyond its epic contents, ‘Big Data’ has brought with it fundamental questions about the nature of social science data, the quality of the knowledge generated from it and the epistemologies that underscore traditional scholarly enterprises. Such is the breadth and depth of ‘Big Data’ that Housley et al. (2014) argued that it made for “uncomfortable” (p. 2) comparisons with the ‘bread and butter’ of more traditional data sources, such as episodically generated data-sets (cf. Mayer-Schönberger and Cukier, 2013; Savage and Burrows, 2007).
While there is little doubt that the features of ‘Big Data’ compel social scientists to redefine the nature of social knowledge and the validity of our research methods (Savage and Burrows, 2007), national surveys and official statistics remain crucial to our enterprise. Conducting research on long-term attitudinal trends or patterns of crime, for example, by definition involves the close inspection of historical processes. The most reliable data on these processes is habitually derived from national surveys and official indicators, for which ‘Big Data’ equivalents cannot be created: measures of social attitudes cannot be imputed retrospectively, and the manual extraction of data from paper records is either too costly and time-consuming or liable to yield missing data that is not random. Furthermore, many large-scale national surveys, such as the Crime Survey for England and Wales (CSEW), the British Social Attitudes Survey (BSA), the British Election Study and the Labour Force Survey, continue to be updated on a regular basis. This means it is possible to use this data to understand dynamic interrelationships and to observe and model both rates of change and lagged processes over time (Pawson and Tilley, 1997). In recent years computational technology has broadened the scope of statistical techniques available to us (cf. Mayer-Schönberger and Cukier, 2013). It is now possible to combine high-volume data-sets from a variety of sources, explore dynamic social processes through advanced quantitative methods and organise the data in such a way as to observe shifts at individual and aggregate levels. Collating data over long periods of time also allows for robust analyses of particular items where responses or subgroups may be rare, for example, male victims of domestic or sexual violence (Gadd et al., 2002), and makes it possible to dissect three types of time-related effect, namely age, period and cohort effects (Ryder, 1965).
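To illustrate how pooled repeated cross-sections lend themselves to age, period and cohort analysis, the following is a minimal sketch rather than the procedure used in our project. The field names (`year`, `age`, `worried`) and all values are hypothetical. It derives each respondent's birth cohort from the survey year and age, then averages a binary outcome within each cohort and sweep.

```python
from collections import defaultdict
from statistics import mean

def birth_cohort(survey_year, age, width=10):
    """Return the first year of the birth-cohort band for a respondent."""
    birth_year = survey_year - age  # cohort = period - age
    return birth_year - (birth_year % width)

def mean_by_cohort_and_period(records):
    """Average a 0/1 outcome within each (cohort, survey year) cell."""
    cells = defaultdict(list)
    for r in records:
        key = (birth_cohort(r["year"], r["age"]), r["year"])
        cells[key].append(r["worried"])
    return {key: mean(values) for key, values in sorted(cells.items())}

# Hypothetical pooled records from two sweeps (invented values).
pooled = [
    {"year": 1984, "age": 24, "worried": 1},
    {"year": 1984, "age": 29, "worried": 0},
    {"year": 1984, "age": 52, "worried": 1},
    {"year": 2004, "age": 44, "worried": 0},
    {"year": 2004, "age": 41, "worried": 1},
    {"year": 2004, "age": 23, "worried": 0},
]
table = mean_by_cohort_and_period(pooled)
```

Because cohort is an exact function of period and age, the three effects cannot be separated by tabulation alone. Any model that seeks to distinguish them has to impose an additional constraint, which is an analytical choice and not a property of the data.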
In sum, repeated cross-sectional surveys afford researchers distinctive opportunities to assess long-term temporal processes and so to address complex research questions. Attention to historical resources has been underlined by Rock (2005), who has stressed that criminological researchers – as well as other social scientists – should be aware of a manifest ‘chronocentrism’ that frequently “neglect[s] what is old” (p. 20), overlooks the accumulation of data and works against the collective structure of knowledge. Similarly, scholars in sociology and politics have argued that crucial social phenomena are best explained in terms of the temporal study of ‘path dependence’, that is to say how particular courses of action and development are alighted upon and become reinforced over time (David, 2011; Pierson, 2000).
The long view: Capturing the legacy of Thatcherite social and economic policy on crime
As a research team, we were confronted with the methodological and theoretical considerations of ‘big’ data-sets after embarking on a project to understand the long-term impact of Thatcherite public policies from the 1980s to the present day. Our initial analysis had demonstrated, in line with a substantial field of research on the link between the economy and crime rates (e.g. Cantor and Land, 1985), that as levels of unemployment and economic inequality rose, property crime rose (Jennings et al., 2012; cf. Morgan, 2014). As property crime increased, so too did fear of crime and government attention to the issue of crime (see Farrall and Hay, 2010; Farrall and Jennings, 2012; Hay and Farrall, 2011). However, we wanted to explore further the differences across demographic groups, such as by gender, housing tenure and geography, and to model attitudinal shifts in relation to other types of crime (such as violence). Notably, scholars from related branches of social policy have also begun to conduct allied longitudinal investigations in housing policy (Dorling, 2014), opiate drug-use (Morgan, 2014), education policy (Berridge et al., 2001) and social attitudes (Duffy et al., 2013; NatCen, 2014), highlighting the need for us to build an integrated model of analysis.
Small big data: The construction of a multi-layered data-set
Summary of individual-level data.
Individual-level data
Victimisation
Officially recorded crime statistics have long been regarded with suspicion by many criminologists (Maguire, 2007). Our data therefore incorporates self-reported data on victimisation from the CSEW. This records respondents’ experiences, within the preceding 12 months, of most forms of crime. The CSEW also includes a series of questions on fear of crime, perceptions of anti-social behaviour in the local area, confidence in the police and attitudes towards punishment and the criminal justice system. The merged CSEW data-set that we have developed, combining 21 sweeps of the survey that ran between 1981 and 2013, comprises 599,517 respondents and over 150 survey items that have been asked in multiple surveys.
Social attitudes
Our data on public attitudes towards crime and criminal justice, and many other domains of social and economic life, is taken from two main sources. First, we have drawn on the 28 waves of the BSA, which provide measures of social attitudes towards sentencing, punitiveness and matters relating to welfare. Second, the British Election Study’s ‘Continuous Monitoring Survey’ (BES-CMS), which ran on a monthly basis between 2004 and 2013, includes a range of measures of socio-political attitudes, such as satisfaction with the criminal justice system, evaluations of government/party handling of crime and emotions about crime.
Aggregate-level data
Summary of aggregate data.
Criminal justice system
For comparison against the CSEW victimisation data, and to enable a longer-term view of crime, our data includes officially recorded crime statistics for England and Wales. Annual data on the size of the prison and probation population is taken from Home Office Probation and Prison Statistics England and Wales.
Socio-economic indicators
Data is also included on levels of inequality, poverty and incomes from the Institute for Fiscal Studies (www.ifs.org.uk). Standard measures of inflation and unemployment rates, the claimant count and rate, economic inactivity, average earnings, labour disputes and GDP are drawn from official statistics of the Office for National Statistics (www.ons.gov.uk). Data on annual benefits expenditure (and specific categories of benefits) is taken from the Department for Work and Pensions (2014). We have also collated data on truancy from the Youth Cohort Study from 1985 and school expulsions from the late 1990s. To complete our measures of social conditions we have data on the number of children in care dating back to the 1960s.
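As an illustration of how annual indicators of this kind can be attached to individual-level survey records, the sketch below links each respondent to the value of an aggregate series from the preceding year, so that lagged processes can be modelled. It is a simplified example under stated assumptions, not our actual procedure: the function name, the field names and every figure in it are invented for the purpose.

```python
def attach_lagged_indicator(respondents, indicator_by_year, name, lag=1):
    """Copy each respondent, adding the indicator value from `lag` years earlier.

    Respondents whose lagged year is absent from the series receive None,
    so that gaps stay visible rather than being silently filled.
    """
    linked = []
    for r in respondents:
        row = dict(r)  # leave the original record untouched
        row[f"{name}_lag{lag}"] = indicator_by_year.get(r["year"] - lag)
        linked.append(row)
    return linked

# Invented values for illustration only; not real unemployment figures.
unemployment_rate = {1983: 11.5, 1984: 11.8, 2003: 5.0}
respondents = [
    {"id": 1, "year": 1984, "victim": 1},
    {"id": 2, "year": 2004, "victim": 0},
    {"id": 3, "year": 1992, "victim": 0},
]
linked = attach_lagged_indicator(respondents, unemployment_rate, "unemp", lag=1)
```

Keeping unmatched years as explicit missing values is a deliberate choice here: where an aggregate series begins later than the surveys, the analyst can see which respondents fall outside its coverage.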
Policy and politics
Our data-set also includes measures of political attention to, and policy action on, crime. We draw on data from the UK Policy Agendas Project (www.policyagendas.org.uk) to capture the amount of attention given to crime, and law and order, in the statement of policy intentions set out in the Queen’s Speech and in Acts of Parliament (between 1945 and 2012).
Public opinion
Finally, we have collated a number of aggregate-level measures of public opinion over an extended time period, enabling a long-term view of attitudinal shifts. This includes survey data on the “most important problem” facing the country, as collected by the Gallup Organization between 1944 and 2001 (see Jennings and Wlezien, 2011). In addition, we include data on the public’s preferences for left-wing or right-wing public policy (‘public policy mood’), from Bartle et al. (2011), and have constructed a measure of public punitiveness using survey items on capital punishment, sentencing and other aspects of criminal justice, using a method developed by Stimson (1991) and applied by Enns (2014) in the US.
Our enterprise raises questions about the degree to which longitudinal shifts in social behaviours and public attitudes are accurately captured by newer forms of Big Data. Our view is that Big Data (i.e. transactional data, administrative records or web data) cannot effectively capture behavioural or attitudinal patterns that occurred before much social, economic and political activity moved online (after the end of the 20th century), potentially limiting us to ‘chronocentric’ data. Moreover, it is likely that what data is available in this format will – at this point in time – often be disparate and unprocessed, and that considerable effort will be required to peg new automatically collected measures against traditional survey-based instruments. Constructing measures of criminal activity or public fear of crime, for example, would involve linking and calibrating existing survey data against untried indicators and assessing their face validity. Retrospective construction of measures over time is a substantially more complex task than the compiling of repeated cross-sectional survey data over an extended period of time. As such, we see the current research agenda promoting the usefulness of Big Data as welcome when it is used alongside (rather than as an alternative to) rigorously designed, sampled and collected survey data. We do not, at least for the foreseeable future, imagine that Big Data will replace social survey data (which has the added advantage of extending back in time to the 1970s and beyond, enabling long-term trends to be observed in a consistent way). The next steps for those interested in advancing the cause of Big Data may therefore include working out how Big Data and existing social survey data may be integrated in order to combine the advantages of both.
An adaptable resource
It is important to acknowledge the limitations to what we are able to do. ‘Big Data’, no matter how sizeable or how well-sharpened, is no magic bullet, even if it were integrated with social survey data. There are issues in which we are interested (such as the experiences of homeless people in the 1980s) for which no data-set exists. In sum, the sorts of experiences and attitudes which we are able to analyse with historic data reflect the preoccupations of an earlier generation of researchers. This is a perennial problem for those conducting secondary data analyses (Dale, 2004). Nevertheless, we have employed traditional “small” data and amalgamated them into what we believe is now a vast, broad and dynamic group of data-sets, with the potential to answer significant ‘big questions’ about the effects of specific social and political policies on behaviour and public sentiments over time. It is significant that the processes involved in rendering the data usable were “small”: they relied on the manual extraction of data and on the specific knowledge required for handling survey data where there exists no clean digital footprint of variable names or contents. Electronic versions of data might, for instance, be unlabelled or coded in different ways across time, which would lead to errors in automatic processing without closer inspection of the original documentation. Despite such data-sets being “big” in the sheer scale of data points (with close to three quarters of a million respondents to surveys included in our data-sets), their merging and standardisation relied upon traditional methods of manual processing to create a resource for large-scale data analysis.
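The kind of manual harmonisation described above might be sketched as follows. This is an illustrative example only: the variable names (`q12`, `WBURGL`), the codes and the response categories are hypothetical and do not reproduce the CSEW documentation. The point is that the recoding rules are compiled by hand from each sweep's documentation, and that the processing stops whenever it meets a code that nobody has inspected.

```python
# Hypothetical codebook, compiled by hand from each sweep's documentation:
# sweep year -> (original variable name, {original code: harmonised value}).
CODEBOOK = {
    1984: ("q12", {1: "very worried", 2: "fairly worried", 3: "not worried"}),
    2004: ("WBURGL", {1: "very worried", 2: "fairly worried",
                      3: "not worried", 4: "not worried", 9: None}),
}

def harmonise(record, sweep):
    """Map one raw record onto the common variable 'worry_burglary'."""
    source_name, recode = CODEBOOK[sweep]
    raw = record[source_name]
    if raw not in recode:
        # Fail loudly: an undocumented code needs manual inspection.
        raise ValueError(f"Sweep {sweep}: unmapped code {raw!r} in {source_name}")
    return {"sweep": sweep, "worry_burglary": recode[raw]}

def pool(sweeps):
    """Stack harmonised records from several sweeps into one list."""
    return [harmonise(rec, year) for year, recs in sweeps.items() for rec in recs]

# Invented raw records: the same question under different names and codings.
raw_sweeps = {
    1984: [{"q12": 1}, {"q12": 3}],
    2004: [{"WBURGL": 4}, {"WBURGL": 9}],
}
pooled = pool(raw_sweeps)
```

Raising an error on an unmapped code, rather than guessing, mirrors the need to return to the original documentation before a variable is treated as comparable across sweeps.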
These data-sets have been constructed to be used by other researchers. Our project is funded by the UK’s Economic and Social Research Council (award number ES/K006398/1; for more information on the project see http://www.sheffield.ac.uk/law/research/projects/crimetrajectories), meaning all of the data which we have collated will be deposited at the UK Data Archive at the end of the project (Autumn 2015). New users may utilise or adapt the data as they see fit. For example, others can update the data-set as new sweeps of surveys are released to the public, as well as customising it to answer questions substantially different to our own. In this sense we hope our data could become a ‘platform’ for others to build upon and to use for their own research projects, PhD studentships and teaching purposes.
Declaration of conflicting interests
The authors declare that there is no conflict of interest.
Funding
This study was supported by the ESRC in the form of research grant ES/K006398/1.
