Abstract

The UK benefits from a portfolio of high-quality, large-scale social survey datasets that has accumulated steadily since the mid-20th century and has expanded considerably in scope and analytical potential within the last two decades. Large-scale social survey datasets generally have the appealing features of being representative of larger populations, having a large number of participants and providing an extremely broad range of measures that are appropriate for a wide variety of analyses. These methodological attractions ensure the continued importance of large-scale survey evidence to contemporary sociological research.
The over-arching message that emerges from the papers in this special section is that when undertaking analysis of social survey data, researchers must put considerable thought into the ‘key variables’ that they include in their analyses. Sociologists should first ask themselves ‘what is the variable measuring?’ They should then ask ‘does the variable measure what we want it to measure?’ These two questions might seem so obvious that they need little thought, yet in our reviews of measures of occupations, education and ethnicity we have identified a set of complications that should always be treated seriously in sociological research.
In this special section, we have advocated that sociological researchers should ordinarily use established measures that are constructed consistently and transparently. We strongly advise against the creation of ad hoc measures. Where new measures are required, we stress the importance of providing clear documentation that describes the protocols and practices involved in creating the measure. Clear documentation is critical because it aids transparency and facilitates replication.
We have emphasised the worth of undertaking sensitivity analyses (see also Dale, 2006; Treiman, 2009). This is especially valuable when there are a number of potential measures that could be included in an analysis (e.g. when there are several measures of socioeconomic position deposited with a large-scale survey) (see Lambert and Bihagen, 2014). We advocate that sensitivity analyses be made public. Sensitivity analyses cannot feasibly be published in full within the standard format of many existing sociological journals, but they should be made available as data supplements and should be accessible online, for example, on the researcher’s web pages or through institutional repositories. Although it is relatively uncommon within sociology, the requirement to make the research process (or the data analysis workflow) more transparent is increasingly common in some areas of science, and within neighbouring social science disciplines such as economics.
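As a minimal sketch of what such a sensitivity analysis might look like, the following Python fragment re-estimates one simple association under two alternative codings of socioeconomic position and tabulates the results side by side. The data are invented toy values, and the names (`odds_ratio`, `sensitivity_table`, `measure_a`, `measure_b`) are hypothetical and do not come from the papers in this special section. A real analysis would fit the full model of interest to the survey data under each candidate measure.

```python
def odds_ratio(outcome, exposure):
    """Odds ratio from the 2x2 table of two binary (0/1) variables."""
    pairs = list(zip(outcome, exposure))
    a = sum(1 for y, x in pairs if x == 1 and y == 1)  # exposed, outcome present
    b = sum(1 for y, x in pairs if x == 1 and y == 0)  # exposed, outcome absent
    c = sum(1 for y, x in pairs if x == 0 and y == 1)  # unexposed, outcome present
    d = sum(1 for y, x in pairs if x == 0 and y == 0)  # unexposed, outcome absent
    if 0 in (a, b, c, d):
        raise ValueError("odds ratio is undefined when a cell is empty")
    return (a * d) / (b * c)


def sensitivity_table(outcome, measures):
    """Re-estimate the same association under each alternative measure."""
    return {name: odds_ratio(outcome, values) for name, values in measures.items()}


# Invented toy data: one binary outcome and two hypothetical binary codings
# of socioeconomic position for the same twelve respondents.
outcome = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
measures = {
    "measure_a": [1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0],
    "measure_b": [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
}

results = sensitivity_table(outcome, measures)
for name, value in results.items():
    print(f"{name}: odds ratio = {value:.2f}")
```

In this toy example the two codings yield odds ratios of 25.00 and 4.00. A table of this kind, deposited as a supplement, lets readers judge how far a substantive conclusion depends on the choice of measure.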
We concluded this special section with a paper on the application of statistical models in the analysis of large-scale social science datasets. We advocate that sociologists provide clear information about the statistical modelling process. We have recommended that sociologists routinely report statistical models with suitable summary statistics (e.g. measures of goodness of fit and parsimony). We have also showcased a number of approaches that allow clearer interpretations of the results of statistical models. These approaches are especially useful when reporting models such as the logistic regression model, which is commonly used in sociological research. In many ways, good practice in statistical modelling represents the culmination of the effective exploitation of large-scale survey data resources. The challenge for social scientists who use statistical models has not generally been to identify and implement suitably complex modelling procedures (of which there are many). Instead, we believe that it is more careful attention to both the construction of variables, and to aspects of documentation and model reporting, that offers the most significant potential for improved standards in the conduct of social research.
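To illustrate the kind of summary statistics that might accompany a reported model, the sketch below computes three widely used quantities from a fitted model's log-likelihood: the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and McFadden's pseudo R-squared. The choice of these three statistics, the function name `fit_summary` and the input values are assumptions made for illustration only; they are not results from any dataset discussed here.

```python
from math import log


def fit_summary(loglik_model, loglik_null, n_params, n_obs):
    """Summary statistics for a model fitted by maximum likelihood.

    loglik_model: log-likelihood of the fitted model
    loglik_null:  log-likelihood of the intercept-only model
    n_params:     number of estimated parameters (including the intercept)
    n_obs:        number of observations
    """
    return {
        # Lower AIC and BIC indicate a better trade-off of fit against parsimony.
        "aic": 2 * n_params - 2 * loglik_model,
        "bic": n_params * log(n_obs) - 2 * loglik_model,
        # McFadden's pseudo R-squared: proportional improvement over the null model.
        "mcfadden_r2": 1 - loglik_model / loglik_null,
    }


# Hypothetical values for a logistic regression with three parameters.
summary = fit_summary(loglik_model=-100.0, loglik_null=-125.0, n_params=3, n_obs=1000)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Reporting such statistics alongside the coefficients for each candidate model allows readers to compare competing specifications on both goodness of fit and parsimony.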
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
