Abstract
Low levels of public trust in data practices have led to growing calls for changes to data-driven systems, and in the EU, the General Data Protection Regulation provides a legal motivation for such changes. Data management is a vital component of data-driven systems, but what constitutes ‘good’ data management is not straightforward. Academic attention is turning to the question of what ‘good data’ might look like more generally, but public views are absent from these debates. This paper addresses this gap, reporting on a survey of the public on their views of data management approaches, undertaken by the authors and administered in the UK, where departure from the EU makes future data legislation uncertain. The survey found that respondents dislike the current approach in which commercial organizations control their personal data and prefer approaches that give them control over their data, that include oversight from regulatory bodies or that enable them to opt out of data gathering. Variations of data trusts – that is, structures that provide independent stewardship of data – were also preferable to the current approach, but not as widely preferred as control, oversight and opt out options. These features therefore constitute ‘good data management’ for survey respondents. These findings align only in part with principles of good data identified by policy experts and researchers. Our findings nuance understandings of good data as a concept and of good data management as a practice and point to where further research and policy action are needed.
Introduction
Throughout the world, low levels of public trust in data practices have recently been identified
The General Data Protection Regulation (GDPR), which came into effect in 2018, provides a legal motivation to improve data practices in EU countries adopting this legislation. Under GDPR, individuals have rights with regard to access and portability of their personal data. Coupled with concern about the data trust deficit, this new legislation has led to growing experimentation with alternative approaches to the management of personal data, which some believe would be better for people and society (Hall and Pesenti, 2017; O’Hara, 2019). These include personal data stores (PDSs), in which individuals personally store and manage their data, and data trusts, defined by the ODI (2019a) as ‘a legal structure that provides independent stewardship of data’ for the benefit of all parties.
This context has led a range of policy stakeholders to advocate for responsible and ethical data developments. In the UK, where our research took place, advocates include government centres (such as the new Centre for Data Ethics and Innovation (CDEI)), think tanks (for example Doteveryone) and independent research and advocacy organizations (such as the Ada Lovelace Institute (Ada) and the ODI). In academic circles, attention is turning to what good, responsible and ethical data might look like (for example, Daly et al., 2019). Good data management approaches, like PDSs and data trusts, are a vital part of a responsible and ethical data ecosystem, but what constitutes good data management – that is, data storage, stewardship and decision-making about sharing – is not straightforward.
Policy stakeholders in the UK, like the CDEI and Ada, claim that understanding public views about data practices is essential to ensure that data works ‘for people and society’ (Ada’s mission) and is ‘a force for good’ (a CDEI aim). This also applies to data management: in order to determine what constitutes good data management, public views must be taken into account. Research into public views on good data management is therefore needed, so that these can be factored in to future data policy and practice. Yet to date, public attitudes to data management have rarely been examined, and when they have, research has focused narrowly on user feedback on specific models under development or on fictional scenarios (Sailaja et al., 2019). For this reason, our paper focuses specifically on data management, as opposed to other data practices such as data generation, collection, analysis and sharing.
The paper reports on a survey on public views on different approaches to managing personal data (that is, data related to an identified or identifiable person (GDPR, 2018)), which aimed to fill the gap identified above. The survey was administered in the UK in May 2019 to over 2000 adults. Although the GDPR was adopted in UK law after coming into force, the UK’s withdrawal from the EU is causing uncertainty about future data legislation in the UK. As the UK decides what its post-Brexit data laws will look like, understanding public perceptions of good data management in the UK is extremely timely. For this reason, our survey focused on the UK. In the survey, we found that respondents dislike approaches which give commercial organizations control of personal data in return for the digital services they provide. Respondents expressed a preference for approaches that give them control over data about them, that include oversight from regulatory bodies or that enable them to opt out of data gathering altogether. These approaches are not mutually exclusive, but we separated them out for the purpose of our analysis and comment on their relationship in our conclusion. Variations of data trusts (described in detail below) were also preferable to the status quo, but not as widely preferred as approaches involving personal control, regulatory oversight or the ability to opt out. Thus personal control, oversight and the ability to opt out constituted ‘good data management’ for respondents in our survey.
The paper proceeds to situate our research in the context of debates about ‘good data’ and alternative data management approaches. We then describe our methods and discuss our findings. We conclude with reflections on the significance of our findings for conceptualizing good data and for better data management policy and practice.
Good data and alternative approaches to data management
Good data
The emerging field of critical data studies has done a good job of making visible the many troubling consequences of datafication, including increased surveillance, threats to privacy, new forms of algorithmic control, and the expansion of new and old inequalities and forms of discrimination (Iliadis and Russo, 2016; Kennedy, 2018). More recently, and against this critical backdrop, scholars have begun to consider what ‘good data’ alternatives might look like. One example is Daly et al.’s (2019) edited collection, Good Data, which was motivated by a recognition that although scholars had extensively critiqued problematic data practices, they had not considered more positive alternatives. Devitt et al. (2019) describe Good Data as aiming to open up ‘a multifaceted conversation on the kinds of futures we want to see’ and presenting ‘concrete steps on how we can start realizing good data in practice’. They suggest that asking what constitutes good data is an essential step in advancing the critical scholarship which has exposed the harms and injustices that result from widespread ‘bad’ data practices.
Of course, ‘good’ is a complex concept: it could mean fair, ethical or just, or it could have other meanings, some of which acknowledge the power inequalities that shape datafication more readily than others. On the one hand, in the context of datafication, the concept of good has been used by initiatives which might be seen to depoliticize ‘data relations’ (Kennedy, 2016), such as charitable projects like Data For Good and DataKind. On the other hand, it is open to experienced-based interpretation, something that Andrew Sayer (2011) says is important for understanding ‘why things matter to people’.
The term ‘data’ is not straightforward either. Daly et al. (2019) use data as a proxy for the whole DIKW model – that is, the hierarchical pyramid which has data at its base, information above that, knowledge above that and wisdom at the top. In Good Data and elsewhere, including in this paper, data is used as a proxy for the whole data ecology, incorporating structures, data management models, uses and consequences. As such, good data is a metaphor that extends beyond the ‘high quality evidence’ meaning of the term that might be more commonly used amongst data scientists and statisticians.
Good Data’s editors propose a set of principles for good data practices (Devitt et al., 2019), a number of which are relevant to our focus on public perceptions of approaches to data management. Some principles highlight the importance of individual control over what happens to personal data – for example, ‘data subjects must mediate data uses’ and ‘users must be able to understand and control their personal data’. Other principles emphasize collective needs, such as ‘communal data sharing assists community participation’, ‘access to data promotes sustainable communal living’ and ‘open data enables citizen activism and empowerment’. These principles form the foundations for some of the alternative approaches to data management that we discuss below and explored in our survey.
For Daly et al., the motivation to think about good data comes from a belief that data is political and data practices should be evaluated according to whether they are used to enhance social well-being, especially for disadvantaged groups. Good Data thus advocates ‘data methods to dismantle existing power structures through the empowerment of communities and citizens’ (Devitt et al., 2019). We follow Daly et al.’s argument that good data should enhance well-being, especially amongst disadvantaged groups, because it acknowledges the politics of datafication. In order to understand whether particular approaches to data management enhance well-being, we further argue that the views of those impacted by these approaches must be considered, yet neither public perceptions nor data management feature centrally in existing debate about good data. Understanding ‘bottom up’ perceptions of what constitutes good data management is needed, just as it is in relation to other aspects of datafication (Couldry and Powell, 2014). We aimed to fill these gaps with the research we discuss in this paper. Furthermore, as research has shown that inequalities influence perceptions of datafication (Kennedy et al., 2020), the views of diverse populations on what constitutes a good data management model need to be examined.
Approaches to data management
In debates about data management, a number of approaches have been put forward as good alternatives to current arrangements. One is the data trust, ‘a legal structure that provides independent stewardship of data’ for the benefit of all parties (ODI, 2019a; see also Hall and Pesenti, 2017). According to the ODI (2019a), the trustees of a data trust ‘take on responsibility to make decisions about what data to share and with whom’ in order to support the trust’s intended purposes and benefits. One focus of debate has been on the legal status of data trusts, as seen in the definition cited here. In the UK context, a trust is a particular legal structure which does not exist in the same form across all international jurisdictions. However, O’Hara argues that a data trust cannot be a trust in a legal sense. Rather, ‘it takes inspiration from the notion of a legal trust’ (O’Hara, 2019: 4). Our focus in this paper, therefore, is not on the legal dimensions of a data trust, but on its approach to stewardship. A data trust can take many forms, and data management approaches can combine features of a data trust with other features (ODI, 2019a). Table 1 below compares data trusts with other data stewardship models. A further difference relates to the type of data to be managed: some approaches are more appropriate for personal data (such as the PDS), others for open or public interest data (such as the data commons).
Distinguishing features of data stewardship approaches according to ODI (2019a).
Note: The final row of the original table, Research partnerships, has been removed because it is not relevant to our focus here.
Source: reproduced with permission from ODI (2019a).
There are also similarities across different data management approaches. Trusts, co-operatives and commons-based approaches all involve trusted parties overseeing, managing and stewarding data on behalf of individuals and communities. In this sense (rather than in a legal sense), they are all ‘trust-like’. For this reason, in our research, we explored all three of these models: (1) the data co-operative, which manages the collection and storage of its members’ data, is accountable to its members and is governed by a board of representatives constituted by its members; (2) the data commons, similarly collectively motivated, which enables online access to community data which can be used for various purposes and for the benefit of all (see the decode project for an example, https://decodeproject.eu/); and (3) data trusts. We differentiated between two types of trust, building on experimentation that was under way at the time of our survey (ODI, 2019a): (a) a trust governed by an independent responsible party, which makes decisions on behalf of data subjects about who accesses data, what they can do with it and under what circumstances, and (b) a trust governed by multiple independent responsible organizations which manage different types of data in different contexts (for example, one for health data, one for finance data and so on) and represent the interests of all parties involved. We consider these four models as ‘trust-like’ in our discussion below (see Table 2 for a full list of the data management approaches that we explored in our survey).
Data management approaches as described to respondents.
As can be seen in Table 2, we also explored other solutions to the perceived data trust deficit in our survey. One is the PDS, also included in Table 1. The PDS is seen as a more trustworthy approach to managing personal data than current models (for example by Janssen et al., 2019), because it enables individuals to control the processing of, access to and transfer of their personal data. Personal control has been found to be important in UK research about public attitudes to data practices: 94% of participants in a Digital Catapult (2015) survey said they wanted more control over their data. The PDS has therefore received significant attention and financial investment in recent years: notable examples include Solid (https://solid.inrupt.com/) led by Tim Berners-Lee, Databox in the UK (https://www.databoxproject.uk/) and services such as digi.me (https://digi.me/). Advocates such as the international MyData movement believe that PDSs ‘empower individuals by improving their right to self-determination regarding their personal data’ and that with the PDS, ‘the sharing of personal data is based on trust’ (MyData, nd). In contrast, critics argue that the PDS represents an individualized solution. For example, Lehtiniemi and Ruckenstein state that ‘the MyData vision relies on the ethical principle of “human self-determination”, treating the individual as an autonomous subject with inalienable rights and liberties’ (Lehtiniemi and Ruckenstein, 2019: 6; see also Sharon and Lucivero, 2019).
Other more familiar approaches to data management also exist. Under the prevailing ‘notice and consent’ approach (described in our survey as the ‘digital service approach’ or ‘status quo’), the service provider is responsible for managing personal data with users consent. Under GDPR, data controllers are mandated to notify users about the collection of their personal information and associated data practices and obtain agreement in advance. This takes the form of a privacy notice which users must consent to before they can use a service. In some instances, controls may be integrated into privacy notices allowing users to opt in or out of certain data collection practices, but it is often difficult for people to negotiate terms of use, see the extent of data practices or easily change or revoke consent. The shortcomings of the privacy notice system have been well documented (for example by Cate, 2010; Cranor, 2012; Nissenbaum, 2009; Warner and Sloan, 2013). Few people read notices in full (Obar and Oeldorf-Hirsch, 2020) and when they do, they often find them difficult to comprehend. This undermines the premise of informed consent on which the legitimacy of this approach to data management relies (Bakos et al., 2014; Nissenbaum, 2009). This approach has been described as exploitative in light of asymmetries between organizations and end users (Edwards and Veale, 2017; Zuboff, 2018) in which people have little choice but to consent to the data collection practices of digital services, if they want to participate in digital society. Despite these criticisms, this approach to data management is widely adopted across the global digital economy.
Another way to address perceived data management deficits is through regulation. Current EU and UK regulatory frameworks for data have been characterized as contradictory and unclear in a dynamic policy environment (Hinz and Brand, nd). Previous research in the UK has found public support for better regulation of data management, such as a 2014 RSS survey which found ‘more support for the government preventing misuse of personal data than an appetite to have personal control over this’ (RSS, 2014: 3). GDPR has strengthened data protection regulation across EU countries that adopt it, but how the regulation is to be implemented is not entirely clear; L’Hoiry and Norris (2015) have found that data protection regulation does not easily translate from the ‘law in theory’ into the ‘law in practice’ (Galetta et al., 2016). Furthermore, as noted above, although EU laws on data protection apply to the UK during the Brexit transition period, post-Brexit data legislation in the UK is far from clear at the time of writing.
Enabling the possibility of opting in to or opting out of data collection represents another approach to data management. The PDS and variations of the data trust model enable opting in through different means, whereas opting out enables people to enact a desire not to have their data collected (Brunton and Nissenbaum, 2011). It is worth noting that although widespread adoption of either opting out or data trust models is unlikely in the current context of surveillance capitalism dominated by transnational corporations (Zuboff, 2018), these approaches play an important role in debate about future good data arrangements. For this reason, we included them in our survey. Furthermore, as noted above, approaches such as notice and consent, oversight by a regulatory body and opting out are not mutually exclusive. We separated them out in our survey to enable us to evaluate public views of them as components of good data management, and we return to a discussion of their relationship in our conclusion.
Understanding public perceptions of all of the approaches to data management discussed here is important, in order to address the data trust deficit and develop good future data practices. To date, there has been no independent and comparative research on this topic. As an active advocate for data trusts, the ODI carried out three short pilots, concluding that there is ‘huge appetite’ for data trusts within the organizations involved in the pilot (ODI, 2019b). The question of what members of the public, whose data is often at stake in such arrangements, think of alternative data management approaches, including data trusts, remains unanswered. In our research, we asked ‘what do members of the public think constitutes good data management?’ Research cited above suggested that we may find a preference for approaches premised on greater personal control (Digital Catapult, 2015), regulatory oversight (RSS, 2014) or ‘trust-like’ approaches (ODI, 2019b). We put the approaches discussed above to respondents in our survey to elicit their views. In the next sections, we describe our methods and findings.
Respondents’ existing knowledge and views about data practices
In May 2019, 2169 respondents living within the UK completed our online survey. The survey focused on what participants thought about the eight approaches to managing data listed in Table 2. We collected data from diverse respondents from across the UK (for a full demographic breakdown, see Table 3). Respondents were recruited by Qualtrics using opt-in methods, the sample demographics of which compare favourably with other reputable Internet panels such as the British Election Study conducted by YouGov (see column 2 in Table 3). Qualtrics partners with online sample providers to recruit diverse respondents for research purposes. Researchers have found that Qualtrics approximates probability-based samples reasonably well in terms of demographic characteristics and responses to other socio-political questions (Zack et al., 2019). It should be noted that surveys conducted online using an Internet panel like Qualtrics are likely to recruit respondents who are capable technology users. This was confirmed in answers to related questions: 94.6% indicated that they were confident using devices to do things online, 98.9% stated they used the Internet daily and only 8.5% of respondents indicated that they were not users of at least one of the major social media platforms.
Respondent demographics compared to British Election Survey (%).
Note: Our data was collected from members of a self-selected Internet panel by Qualtrics in May 2019. British Election Study (BES) data was collected by YouGov in March 2019. Respondents who provided a ‘don’t know’ answer or refused to answer a question are not included in these totals. Not all percentages sum to 100 due to rounding.
Before rating the approaches, respondents completed knowledge questions to gauge their familiarity with and understanding of concepts relevant to the survey. We presented participants with a series of statements about personal data, open data and the GDPR and asked them to identify whether each statement was true or false. These statements were used to assess their knowledge about relevant issues and evaluate responses to later questions in light of these responses. Some of these statements were reverse worded to account for potential agreement bias. Respondents appeared most knowledgeable about the concept of personal data, with the vast majority correctly answering questions related to its definition: more than 7 out of 10 respondents answered these questions correctly. Respondents were least knowledgeable about open data: less than half were able to correctly answer two questions on this topic. Results were mixed concerning familiarity with and understanding of GDPR: 93% of the sample correctly answered a question about its main purpose and 53% provided correct answers to a question about data portability (see Table 4).
Percentage of knowledge questions answered correctly.
Once completed, we provided respondents with the answers to these questions to ensure that everyone began subsequent sections with the same general information about the topic. We also included questions on attitudes towards how personal data is collected, stored, used and shared by organizations, to gauge respondents’ views on a broad range of related issues and enable us to analyse whether attitudes were indicators of preference. We asked participants to indicate on a five-point Likert scale whether they agreed or disagreed with a series of statements. Respondents were concerned about the privacy (84.6% agreement) and security (84.2%) of their personal data. They wanted to be able to exercise their rights (92.1%) and have more control over their data (89.0%). In particular, they were concerned about how their personal data is used by organizations (86.9%), and they wanted companies to be held accountable if it is misused (96.1%). Respondents were against commercial organizations using personal data to generate profit (78.3%). Only around half of the respondents supported sharing personal data for use in research in the public interest (52.7%). Around two in three wanted data to be used for the social good (68.8%). Most want data to be managed, analysed and gathered in ethical ways (84.0%). A full list of statements and responses can be seen in the Supplemental Appendix.
In another part of the survey, we asked participants about the types of data-driven apps and services that they would like to see developed in the future, inviting them to select services from a list or add their own. Types included related to health, well-being, the environment and education. When we asked respondents who they would like to see provide these services, most said they preferred governmental or publicly-funded organizations – 46% and 40% of respondents selected these options, compared to 18% selecting commercial organizations in a question where respondents could select as many options as they wished.
The questions discussed thus far were asked to aid our analysis. Existing research has highlighted that knowledge levels influence public views about data practices (Digital Catapult, 2015; Doteveryone, 2018) and as such establishing existing knowledge levels was necessary. Standard demographic questions were asked to enable us to explore whether different groups of people have different views about good data management. Responses to questions about future data-driven apps and services indicate what might constitute good data management for respondents: personal control; the ability to exercise one’s rights; accountable, pro-social uses of data; and oversight by a public body.
Views on approaches to data management
Examining respondents’ views about data management approaches was at the heart of our survey, and we used three different methods to do this. Our first method asked respondents to rate four randomly selected data management approaches (presented one at a time) using a Likert scale ranging from 0 (poor) to 10 (excellent). This method is commonly used in surveys, yet assigning a numeric value on an 11-point scale can be difficult for some respondents. To address this issue, our second method of assessing preferences used an innovative approach called a conjoint experiment (Hainmueller et al., 2014). A conjoint experiment works by presenting respondents with options randomly generated from a list. The task involves comparing items side-by-side and then choosing the preferred option. This forced choice design simplifies the decision facing respondents (Hainmueller and Hopkins, 2015; Pelzer, 2019). We used a single-attribute conjoint experiment in which participants were presented with two randomly selected approaches from the list of eight (see Table 2 for the exact wording of each model) and asked them to select the approach that they preferred from the pair. This paired selection task was repeated three times for each respondent. Table 5 provides an example of the single-attribute conjoint experiment used in this study, which allowed us to evaluate how respondents rated the approaches in comparison to one another.
Example of the single-attribute conjoint experiment.
Our third and related method for assessing respondents’ views of the data management approaches was to ask them to complete a multiple-attribute conjoint experiment. This differed from the second method we described above in allowing us to compare different factors that may affect the decision to select one data management approach over another. We accomplished this by randomly combining multiple factors into data management profiles to assess the relative effect of each specific factor on preferences. We asked respondents to express preferences for scenarios generated from a combination of factors identified as significant in previous research (for example Kennedy et al., 2015):
Type of data (for example, medical, financial, media consumption); What management arrangements mean for the individual (for example, full control over what happens to data, knowing what data is held about them, by whom and what they do with it); Use and beneficiaries of the data (for example, personal insights, generate profit, benefit society).
In addition, we included who has control (for example, individual, trustee, commercial organization) as a factor, as this is relevant to our focus on data management. An example of our multiple-attribute conjoint experiment is provided in Table 6 (the full survey and stimulus materials are available in the Supplemental Appendix).
Example of the multiple-attribute conjoint experiment.
Preferences in relation to approaches
Of the eight approaches to data management that we presented to respondents, three were consistently rated highly. The most preferred approach was the PDS, described in the survey as ‘a secure place to collect, store and manage the data about you which has been collected by other services’ which would give individuals control over their personal data (see Table 7 for mean ratings of each model). Responses to questions about views on data uses suggest that the possibility of greater individual control may be why this approach was highly rated: 86.9% of respondents agreed with the statement ‘I want more control over how my personal data is used by organizations’, and 89.0% agreed with the statement ‘I want more control over my personal data’. As noted above, previous research by Digital Catapult (2015) also highlighted the importance of personal control.
Mean ratings on a scale from 0 to 10 for each data management model.
After the PDS, the next highest rated approach involved a regulatory public body overseeing ‘how organizations access and use data, acting on behalf of UK citizens’ in order to ‘ensure that personal data are collected, stored and used in legal and fair ways’. As noted above, elsewhere in the survey, we asked respondents who they would like to see provide new data-driven services ‘for the public good’ and most selected governmental organizations (46% of respondents), followed by publicly-funded organizations (40%). This reinforces the finding that oversight of data by a public regulatory body was a strong preference for our respondents.
The high rating of this model by respondents suggests a preference for legally enforceable safeguards alongside the personal control of data offered by the PDS. This finding was confirmed in responses to questions about views on data uses, in which 96.1% of respondents agreed with the statement ‘I want companies to be held accountable if they misuse my personal data’. Realizing this statement requires governance, which may explain respondents’ strong preference for data management to be overseen by a regulatory body. In contrast to the RSS (2014) survey cited above, which found more support for governance than personal control, we found a strong preference for both. The high ranking of both the PDS and oversight by a regulatory public body suggests that both personal control and oversight are important principles of good data management for respondents.
We described the approach that would allow people to opt out of having their data collected as a ‘Data ID Card’, to give material form to a means for opting out of data collection. This approach was ranked third overall. The relatively high ranking of this model reinforces the importance of individual control over data amongst our respondents. It also shows that respondents would be willing to opt out of data gathering, indicating strong dissatisfaction with current data arrangements.
We explored respondents’ views on data management in multiple ways in the survey, to ensure reliability of findings. We found that the results of the single-attribute conjoint experiment corroborated the findings discussed above. This experiment asked respondents to choose the option that they preferred from a randomly generated pair of approaches, the results of which are presented in Figure 1. The plotted points provide the change in the probability of selecting an approach relative to the status quo (that is, digital services having control over people’s data). The vertical dotted line indicates the digital service/status quo baseline; points to the right of the dotted line indicate an increase in the probability of choosing that particular approach relative to the baseline. The lines around each side of plotted points are 95% error bars, indicating uncertainty around each value, which derives from the fact that our survey is based on a sample of the population.

Results from the single-attribute conjoint analysis.
As with the individual ratings task, this experiment revealed that the top three preferred approaches are the PDS, opting out and oversight by a regulatory public body, in that order of preference. There was at least a 30% point increase in selecting any of the top three data management approaches compared to the status quo/‘notice and consent’ approach. This is a significant number, both statistically and substantively. The approaches that did not offer personal control or regulatory oversight, which we describe above as ‘trust-like’, had lower mean scores than those that did offer such features, in both the rating task and the single-attribute conjoint experiment. These include approaches overseen by a public data commons, a data co-operative, multiple responsible independent organizations or a specific responsible independent party. Trust-like approaches were preferable to the status quo, but less preferable than those based on the concepts of personal choice, control and regulation. These approaches may have received lower ratings because they were less familiar to respondents than approaches based on the more commonplace concepts of choice, control and regulation. As noted above, in the knowledge questions with which we opened the survey, respondents demonstrated limited knowledge of open data, the principles of which influence data trust approaches. In addition, elsewhere in the survey, only 39.3% of respondents agreed with the statement ‘I’m in favour of open data’. This relatively low level of support for open data could result from the low levels of understanding of open data that we also identified. Together, these findings may explain the lower mean scores for the ‘trust-like’ data management approaches that we presented to respondents.
It is striking that respondents preferred all other approaches to a ‘digital services model’ that ‘gives services control over what happens to your data’. With an average rating of just 4.9 out of 10, this suggests that respondents are unsatisfied with services and organizations controlling data. Combined with the high rating of the opt out model, and strong support for statements expressing concern about data management issues, these findings show that current arrangements require radical change in order to win public support.
Preferences in relation to data handling scenarios
We also used a multiple-attribute conjoint experiment, which compared the significance of a number of factors in data handling scenarios – including types of data, uses of data and related benefits (identified as significant in previous research (Kennedy et al., 2015)) and control arrangements and what these enable – to assess preferences towards data management approaches. Figure 2 displays the results from this conjoint experiment. As with the single-attribute conjoint analysis, in the figure, we present results which show the change in the probability of selecting a profile with particular characteristics relative to a baseline, this time for each of the attributes we included in the scenarios.

Results from the multiple-attribute conjoint analysis.
Figure 2 demonstrates that the most important factor influencing responses to the multiple-attribute conjoint experiment was the locus of control over data – respondents want control to rest with them. The probability of respondents selecting a data management scenario that gives them control over their own data increased by 30% points relative to the baseline (that is, a commercial organization controls the data). Thus, personal control played a key role in this experiment, just as it did in evaluations of data management approaches (as seen in Table 7) and in responses to statements about data use and management. As we discovered throughout the survey, respondents preferred scenarios in which anyone other than a commercial organization was responsible for controlling their data. In this experiment, there was little notable differentiation among the alternative controllers that we presented, apart from respondents themselves, for whom a significant preference was expressed.
The other significant factor in this experiment related to uses of data and beneficiaries. Respondents preferred scenarios in which data would be used for insights or to benefit society rather than for profit, which is consistent with findings from other surveys (e.g. Doteveryone, 2018). The effect sizes for these factors were in the medium range, with a change in the probability of selecting that profile of 0.15 or greater. In other words, there is a 15-percentage point increase in the chance that a particular profile would be selected when it provided personal insights or benefits to society compared to profit. Other factors were not as impactful, as Figure 2 demonstrates. For instance, respondents did not significantly differentiate in relation to what management arrangements mean for the individual (for example, giving them control over what happens to data or enabling them to know what data is held about them), as seen in Figure 2(c). Finally, this experiment confirmed the finding from elsewhere in the survey that respondents do not like their personal data to be controlled by commercial organizations (Figure 2(b)) or used for profit (Figure 2(d)). As Figure 2 shows, all other scenarios were preferable to this one.
Differences amongst respondents
Recent research has demonstrated that people experience datafication differently. Ethnicity, gender, poverty and their intersections have been shown to impact people’s experiences of data practices (Eubanks, 2017; Noble, 2018). There is much less research into whether social inequalities influence perceptions of data practices (Kennedy et al., 2020 is one exception). Because of this, we analysed whether these and other characteristics, including existing knowledge of data-related matters, had an impact on respondents’ views of data management approaches. This latter variable, knowledge, was indeed a significant predictor of preferences in relation to some of the approaches (see the Supplemental Appendix for full results). In the ratings exercise, for example, knowledgeable respondents preferred approaches that offered more control and/or oversight over personal data by a regulatory public body than less knowledgeable respondents, who indicated a slightly higher preference for the status quo, which gives digital services control over their data. This effect was relatively small (about a half point difference on a 10-point scale). Age also had a significant impact on ratings of approaches: younger respondents rated the status quo model higher than those who were aged 35 years and over (about 1 point higher mean rating on a 10-point scale). Thus, differences relating to age and existing knowledge mattered, but not a great deal. Apart from these two findings, there were no other clear differences in evaluations by demographic subgroups within the sample. In other words, we did not find that gender, ethnicity, educational attainment, employment status or household income were significant predictors of preferences.
Similar subgroup differences were observed in the single-attribute conjoint experiment, presented in Figure 3 (the full set of comparisons is available in the Supplemental Appendix). This figure plots the average proportion of respondents selecting each data management model, also known as marginal means, by age and knowledge. By design, marginal means average 0.5. In other words, if responses were simply randomly chosen, there is a 50:50 chance that a given response is selected. Values above 0.5 tell us that respondents prefer a given approach, and values below 0.5 indicate that respondents do not like the approach. A value of 0 would tell us that the approach was never selected; a value of 1 means that it was always selected. As with previous figures, Figure 3 also includes error bars.

Subgroup responses to the single-attribute conjoint experiment by age group and existing knowledge.
While the plot points for various demographic subgroups were for the most part grouped closely together, indicating consistency in responses, there are some exceptions. One is age, which appears to have some influence on preference. Respondents in the 18–34 years age group were less swayed by the PDS, oversight by a regulatory public body and the opt out option than respondents aged 35 years and over, although younger respondents still preferred these approaches to the others presented to them. This is indicated in Figure 3 by the closer proximity to the 0.5 value for younger respondents. Less knowledgeable respondents, in general, were also less likely to differentiate among the approaches. Again, this is shown in the closer proximity of their marginal means to the 0.5 vertical line. The effects of both of these variables, however, are relatively small, as we observed with responses to other survey items.
Discussion and conclusions
Our research asked ‘what do members of the UK public think constitutes good data management?’ Our findings suggest that personal data, oversight from regulatory bodies and the choice to opt out of data gathering are the main components of good data management from the perspective of the UK public. Another important finding is that respondents dislike approaches in which commercial organizations control and profit from personal data in exchange for digital services. As noted above, these approaches to data management are not mutually exclusive. Under GDPR, the dominant ‘notice and consent’ model should include opt out options and oversight from regulatory bodies. In this context, we draw three conclusions from our findings.
First, our research suggests that organizations which handle personal data and policy-makers in this domain need to accept that current arrangements are not acceptable. People like the idea of choice, control and oversight, and they do not like commercial organizations controlling and profiting from their personal data. Second, given that some of preferred features are provided for under GDPR, which continues to be implemented in the UK at the time of writing, our findings raise questions for future research about the relationship between the ‘law in theory’ and the ‘law in practice’ (Galetta et al., 2016). These include questions about whether people perceive the existing arrangements as ‘good’ but in need of better enforcement, or whether greater oversight by regulators and more stringent regulations would be preferred.
Third, we need to think carefully about what respondents’ preference for more control over their personal data might look like in practice. In previous qualitative research that we have undertaken, participants expressed concern about the burden of decision-making that a PDS approach might impose upon them as individuals (Steedman et al., 2020). Offloading the responsibility for good and informed data management decision-making onto citizens may therefore be problematic. Effective approaches to greater personal control need further research. Our research has identified what users want; further research into how to realise this in practice is needed.
A further finding from our survey is that not all alternatives to data management are rated equally by respondents. Although they preferred all alternatives to the status quo, they expressed a greater preference for some than for others. Data trust-like approaches – a public data commons, a data co-operative, oversight by a responsible independent party or organizations – were ranked below PDS, regulatory and opt out approaches. These findings were consistent across different methods used in the survey. We cannot therefore conclude that there is a ‘huge appetite’ for data trusts amongst the public, as the ODI suggests exists amongst organizational stakeholders, based on their pilot (ODI, 2019b). Further research is needed to explore the reasons for this, although some speculation is possible. Data trust-like approaches may have been rated lower than other approaches because they were less familiar to respondents than approaches based on the more commonplace concepts of control, opting out and regulation. Respondents’ limited knowledge of and support for open data, the principles of which inform data trusts, was evidenced in answers to diverse questions in the survey. This might explain respondents’ lesser preference for these approaches.
Existing knowledge and age had an impact on evaluations of approaches, but the effects of these factors were relatively small. The fact that less knowledgeable respondents were less likely to differentiate amongst approaches might suggest that with good information, more differentiation of approaches might result. But the relationship between information, understanding and perceptions of data practices is complex, and previous research has shown that information and understanding are not necessarily the solution to the data trust deficit (Steedman et al., 2020). Here again, further research is needed to understand the relationship between knowledge about and preference for data management approaches in greater depth.
Our research indicates that public views of good data management align only in part with the principles of good data identified by experts and commentators. Devitt et al.’s (2019) principles ‘users must be able to understand and control their personal data’ and ‘data subjects must mediate data uses’ were confirmed by our respondents strong preference for a PDS model or an opt out option to give them control over what happens to their data. However, collective principles such as ‘communal data sharing assists community participation’, ‘access to data promotes sustainable communal living’, and ‘open data enables citizen activism and empowerment’, represented in data co-operative and public data commons approaches, were not as widely preferred, although respondents did indicate support for pro-social uses of data. Respondents’ evaluations of what constitutes good data management did not align with those experts who argue that data trusts represent a model of good data either, given that the trust-like approaches that we presented to them were not the most preferred options. A major contribution of our research, then, is that it nuances understandings of good data as a concept and of good data management as a practice.
In some ways, the UK is in a unique position when it comes to data management futures, given current uncertainty about post-Brexit data regulation. This situation provides the UK government with an opportunity to heed what the public wants, which has been the main focus of our paper. We found a ‘huge appetite’ for alternatives to commercial control of personal data amongst our respondents, and a clear indication of what constitutes good data management for them. The UK government could choose to implement good data management approaches which have public support, but this would require investment of resources for technical development and for further public consultation. By contrast, disregard for public views about what constitutes good data management would perpetuate distrust, and this would likely have consequences both for government and for organizations that are trying to work with data in ways that are good, ethical and responsible. In many ways, these conclusions are not unique to the UK. Many countries face similar challenges relating to trust, and research on attitudes to data practices in general has found similar levels of concern across countries (for example Edelman, 2018; European Commission, 2019; ODI, 2018; PEGA, 2019). Further research is needed across the globe to explore why particular data management preferences exist, and global action is also needed, from data policy-makers and practitioners, to respond to public concerns.
Supplemental Material
sj-pdf-1-bds-10.1177_2053951720935616 - Supplemental material for Public perceptions of good data management: Findings from a UK-based survey
Supplemental material, sj-pdf-1-bds-10.1177_2053951720935616 for Public perceptions of good data management: Findings from a UK-based survey by Todd Hartman, Helen Kennedy, Robin Steedman and Rhianne Jones in Big Data & Society
Supplemental Material
sj-pdf-2-bds-10.1177_2053951720935616 - Supplemental material for Public perceptions of good data management: Findings from a UK-based survey
Supplemental material, sj-pdf-2-bds-10.1177_2053951720935616 for Public perceptions of good data management: Findings from a UK-based survey by Todd Hartman, Helen Kennedy, Robin Steedman and Rhianne Jones in Big Data & Society
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a grant from the Arts and Humanities Research Council, award number AH/S012109/1, and BBC Research and Development.
Supplemental Material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
