Building data management capabilities to address data protection regulations: Learnings from EU-GDPR

Abstract

The European Union’s General Data Protection Regulation (EU-GDPR) has initiated a paradigm shift in data protection toward greater choice and sovereignty for individuals and more accountability for organizations. Its strict rules have inspired data protection regulations in other parts of the world. However, many organizations are facing difficulty complying with the EU-GDPR: these new types of data protection regulations cannot be addressed by an adaptation of contractual frameworks, but require a fundamental reconceptualization of how companies store and process personal data on an enterprise-wide level. In this paper, we introduce the resource-based view as a theoretical lens to explain the lengthy trajectories towards compliance and argue that these regulations require companies to build dedicated, enterprise-wide data management capabilities. Following a design science research approach, we propose a theoretically and empirically grounded capability model for the EU-GDPR that integrates the interpretation of legal texts, findings from EU-GDPR-related publications, and practical insights from focus groups with experts from 22 companies and four EU-GDPR projects. Our study advances interdisciplinary research at the intersection between IS and law: First, the proposed capability model adds to the regulatory compliance management literature by connecting abstract compliance requirements to three groups of capabilities and the resources required for their implementation, and second, it provides an enterprise-wide perspective that integrates and extends the fragmented body of research on EU-GDPR. Practitioners may use the capability model to assess their current status and set up systematic approaches toward compliance with an increasing number of data protection regulations.

Keywords

EU-GDPR; data protection regulations; compliance; resource-based view; capabilities

Introduction

In 2020, the European Commission (EC) released a plan for a European data strategy. This plan outlines the transformative importance of data in modern economies and strives to position the European Union (EU) at the forefront of data-related innovation (European Commission, 2020). One of the pillars of this strategy is the building of public trust in data processing activities, with the General Data Protection Regulation (EU-GDPR) enabling individuals to have control over their personal data. EC President Ursula von der Leyen even expressed the hope that the EU-GDPR would set data protection “standards for the rest of the world” (von der Leyen, 2020). Since it went into effect in May 2018, the EU-GDPR has initiated a paradigm shift in data protection toward greater choice and sovereignty for individuals and more accountability for organizations (De Hert and Papakonstantinou, 2012). The strict European rules have inspired regulations in other parts of the world, for instance, the California Consumer Privacy Act (CCPA; California State Senate, 2018), India’s new Personal Data Protection Bill (Parliament of the Republic of India, 2018; Govindarajan et al., 2019), and Japan’s update to the Act on the Protection of Personal Information (Japan Personal Information Protection Commission, 2020; Tanaka and Kitayama, 2020).

For organizations, these regulations come with a mandate to clarify whether, how, and how well they protect personal data, along with increased fines for non-compliance. As the regulations apply at a scale larger than any previous data protection regulation, they require organizations to fundamentally reconceptualize how they store and process personal data on an enterprise-wide level. In the case of the EU-GDPR, many organizations are still facing significant compliance challenges and struggle with the conflicting interests between legal obligations, business drivers, and innovation (Jakobi et al., 2020). A study conducted in June 2019 among more than 1000 European and US companies reported that organizations had been overoptimistic about their ability to achieve timely compliance. When surveyed in March and April 2018, 78% of responding organizations expected to comply with the EU-GDPR by the time it came into force, but only 28% of them reported being compliant when evaluated a year later (Capgemini Research Institute, 2019). A study released in April 2020 revealed similar difficulties among multinational enterprises: Only 54% of them had achieved operational compliance, while 37% were still conducting “significant readiness actions,” and 9% were still in “project mode” (Dansac Le Clerc and Mannent, 2020). According to this study, a majority of organizations were still implementing mechanisms to manage data protection rights, data storage and retention, and in-depth registries of data processing activities (Dansac Le Clerc and Mannent, 2020).

From a research perspective, the EU-GDPR has been debated in both legal and IS communities (De Hert and Papakonstantinou, 2012, 2016; Jakobi et al., 2020; Mitrou, 2017). Although legal aspects of information privacy had not been among the “topic areas closer to the interests of most IS researchers” (Bélanger and Crossler, 2011), the EU-GDPR has recently attracted more academic interest. In line with the idea of regulatory technologies for financial regulations, so-called “RegTech” (Butler and O’Brien, 2019), IS researchers have mostly focused on technical solutions that ease EU-GDPR compliance, such as blockchain (Farshid et al., 2019; Guggenmos et al., 2020; Mejtoft et al., 2019; Rieger et al., 2019) or enterprise architecture (Burmeister et al., 2019, 2020; Huth, Burmeister, et al., 2020). As a result, the existing body of IS research on the EU-GDPR remains fragmented and proposes solutions for isolated aspects of the regulation. It provides neither an enterprise-wide perspective on the compliance requirements nor a broader discussion of the alternative ways to address them.

The difficulties in achieving EU-GDPR compliance highlight the general lack of common ground not only between legal and IS research communities but also between professionals in both disciplines. In most companies, data protection topics have traditionally been addressed by legal and compliance departments, which adapt contracts and general conditions. The new generation of data protection regulations does not allow for such a restricted approach, but requires a fundamental reconceptualization of how companies store and process personal data on an enterprise-wide level (Labadie and Legner, 2020). Since the EU-GDPR does not prescribe concrete implementation options, companies find it challenging to interpret the regulation and develop suitable data management practices that support compliance. Thus, data processing–related issues remain the most challenging topics in the EU-GDPR (De Hert and Malgieri, 2018; Nicolaidou and Georgiades, 2017; Thélisson, 2020).

In this paper, we introduce the resource-based view (RBV) as a theoretical lens that helps explain the complex and lengthy trajectories towards EU-GDPR compliance. We argue that companies have to mobilize technological, human, and intangible resources and build dedicated, enterprise-wide data management capabilities in order to reach their regulatory compliance objective or, as conceptualized by Sadiq et al. (2007), their “control objectives.” The latter are directly or indirectly related to firm performance, as non-compliance may have significant direct (e.g., fines) and indirect (e.g., reputation loss) financial consequences. Using the RBV also allows us to extend the regulatory compliance management (RCM) literature (Abdullah et al., 2009; Cleven and Winter, 2009; El Kharbili, 2012), which provides systematic approaches to analyzing regulations but has not yet embraced the EU-GDPR. Here, capabilities can serve as a way to translate abstract compliance requirements into routines and practices.

More specifically, this paper addresses the following research question: What data management capabilities need to be built to address the EU-GDPR’s requirements? Following a rigorous design science research process (Peffers et al., 2007), we propose a capability model for the EU-GDPR that synthesizes three groups of capabilities (infrastructure, management, and external linkages) that organizations must build to comply with the regulation. The capability model was iteratively developed based on the interpretation of legal texts, findings from EU-GDPR-related publications, and practical insights from focus groups with experts from 22 companies and four EU-GDPR projects¹. We find that capabilities can create “common ground” between legal and IS perspectives: they help analyze compliance requirements and discuss ways to address them before a decision is made on concrete (technical) implementations.

By providing a comprehensive, theoretically and empirically grounded capability model for the EU-GDPR, our study advances interdisciplinary research at the intersection between IS and law with two types of contributions: First, it contributes to the research on regulations, a topic that has seen few contributions in the IS domain in the past but is enjoying a renewed interest in the context of digitalization and Big Data. Specifically, it connects the nascent research on EU-GDPR to the regulatory compliance management literature (Abdullah et al., 2009; Cleven and Winter, 2009; El Kharbili, 2012). It establishes a link between compliance rules and practice in the spirit of the sense-making dimension of IT-based regulation (de Vaujany et al., 2018). Second, our study complements the fragmented research on the EU-GDPR that treats selected aspects of the regulation or proposes specific implementation solutions by providing an integrated, enterprise-wide perspective. The capability model acts as an overarching framework that outlines the links between compliance requirements, capabilities, and their materialization in the form of resources. It allows researchers to theorize about the capabilities required for EU-GDPR-compliant data management, position and compare their suggestions for EU-GDPR-compliant solutions in the larger context and generalize beyond the EU-GDPR. Practitioners may use the capability model to assess their current status and set up systematic approaches toward compliance with an increasing number of data protection regulations.

The remainder of this paper is structured as follows: We start by introducing the EU-GDPR and providing a synthesis of research on the topic, as well as regulatory compliance in general. After outlining the research methodology and process, we motivate the RBV perspective and present the capability model. We conclude by summarizing our contribution and discussing future research.

Background and related research

Paradigm shift in data protection regulations

In January 2012, the European Commission published a proposal for an overhaul of data protection law, which would become the EU-GDPR². The aim was to remedy the fragmented implementations of the previous Data Protection Directive (95/46/EC) and account for the significant changes introduced by the Internet and digital services (Mitrou, 2017; Nicolaidou and Georgiades, 2017). The EU-GDPR is noteworthy for successfully harmonizing the European data protection legal framework and applies directly in all 27 EU member states, a scale much larger than any previous data protection regulation. Moreover, its relevance extends beyond Europe: On the one hand, any organization that processes the personal data of an EU citizen must comply with it regardless of the geographical location of its operations. If it fails to do so, significant fines can be imposed (i.e., up to 20 million euros or 4% of an organization’s global revenues, whereas fines under previous regulations averaged around 500,000 euros). On the other hand, the EU-GDPR has been conceived as a worldwide reference for data protection (von der Leyen, 2020), and the strict European rules have inspired data protection regulations in areas of the world that did not have any, such as the California Consumer Privacy Act (CCPA) and India’s Personal Data Protection Bill (Parliament of the Republic of India, 2018). In the United States, there have been official calls to complement domain-specific provisions, such as the Health Insurance Portability and Accountability Act (HIPAA) for health information, with a full-fledged data protection regulation at the federal level, similar to the EU-GDPR (Rubio, 2019). Other countries, such as Japan (Japan Personal Information Protection Commission, 2020) and Switzerland (Swiss Confederation, 2020), have been compelled to update their existing data protection frameworks to match the EU-GDPR’s strengthened requirements (Métille and Raedler, 2017; Tanaka and Kitayama, 2020). These developments are a testimony to the EU-GDPR’s global influence and show that it has become the de facto standard for data protection (De Hert and Malgieri, 2018; Thélisson, 2020).
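To make the fine ceiling mentioned above concrete, the following minimal Python sketch computes the upper bound for a given revenue figure. It assumes the "whichever is higher" reading of Article 83(5) EU-GDPR; the function name and the revenue figures are hypothetical and serve illustration only.

```python
FIXED_CAP_EUR = 20_000_000  # fixed ceiling named in the text
REVENUE_SHARE = 0.04        # 4% of global annual revenue


def max_fine_eur(global_annual_revenue_eur: float) -> float:
    """Upper bound of the fine, assuming the higher of the two ceilings applies."""
    if global_annual_revenue_eur < 0:
        raise ValueError("revenue must be non-negative")
    return max(FIXED_CAP_EUR, REVENUE_SHARE * global_annual_revenue_eur)


# Hypothetical revenues, for illustration only
small_company = max_fine_eur(100_000_000)    # fixed ceiling dominates
large_company = max_fine_eur(2_000_000_000)  # revenue-based ceiling dominates
print(small_company, large_company)
```

Under this assumption, the revenue-based ceiling only exceeds the fixed one once global revenues pass 500 million euros, which is why the provision weighs most heavily on large multinational organizations.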

While these new regulations reinforce established data protection concepts (Debet, 2018; Wiese Schartum, 2018), they also introduce a paradigm shift toward greater choice and sovereignty for individuals and more accountability for organizations (De Hert and Papakonstantinou, 2012, 2016; Mitrou, 2017). Most notably, they strengthen existing transparency mandates: in the case of the EU-GDPR, organizations must inform individuals about data processing in clear language and separately from general conditions. They are also required to present more granular consent options (Nicolaidou and Georgiades, 2017). One of the major additions is the concept of accountability, which implies that organizations must be able to demonstrate compliance with the regulation. They must also appoint data protection officers (DPOs) and notify both authorities and individuals of data breaches. Privacy-by-design principles (i.e., implementing privacy from the ground up in systems and offerings) also appear in the regulation, along with new individual rights, such as data portability and a right to oppose automated decision making (Nicolaidou and Georgiades, 2017).

EU-GDPR and data protection in IS literature

Although the EU-GDPR was finalized in 2016 and entered into force in May 2018, it was slow to attract the attention of IS researchers. This mirrors a general reluctance among IS researchers to probe information privacy (Bélanger and Crossler, 2011). In 2018, a query with the keyword “GDPR” on the AIS Electronic Library returned only 27 matches. This number has increased almost tenfold in the past two years, resulting in 262 matches at the end of 2020. A more detailed review of these studies reveals four categories of contributions, but only the first two categories treat the EU-GDPR as their central topic of interest (see Table 1):

1. Overall regulation (19 studies; for details, see Table 1): These studies analyze the regulation as a whole. The most common contributions relate to the EU-GDPR’s impact and to the mapping of the regulation to existing domain-specific frameworks. This is where we position the study at hand.

2. Selected aspects of the regulation (23 studies; for details, see Table 1): These studies treat the EU-GDPR as the central topic of interest, and their outcomes relate to a specific aspect of the regulation. They address five key themes: data protection rights, consent, transparency, accountability, and technical and organizational measures.

3. Data privacy and security (110 studies): These studies contribute to domains that are related to the EU-GDPR and data protection, such as other regulations, general privacy research, cybersecurity, information ethics, information disclosure, and data sharing. While they position the EU-GDPR as a motivating factor for their outcomes, they do not analyze the regulation.

4. General IS research (94 studies): These studies relate to a variety of other IS research areas and only mention the EU-GDPR to back up a specific, isolated argument.

Table 1.

Overview of EU-GDPR related studies in IS literature.

Study	Type*	Topic area†	Research focus
Scope: overall regulation
Addis and Kutar (2018)	C	Impact	Impact of EU-GDPR on emerging technologies
Martin and Matt (2018)	E	Impact	Impact of privacy regulations on startup innovation
Pankowska (2018)	C + E	Practices	EU-GDPR mapping to privacy frameworks and awareness
Russell et al. (2018)	C	Impact and practices	Transformation framework for digital privacy
Tona et al. (2018)	C	Tech./Tools	Design of ethical Big Data artifacts
Veiga et al. (2018)	E	Practices	Mapping of data protection regulations and benchmarking of practices
Burmeister et al. (2019)	C	Tech./Tools	Enterprise architecture metamodel for EU-GDPR
Martinez et al. (2019)	C	Impact	Impact of EU-GDPR on smart grid operations
Rösch et al. (2019)	C	Tech./Tools	Translation of legal requirements into technical requirements
Addis and Kutar (2020)	E	Tech./Tools	Data protection challenges arising from AI implementation
Burmeister et al. (2020)	C	Tech./Tools	Enterprise architecture management supporting EU-GDPR implementation
Francis et al. (2020)	C	Practices	Comparison of principles behind privacy frameworks in 14 countries
Grundstrom et al. (2020)	E	Practices	EU-GDPR impact on access to data inside organizations
Houta et al. (2020)	E	Concerns	Analysis of EU-GDPR discourse on social media
Huth, Burmeister, et al. (2020)	E	Practices	Collaboration between legal and enterprise architecture teams during EU-GDPR implementation
Jakobi et al. (2020)	C	Impact	Research contribution regarding conflicting business implications of EU-GDPR implementation
Lindgren (2020)	E	Impact	Impact of EU-GDPR on (multi–)business model innovation
Maunula (2020)	C	Tech./Tools	Technology review for EU-GDPR
Zhang et al. (2020)	E	Concerns	Impact of EU-GDPR on consumer online trust
Scope: data protection rights
Engels (2016)	C	Impact	Impact of data portability right on competition dynamics
Farshid et al. (2019)	C	Tech./Tools	Blockchain prototype for data deletion
Presthus and Sørum (2019)	E	Concerns	Privacy awareness and knowledge of consumers following EU-GDPR
Rieger et al. (2019); Guggenmos et al. (2020)	E	Tech./Tools	Design principles and development of blockchain solution for asylum procedures in Germany
Wohlfarth (2019)	C	Impact	Strategic aspects of data portability
Scope: consent
Bergram et al. (2020)	E	Attitudes	Influence of phrasing and digital nudges on user consent and privacy awareness
Kurtz et al. (2020)	E	Practices	Identification of consent-related issues and design goals
Proferes and Walker (2020)	E	Attitudes	Researchers’ attitudes toward consent in exploiting public data
Scope: transparency requirements
Alboaie (2017)	C	Tech./Tools	Privacy label for GDPR
Diamantopoulou and Mouratidis (2018)	C	Tech./Tools	Reference architecture for privacy-level agreements
Fox et al. (2018)	C	Tech./Tools	Guidelines for compliant privacy notices
Mejtoft et al. (2019)	E	Tech./Tools	Blockchain prototype for increased transparency of data processing
Watson and Nations (2019)	E	Tech./Tools	Identification of factors influencing transparency of algorithms and recommendations
Paul et al. (2020)	E	Impact	Impact of EU-GDPR on user privacy perceptions for wearable IoT devices
Scope: accountability requirements
Karyda and Mitrou (2016)	C	Practices	Information security/incident management
Petkov and Helfert (2017)	E	Practices	Applying data breach notification to past infringements
Kurtz et al. (2018)	E	Practices	Review of third-party data processors
Vemou and Karyda (2018)	C	Practices	Evaluation of privacy impact assessment methods
Kurtz et al. (2019)	E	Practices	Analysis of third-party data processing in service ecosystems
Scope: technical and organizational measures
Huth and Matthes (2019)	C	Tech./Tools	Privacy engineering approaches for software development
Faber et al. (2020)	C	Tech./Tools	Blockchain-based personal data and identity management system
Huth, Both et al. (2020)	C	Tech./Tools	Tool prototype and approach for integrating privacy aspects in agile development methods

*C = conceptual, E = empirical.

†Based on Bélanger and Crossler (2011).

Hence, despite the increasing interest in the EU-GDPR, research is still at an early stage and only the contributions in the first and second categories (i.e., 16% of all the studies) can be considered to fully embrace the EU-GDPR topic. Interestingly, these studies address typical topic areas in Bélanger and Crossler's (2011) taxonomy for information privacy research and are classified accordingly in Table 1. The early EU-GDPR studies (published until 2018) fell within the domains of information privacy practices and information privacy technologies and tools. After the EU-GDPR entered into force, researchers broadened their scope and started to investigate the information privacy concerns and attitudes of individuals and specific stakeholder groups (e.g., software developers, researchers, and business executives). Comparing the state of research in 2018 with 2020, we observe the most significant uptake in studies focused on technologies and tools for EU-GDPR compliance, which now constitute the largest share of EU-GDPR-related studies (i.e., 41% in 2020, up from 28% in 2018). Four of these studies (out of 16) investigate blockchain as a technological basis for compliance solutions. By contrast, studies on information privacy practices were predominant in 2018 (i.e., 57%) but now rank second after technology and tools-related research (i.e., down to 31% in 2020). They predominantly comprise empirical studies of EU-GDPR-related practices. Finally, the share of studies classified in the information privacy impact category has dropped slightly from 28% in 2018 to 20% in 2020. These studies investigate the impact of the EU-GDPR on emerging technologies (e.g., advanced analytics and smart products) and business model innovation, as well as the economic/market impact of data portability.
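The 16% share can be traced back to the counts reported in the literature review. The short Python sketch below reproduces this arithmetic; the variable names are our own, and the figures are those stated in the text (19 and 23 studies in the first two categories, 27 and 262 AIS Electronic Library matches in 2018 and 2020).

```python
# Counts as reported in the literature review
overall_regulation = 19   # category 1
selected_aspects = 23     # category 2
matches_2018 = 27         # AIS Electronic Library matches in 2018
matches_2020 = 262        # matches at the end of 2020

# Share of studies that treat the EU-GDPR as their central topic
core_share = (overall_regulation + selected_aspects) / matches_2020

# Growth in the number of matches between 2018 and 2020
growth_factor = matches_2020 / matches_2018

print(f"core share: {core_share:.0%}")
print(f"growth factor: {growth_factor:.1f}x")
```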

Table 1 also illustrates that research on the EU-GDPR is fragmented and that many studies narrowly focus on one of the EU-GDPR’s requirements and the technologies, tools, or practices used to address it. With respect to consent, these studies investigate, for example, the means to influence it on digital channels (Bergram et al., 2020) or whether existing implementations comply with the regulation (Kurtz et al., 2020). Several studies suggest blockchain-based solutions (Faber et al., 2020; Guggenmos et al., 2020; Mejtoft et al., 2019; Rieger et al., 2019) or enterprise architecture (Burmeister et al., 2019, 2020). These approaches have two shortcomings: First, most papers take the compliance requirements for granted and directly look into specific practices or solutions. They thereby overlook that the regulation remains abstract and neither prescribes nor endorses concrete implementation options, and that enterprises need to translate it and develop suitable data management practices supporting compliance. Second, these studies address isolated aspects of the regulation, that is, a single regulatory requirement or a limited set thereof, and suggest targeted solutions to address them. Hence, we still lack a broader and implementation-agnostic understanding of the compliance requirements and the ways to address them.

The 19 studies on the overall regulation mostly evaluate existing practices and concerns or analyze the EU-GDPR’s impact on a specific domain (e.g., social media discourse, innovation, and Big Data). Yet, they fail to provide insights into the entire regulation’s implications from an enterprise-wide perspective. Russell et al. (2018) address this topic by proposing a digital-privacy transformation “gap-map” that would measure the organization’s propensity for change. However, it exclusively takes a change management perspective without investigating the compliance requirements and their implications for enterprise data management.

Regulatory compliance management

So far, the academic discussion on the EU-GDPR has not connected with the regulatory compliance management (RCM) research domain, although the latter provides systematic approaches to analyzing regulations and their influence on business practice. RCM aims to “ensur[e] that enterprises are structured and behave in accordance with the regulations that apply, i.e., with the guidelines specified in the regulations” (El Kharbili, 2012). RCM introduces useful definitions to delineate relevant legal concepts: It distinguishes between regulations (i.e., binding documents), regulatory guidelines, and compliance requirements (CRs), as provided in the legal text. Once interpreted, these CRs are ultimately implemented in the enterprise (El Kharbili, 2012). The so-called concretized compliance requirement describes the implementation of a CR in an enterprise context, fulfilling its legal specification.
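The distinction between regulations, compliance requirements, and their concretization can be illustrated with a small data structure. The following Python sketch is our own simplified reading of these concepts, not a model taken from El Kharbili (2012); all class names, fields, and the example concretization are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConcretizedRequirement:
    """Implementation of a compliance requirement in a specific enterprise context."""
    description: str


@dataclass
class ComplianceRequirement:
    """Requirement as provided in the legal text, before enterprise-specific interpretation."""
    source: str  # reference to the legal text, e.g., an article number
    text: str
    concretizations: list[ConcretizedRequirement] = field(default_factory=list)

    def is_implemented(self) -> bool:
        return bool(self.concretizations)


@dataclass
class Regulation:
    """Binding document that bundles compliance requirements."""
    name: str
    requirements: list[ComplianceRequirement] = field(default_factory=list)

    def open_requirements(self) -> list[ComplianceRequirement]:
        """Requirements that have not yet been concretized in the enterprise."""
        return [r for r in self.requirements if not r.is_implemented()]


# Illustrative instance; the concretization is a hypothetical example
erasure = ComplianceRequirement("Art. 17", "Right to erasure")
records = ComplianceRequirement("Art. 30", "Records of processing activities")
erasure.concretizations.append(
    ConcretizedRequirement("Deletion workflow covering CRM and billing systems")
)
gdpr = Regulation("EU-GDPR", [erasure, records])
open_sources = [r.source for r in gdpr.open_requirements()]
print(open_sources)
```

Keeping the legal requirement and its concretization as separate objects mirrors the interpretation step: the same requirement may be concretized differently from one enterprise to another.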

Two papers from 2009 analyze the coverage of RCM in IS research. Cleven and Winter (2009) isolated 26 relevant papers and analyzed them through the lens of enterprise architecture. They found that while some RCM aspects have been prominently studied (e.g., organizational and behavioral impacts of regulations, compliance-supporting IT solutions), others have been neglected. Specifically, they found no contributions on the operationalization of compliance objectives. The review by Abdullah et al. (2009) on RCM revolves around the approaches (i.e., explanatory or solution) and context (i.e., region, type, and domain) of the considered contributions. Most of the 45 papers concern North America, and only three of them focus on Europe. Regarding data protection, they identify two papers on Fair Information Practices and only one on the European Data Protection Directive (95/46/EC), even though it had been in force for more than a decade. Furthermore, all identified contributions offer either preventive or detective solutions, but no corrective solutions. The authors hypothesize that corrective solutions are an outcome of legal analysis, which is why they were not addressed by the IS community.

Hence, there is a lack of RCM-related contributions that address data protection regulations, focus on regions other than North America (Abdullah et al., 2009), or provide guidance to achieve strategic compliance objectives (Cleven and Winter, 2009). This last call is echoed by our literature review on the EU-GDPR—although there have been contributions on the topic, they all focus on specific aspects of the regulation. Thus, we lack a single integrated framework for the EU-GDPR that takes an enterprise-wide perspective on the compliance requirements and analyzes ways to address them without prescribing specific implementation choices.

Research design

Context and research objectives

Our research activities were carried out in a multi-year research program on data management, which followed the consortium research method (Österle and Otto, 2010). This setup provides close collaboration between academics and experts from multinational organizations active in various industries³ and detailed insights into EU-GDPR implementation initiatives. It follows the collaborative practice research tradition and aims to add to the knowledge of involved professional and scientific communities alike, in order to advance practices in the area of interest (Mathiassen, 2002).

Our research objective was to jointly develop prescriptive knowledge in terms of Gregor’s (2006) type V theory that supports companies in achieving EU-GDPR compliance. Accordingly, we adopted design science research (DSR) as our central research paradigm to develop a capability model as an artifact “to solve identified organizational problems” (Hevner et al., 2004) relating to data protection, a highly interdisciplinary topic located at the intersection of legal practice and enterprise data management. Capability models are a type of reference model that builds on the RBV as its underlying theory and outlines the relevant set of capabilities that make up an organization’s ability to “perform a set of coordinated tasks, utilizing organizational resources, for the purposes of achieving a particular end result” (Helfat and Peteraf, 2003). Reference models support the accumulation of knowledge, as they make it possible to explicate, integrate, and consolidate the fragmented knowledge that is available in the form of situational designs and emerging practices (vom Brocke and Buddendick, 2006). In enterprise data management, capability models have been suggested by academics and professionals to structure and assess data management practices, emphasizing the capability-building that results from deploying different types of resources (Legner et al., 2020).

Research process

In order to develop the capability model with due scientific rigor, we followed the research process outlined by Peffers et al. (2007). We initiated our research activities with a problem-based entry point, where objectives and solutions are not yet defined. Following the problem analysis, we developed the capability model in two main phases, each comprising iterative design cycles, as well as demonstration and evaluation steps. We will elaborate on the different steps of our research process in the next sections (see Figure 1 for the overview) and provide evidence on how the capability model evolved across the two main phases. The iterative nature of the chosen design science research process allows for the integration of theoretical elements and practitioner feedback. Thus, throughout the research process, we used RBV concepts to theoretically ground and structure our insights from different types of research activities: an analysis of legal texts, official guidelines, and interpretations of the EU-GDPR; a review of EU-GDPR-related tools; and close interactions between academics and practitioners, comprising five focus group meetings with 33 data management experts from 22 companies, as well as insights from four EU-GDPR projects.

Figure 1.

Research process with problem-centered initiation based on Peffers et al. (2007).

Problem identification and definition of objectives

Preparation for the research activities started in early 2017 and reflects the problem-centered initiation of our research process. These activities were meant to understand the problems that EU-GDPR implementation entails and specify the research objectives. In an initial review of the regulation, we extracted the EU-GDPR’s compliance requirements and analyzed them according to foundational data protection principles in legal literature (i.e., personal data, informational self-determination, accountability, and transparency). Early results of this analysis were discussed with practitioners (focus groups 1 and 2) and revealed two main challenges regarding EU-GDPR compliance. First, while anticipating significant changes to the current way of storing and processing personal data on an enterprise-wide level, participants recognized that they lacked a comprehensive understanding of the regulation itself. Second, they cited a lack of common ground with legal departments. In their organizations, discussions around data protection and privacy regulations are often cut short due to a lack of common approaches and vocabulary, which blocks the identification of feasible and compliant solutions and hinders progress. This led to the research objective of defining a capability model for the EU-GDPR that assists data management professionals in understanding and implementing the regulation and collaborating with their colleagues in the legal departments.

Development of the capability model’s first version

From 2017 to 2018, the first version of the capability model (V1) was developed in three iterations involving insights from field projects and parallel research activities to design the capability model, as well as focus groups to collect feedback and additional instantiations to demonstrate the model. At first, we analyzed regulatory requirements in terms of general capabilities that come into play for achieving compliance. Then, we collected field evidence and expert feedback to further refine the sub-capabilities and analyze implementation options with the required technological, human, or intangible resources.
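The structure underlying this analysis (capabilities, refined into sub-capabilities, which are in turn implemented through technological, human, or intangible resources) can be sketched as a simple data model. The Python sketch below is an illustration under our own assumptions: the class names and the example entries are hypothetical placeholders and do not reproduce the content of the capability model itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class ResourceType(Enum):
    TECHNOLOGICAL = "technological"
    HUMAN = "human"
    INTANGIBLE = "intangible"


@dataclass
class Resource:
    name: str
    kind: ResourceType


@dataclass
class SubCapability:
    name: str
    resources: list[Resource] = field(default_factory=list)


@dataclass
class Capability:
    name: str
    sub_capabilities: list[SubCapability] = field(default_factory=list)

    def missing_resource_types(self) -> set[ResourceType]:
        """Resource types not yet covered by any sub-capability."""
        covered = {r.kind for s in self.sub_capabilities for r in s.resources}
        return set(ResourceType) - covered


# Placeholder entries for illustration only, not taken from the model
consent = Capability(
    "Consent handling (placeholder)",
    [
        SubCapability(
            "Collect consent (placeholder)",
            [
                Resource("Consent repository", ResourceType.TECHNOLOGICAL),
                Resource("Data steward", ResourceType.HUMAN),
            ],
        )
    ],
)
missing = sorted(t.value for t in consent.missing_resource_types())
print(missing)
```

A structure of this kind supports the assessment use case described for practitioners: for each capability, it shows which types of resources have already been mobilized and which are still missing.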

The first design iteration comprised a project at Engger⁴, a global engineering company, and resulted in the initial draft of the capability model, comprising four capabilities and 15 sub-capabilities. Engger had just started a large-scale project around EU-GDPR-compliant personal data aimed at harmonizing business partner data management in a highly distributed landscape with around 500 systems in different countries and subsidiaries. This project helped to get a better understanding of the issues and define capabilities related to the collection and distribution of personal data and consent. The draft version of the capability model was discussed with experts from other companies in focus group 3.

In the second iteration, two focus group meetings (i.e., 4.1 and 4.2) helped clarify the scope of the capability model. It was decided to set aside all security-related considerations and focus exclusively on data management capabilities. In focus group 4.1, the practitioners indicated that security is usually a distinct function that is consulted, while the data management aspects were rarely addressed. From an academic perspective, information security is a well-researched field, and the existing concepts may be translated to EU-GDPR, whereas there is little coverage of data management practices in regulatory compliance with data protection regulations. We also performed a re-mapping of the capabilities and grouped five capabilities and 17 sub-capabilities into two capability groups based on the demarcation between organizational and system capabilities found in RBV literature (Bharadwaj, 2000; Baiyere and Salmela, 2014).

The third iteration comprised a project around consent management at Allmed, a global pharmaceutical company. Its technical team had designed a minimum viable product solution, which we analyzed based on the capability model. Insights from the project resulted in a stable set of six capabilities, including a new capability specifically addressing data protection rights, and 18 sub-capabilities, organized around two capability groups.

This capability model was subsequently demonstrated and evaluated: It was demonstrated with the EU-GDPR activities at Leares, a small consulting firm, where it proved to be a useful and efficient tool for assessing the current capabilities, identifying the required capabilities, and prioritizing compliance activities. In parallel, we used the capability model to analyze and classify 23 software tools from major vendors claiming to support EU-GDPR compliance (cf. Appendix 1, with tools falling into the common categories of data management, compliance and identity & access management (CIAM), security, and enhancement). This analysis allowed us to further validate that the identified capabilities and sub-capabilities were complete and exhaustive.

We then conducted additional expert interviews to evaluate the artifact’s simplicity, understandability, fidelity, and completeness (evaluation criteria as suggested by Prat et al., 2015). We selected the data protection officer, as well as a data management specialist from two major insurance companies in Switzerland, Versuisse and Svizzance, which are among the country’s Top 10 providers of life and non-life insurance and also operate in EU countries. Interviews consisted of a walkthrough of each capability to discuss and evaluate the company’s standing and practices. At the end of each interview, we asked participants to rate the capability model’s simplicity, understandability, fidelity, and completeness using a five-point Likert scale (where 1 = fully disagree, 3 = neutral, and 5 = fully agree). Our respondents rated the capability model’s simplicity, understandability, and completeness with a minimum of 4 out of 5. The fidelity dimension was the only one without a rating of 5, as respondents rated it with 3 and 4. Respondents with a legal education indicated that although the capabilities seemed to adequately reflect the EU-GDPR requirements, they were missing assignments of each capability to the regulation’s principles. Similarly, data management specialists expressed that although capabilities matched the requirements that they discussed with members of their organizations’ legal teams, there was a lack of explicit reference to the regulation. Participants in focus group 5 confirmed those results.

Development of the capability model’s second version

While the first version focused strongly on the practical relevance and utility, it had certain shortcomings in terms of documentation and was mainly built from experiences gained in early-stage initiatives. After the EU-GDPR went into effect, we observed an increase in academic studies and more open debates and testimonials from companies related to their implementation approaches and challenges. In 2019, we decided to launch a second phase that would allow us to validate and enhance the capability model based on the insights from the increasing number of EU-GDPR publications, while improving its theoretical grounding. From 2019 to 2021, we conducted two additional design iterations, with subsequent demonstration and summative evaluation, resulting in the second and final version of the capability model (V2).

In parallel, we continued to analyze EU-GDPR-specific legal literature to inform the development of the capability model, ensuring a proper fit with legal requirements. For this purpose, we gathered and analyzed material from authoritative data protection sources, such as textbooks from multiple legal traditions, for example, pan-European (European Union Agency for Fundamental Rights et al., 2018; Synodinou et al., 2017, 2021, 2020; Voigt and Von Dem Bussche, 2017), French (Bensoussan et al., 2018), Belgian (Docquir, 2018), and Swiss (Meier, 2011), as well as two recent doctoral dissertations (Staiger, 2017; Thélisson, 2020). We complemented this understanding with insights from official guidelines and interpretations from supervisory authorities (e.g., Chatellier et al., 2019; Commission Nationale de l’Informatique et des Libertés, n.d.; European Data Protection Board, 2017, 2018a, 2018b; European Data Protection Supervisor, 2018, 2019; Information Commissioner’s Office, 2017), as well as academic papers and doctrinal opinions (e.g., Armingaud and Ligot, 2019; Castets-Renard, 2019; Cheffert, 2018; De Hert and Malgieri, 2018; De Hert and Papakonstantinou, 2012, 2016; Debet, 2018; Fellous-Sigrist, 2018; Groos and Veen, 2020; Hoeren and Kolany-Raiser, 2018; Karjoth and Langheinrich, 2019; Lazaro and Le Métayer, 2015; Naftalski, 2018; Puyraimond, 2019; Rallet et al., 2015; Solove, 2013; Wiese Schartum, 2018; Zanfir, 2014).

In the fourth iteration, we revised the capability model in light of expert feedback and the latest academic and practitioner literature. We monitored EU-GDPR-related studies until Q4 2020, updated the literature review, and integrated insights from selected publications as support for relevant capabilities and the resources deployed in different implementation options. To improve the documentation’s consistency and completeness, we mapped each capability, sub-capability, and software feature to the relevant EU-GDPR recitals and articles. The combination of practitioner and research insights also enabled us to specify relationships and dependencies between capabilities and isolate enabling ones. This iteration entailed a project at Shippy, a European shipping company, where the model was applied by a consultant who had not been involved in its development.

In the fifth iteration, we integrated academic feedback and reassessed the capability model’s structure based on the existing theoretical framework on IT capabilities. Specifically, we mapped capabilities and sub-capabilities to prominent RBV-based categorizations (following Bharadwaj et al., 1999, and Wade and Hulland, 2004) in order to accurately reflect their characteristics and theoretical underpinnings. Based on this analysis, we combined the capabilities and sub-capabilities dealing with the relationships with external entities and added a dedicated capability group, resulting in three capability groups (with seven capabilities and 18 sub-capabilities).

Finally, we conducted a summative, two-pronged evaluation consisting of an evaluation questionnaire presented to practitioners after a capstone presentation on the research project, as well as a debriefing session to analyze lessons learned from the Shippy project. Through the questionnaire, we evaluated the capability model’s understandability, completeness, consistency, simplicity, usefulness, and applicability by using the same five-point Likert scale as in the first evaluation. All dimensions received ratings of 4 and above, except for simplicity and applicability, which one of the respondents rated as “neutral” (3 points). The debriefing of the Shippy project confirmed that the capability model creates common ground between legal and data management practice and helps data experts understand the regulation (quotations from the interview can be found in Appendix 3). Regarding the model’s ability to support (i) assessments and roadmap planning, (ii) progress monitoring, and (iii) communication and change management, all dimensions received a minimum rating of 4 (on four counts), and the majority received a rating of 5 (on six counts).

Data management capabilities for the EU-GDPR

Capability model: Theoretical foundations

According to EU-GDPR art. 24 § 1, an organization is responsible for implementing “appropriate technical and organizational measures to ensure and be able to demonstrate that processing is performed in accordance with this Regulation.” Using the RBV as a theoretical lens, we argue that achieving compliance with EU-GDPR at an enterprise level requires building dedicated capabilities for processing and storing personal data. Capabilities are “complex patterns of coordination between people and between people and other resources” (Grant, 1991) that are embedded in organizational practices and individual skills (Bharadwaj et al., 1999). In the data protection domain, building these capabilities requires an organization to deploy three types of resources in predictable patterns of activity (Barney, 2001; Bharadwaj et al., 1999):

- Human resources taking over relevant roles for data protection, for example, a data protection officer, a contact person for data rights requests, or an enterprise data architect.

- Technological resources comprising physical IT assets (hardware, software, and databases) that enable data protection compliance, for example, a data processing system, a dedicated consent-management tool, and a self-service portal for data rights requests.

- Intangible resources representing the data protection-related know-how, for example, frameworks, standards, process models, as well as data and enterprise architecture documentation.
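
The three resource types listed above can be represented as a simple data structure. The following is a minimal sketch, not part of the original study: the class names, the `missing_resource_types` helper, and the example capability "Handle data rights requests" are illustrative assumptions, and the check that a capability draws on all three resource types is a simplification of the RBV argument.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class ResourceType(Enum):
    """The three resource types distinguished in the text."""
    HUMAN = "human"
    TECHNOLOGICAL = "technological"
    INTANGIBLE = "intangible"


@dataclass(frozen=True)
class Resource:
    name: str
    type: ResourceType


@dataclass
class Capability:
    """A capability described by the resources deployed to build it."""
    name: str
    resources: list[Resource] = field(default_factory=list)

    def missing_resource_types(self) -> set[ResourceType]:
        """Return the resource types not yet deployed for this capability."""
        deployed = {resource.type for resource in self.resources}
        return set(ResourceType) - deployed


# Hypothetical capability, populated with example resources named in the text.
data_rights_handling = Capability(
    name="Handle data rights requests",
    resources=[
        Resource("Contact person for data rights requests", ResourceType.HUMAN),
        Resource("Self-service portal for data rights requests", ResourceType.TECHNOLOGICAL),
    ],
)

# Reports which resource types are still missing (here: the intangible ones).
print(data_rights_handling.missing_resource_types())
```

In this sketch, adding an intangible resource such as a process model for handling requests would leave no resource type missing.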

While the RBV considers firm performance through sustainable competitive advantage (Barney, 1991) as the goal of capabilities (the why), we argue that capabilities for data protection are built with a regulatory compliance objective or, as conceptualized by Sadiq et al. (2007), to reach an organization’s “control objectives” (see Figure 2). As the business impact of compliance activities is hard to quantify in financial terms, they are seldom cited as a source of competitive advantage. However, non-compliance may have a significant direct (e.g., fines) and indirect (e.g., reputation loss) impact on a company’s ability to generate profit, thus affecting its performance. This is also supported by IS studies that have investigated the links between governance and compliance on the one hand, and business value and organizational success on the other hand (Buchwald et al., 2014; Heier et al., 2007; Ritschel et al., 2005). Hence, we argue that such “control objectives” can be viewed alongside competitive advantages as components of firm performance. Therefore, based on Zhang et al.’s (2013) definition of an IT capability, we define data management capabilities for regulatory compliance as a firm’s ability to acquire, deploy, and leverage its technological, human, and intangible resources in combination with other resources and capabilities to achieve an organization’s control objectives related to the relevant data protection regulations.

Figure 2.

Data management capabilities for regulatory compliance from the lens of RBV and RCM.

The RBV also helps extend RCM concepts (El Kharbili, 2012) for the new generation of data protection regulations: Capabilities connect the normative aspects of the regulation (i.e., the regulatory guidelines and compliance requirements or CRs) and the concretized compliance requirements (CCRs), that is, the concrete implementation of a CR using technological, human, and intangible resources. Table 2 depicts this connection and illustrates it for the EU-GDPR. The addition of data management capabilities to existing RCM concepts qualifies the interpretation of CRs and enables their translation into what organizations should do (i.e., the capabilities) as opposed to how they should do it (i.e., the specific resources implemented and used for achieving compliance with the regulation). In doing so, capabilities create “common ground” between legal and IS perspectives and help analyze compliance requirements in terms of changes to the existing routines and practices before a decision is made on concrete (technical) implementations.

Table 2.

Capabilities as link between compliance requirements and their concretization.

RCM concept	Definition (based on El Kharbili, 2012)	Illustration in EU-GDPR
Regulatory guideline	Stipulates a set of obligations to comply with.	Art. 6—“Lawfulness of processing”: enumerates conditions in which data processing is legal.
Compliance requirement (CR)	Pieces of text extracted from the regulatory guideline specifying an expected behavior or a specific condition to fulfill.	Extraction of requirements bearing data management relevance. For example, art. 6 § 1 a and art. 7 § 1 require that data be processed according to individuals’ expressed consent.
Data management capabilities*	Result of the interpretation of CRs in terms of capabilities that are to be implemented or improved.	Manage consent and sub-capabilities: implement consent items, collect consent instances, distribute consent, enforce consent-based processing.
Concretized compliance requirement (CCR)	Implementation of a CR in an enterprise context, fulfilling its legal specification, by a concrete set of technological, human and intangible resources.	A concrete measure is implemented in a specific organization to operationalize CRs. For example, “In company X, existing IS resources, such as a CRM system, must be configured so that consent data is first recorded in system 1 and pushed to other systems every 12 hours.”

*extension suggested in this study.
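
The chain in Table 2 (compliance requirement, data management capability, concretized compliance requirement) can also be read as a traceability structure. The sketch below is an illustration only: the class and field names are assumptions introduced here, while the example values are taken from the consent illustration in Table 2.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ComplianceRequirement:
    """CR: expected behavior extracted from the regulatory guideline."""
    sources: tuple[str, ...]  # articles the requirement is extracted from
    expectation: str


@dataclass(frozen=True)
class DataManagementCapability:
    """What the organization should do (interpretation of CRs)."""
    name: str
    sub_capabilities: tuple[str, ...]
    derived_from: tuple[ComplianceRequirement, ...]


@dataclass(frozen=True)
class ConcretizedComplianceRequirement:
    """CCR: how a specific organization implements the capability."""
    measure: str
    realizes: DataManagementCapability

    def legal_sources(self) -> list[str]:
        """Trace the concrete measure back to the articles it operationalizes."""
        return [
            source
            for requirement in self.realizes.derived_from
            for source in requirement.sources
        ]


consent_cr = ComplianceRequirement(
    sources=("art. 6 § 1 a", "art. 7 § 1"),
    expectation="Data is processed according to individuals' expressed consent.",
)

manage_consent = DataManagementCapability(
    name="Manage consent",
    sub_capabilities=(
        "Implement consent items",
        "Collect consent instances",
        "Distribute consent",
        "Enforce consent-based processing",
    ),
    derived_from=(consent_cr,),
)

ccr = ConcretizedComplianceRequirement(
    measure=(
        "Consent data is first recorded in system 1 and pushed to "
        "other systems every 12 hours."
    ),
    realizes=manage_consent,
)

print(ccr.legal_sources())
```

Keeping the capability as an explicit intermediate object means that a change to a concrete measure (the how) does not alter the link between the capability (the what) and the articles it was derived from.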

Capability model: Structure and overview

Building on these foundations, we derived the capability model from the EU-GDPR’s underlying principles, as described in the legal literature, which we interpreted as compliance requirements. The capability model (see Table 3) comprises 7 capabilities, which reflect the “pillars” of the regulation, and 18 sub-capabilities, which were iteratively developed in our research process and integrate academic and practitioner knowledge. The EU-GDPR-related capabilities were grouped in three main capability groups based on the reference capability conceptualization of Bharadwaj et al. (1999) and Wade and Hulland (2004). Consequently, Infrastructure Capabilities are mainly concerned with the ability to implement the new data-related rights and consent-based processing in the data processing systems, Management Capabilities predominantly ensure EU-GDPR’s accountability requirements (although they can still be supported by tools), and External Linkages capabilities relate to interactions and collaborations with entities outside the organization (e.g., data subjects, authorities, and other organizations).

Table 3.

Capability model for EU-GDPR.

(A) Infrastructure Capabilities
(A1) Protected data scope definition	(A1.1) Identify data objects	(A1.2) Classify data attributes	(A1.3) Locate data records
(A2) Consent processing	(A2.1) Implement items of consent	(A2.2) Collect instances of consent	(A2.3) Distribute consent	(A2.4) Enforce consent-based processing
(A3) Personal data removal	(A3.1) Delete data	(A3.2) Pseudonymize data
(B) Management Capabilities
(B1) Data protection orchestration	(B1.1) Assume data protection responsibilities	(B1.2) Oversee data protection activities
(B2) Data protection evaluation and control	(B2.1) Maintain records of processing activities	(B2.2) Maintain documentation of system landscape	(B2.3) Supervise sensitive processing activities
(C) External Linkages
(C1) Data protection communication	(C1.1) Disclose information to individuals	(C1.2) Transmit data in standardized form
(C2) Compliant processing demonstration	(C2.1) Control compliance of external processors	(C2.2) Cooperate with authorities
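
For readers who want to work with the model programmatically, Table 3 can be encoded as a nested structure. The following is a minimal sketch: the labels are copied from Table 3, whereas the dictionary layout and the `open_gaps` self-assessment helper are assumptions made for illustration and are not part of the study's artifact.

```python
# Capability model from Table 3, encoded as nested dictionaries:
# capability group -> capability -> list of sub-capabilities.
CAPABILITY_MODEL = {
    "A Infrastructure Capabilities": {
        "A1 Protected data scope definition": [
            "A1.1 Identify data objects",
            "A1.2 Classify data attributes",
            "A1.3 Locate data records",
        ],
        "A2 Consent processing": [
            "A2.1 Implement items of consent",
            "A2.2 Collect instances of consent",
            "A2.3 Distribute consent",
            "A2.4 Enforce consent-based processing",
        ],
        "A3 Personal data removal": [
            "A3.1 Delete data",
            "A3.2 Pseudonymize data",
        ],
    },
    "B Management Capabilities": {
        "B1 Data protection orchestration": [
            "B1.1 Assume data protection responsibilities",
            "B1.2 Oversee data protection activities",
        ],
        "B2 Data protection evaluation and control": [
            "B2.1 Maintain records of processing activities",
            "B2.2 Maintain documentation of system landscape",
            "B2.3 Supervise sensitive processing activities",
        ],
    },
    "C External Linkages": {
        "C1 Data protection communication": [
            "C1.1 Disclose information to individuals",
            "C1.2 Transmit data in standardized form",
        ],
        "C2 Compliant processing demonstration": [
            "C2.1 Control compliance of external processors",
            "C2.2 Cooperate with authorities",
        ],
    },
}


def sub_capabilities(model):
    """Flatten the model into a list of all sub-capability labels."""
    return [
        sub
        for capabilities in model.values()
        for subs in capabilities.values()
        for sub in subs
    ]


def open_gaps(model, implemented_ids):
    """Return, per capability, the sub-capabilities not yet implemented.

    `implemented_ids` is a hypothetical self-assessment input,
    e.g. {"A1.1", "A2.3"}.
    """
    gaps = {}
    for capabilities in model.values():
        for capability, subs in capabilities.items():
            missing = [s for s in subs if s.split()[0] not in implemented_ids]
            if missing:
                gaps[capability] = missing
    return gaps


n_groups = len(CAPABILITY_MODEL)
n_capabilities = sum(len(caps) for caps in CAPABILITY_MODEL.values())
n_sub_capabilities = len(sub_capabilities(CAPABILITY_MODEL))
print(n_groups, n_capabilities, n_sub_capabilities)  # 3 7 18
```

A call such as `open_gaps(CAPABILITY_MODEL, {"A3.1", "A3.2"})` would list every capability except A3 as having open sub-capabilities, which mirrors the assessment use of the model described in the text.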

In the following sections, we present each of the identified capabilities, along with the compliance requirements, the empirical insights from focus groups and projects, and the sub-capabilities. We also graphically depict the dependencies with other sub-capabilities (see Figures 3–10) and discuss the required resources for their implementation. For each capability, we also provide a synthesis (in Tables 4–10) with details about the sub-capabilities, their specification, implementation options (with exemplary resources) and the relevant compliance requirements (CR, extracted from the regulation).

Figure 3.

Capability relationships: Protected data scope definition (A1).⁵

Figure 4.

Capability relationships: Consent processing (A2).

Figure 5.

Capability relationships: Personal data removal (A3).

Figure 6.

Capability relationships: Data protection orchestration (B1).

Figure 7.

Capability relationships: Data protection evaluation and control (B2).

Figure 8.

Capability relationships: External data communication (C1).

Figure 9.

Capability relationships: Compliant processing demonstration (C2).

Figure 10.

Capability network (see Appendix 4).

Table 4.

Capability overview: Protected data scope definition (A1).