Abstract
The General Data Protection Regulation (GDPR) has imposed strict requirements for data sharing, one of which is informed consent. A common way to request consent online is via cookies. However, users commonly accept online cookies without being aware of what the given consent means or what it implies. Once consent is given, the cookie “disappears”, and one forgets that consent was given in the first place. Retrieving cookies and consent logs becomes challenging, as most information is stored in the specific Internet browser’s logs. To make users aware of the data sharing implied by cookie consent and to support transparency and traceability within systems, we present a knowledge graph (KG) based tool for personalised cookie consent information visualisation. The KG is based on the OntoCookie ontology, which models cookies in a machine-readable format and supports data interpretability across domains. Evaluation results confirm that the users’ comprehension of the data shared through cookies is vague and insufficient. Furthermore, our work has resulted in an increase of 47.5% in the users’ willingness to be cautious when viewing cookie banners before giving consent. These and other evaluation results confirm that our cookie data visualisation approach and tool help to increase users’ awareness of cookies and data sharing.
Introduction
Cookies have emerged as one of the most convenient and common mediums to request consent for data sharing online [55]. The rising digitisation of services in the e-commerce, healthcare, finance and social media domains has also turned cookies into a valuable source of personal data such as IP address and browsing behaviour. Cookies promise a better user experience online, as the data that they collect is often used for user profiling to create personalised browsing experiences and e-commerce recommendations. However, this is often done at the cost of an individual’s privacy [27,41]. Identifying who benefits more from cookies, individuals sharing their data or companies that use it, has also become a challenge. The European Union’s (EU) GDPR [17], in effect since May 2018, has highlighted the importance of consent and has set it as one of its legal bases for personal data processing. Consent must be freely given, specific, informed and unambiguous, and one should be able to revoke it with the same ease with which it was given (Art. 4 (11)). As shown in [4,32], however, requesting and receiving informed consent does not mean that individuals are truly aware of the implied data sharing. Requesting informed consent prior to any data processing can cause consent fatigue [57] and information overload [26], which often lead to blindly given consent [3]. Several factors contribute to this, namely the lack of knowledge of what cookies are and of their functions [36], the lack of control that users have over the data that is collected and shared through cookies and the lack of feedback provided by browsers’ cookie management facilities [24].
Despite GDPR’s requirements for informed consent, many cookies are still not compliant with regard to the information (or lack thereof) that they present to individuals and how they are imposed [35]. Further, once consent is granted, the cookie dialogue disappears (i.e., it is no longer visible while browsing) and data sharing begins on the back-end. This also poses a challenge for revoking consent (Art. 7(3)) as there is no cookie dialogue via which individuals can exercise their right to personal data erasure (Art. 17(2)) [50]. Many online service providers can benefit from the disappearing cookie dialogues as, without a reminder, cookies are likely to be forgotten by individuals. Retrieving cookie logs, which store specific data about the consent, data processing and the cookie’s duration, can be a complex task for individuals with no prior privacy or technical experience (i.e., cookie logs are described using privacy and security terminology). Several solutions, in the form of browser extensions such as the Cookie Editor1
There is a need for greater clarity, transparency and awareness about cookies and the data sharing that happens behind the scenes. Both the design of cookie dialogues and the triggered data sharing need to comply with GDPR (when the individual is an EU citizen). Achieving this, however, is challenging as user interface (UI) and user experience (UX) designers might not have the same legal expertise as a data protection practitioner. A single consistent schema outlining all the information that cookies need to present to individuals based on the GDPR can be used to harmonise the domain experts’ work. Further, such a schema can help to ensure legal compliance and can bring more transparency into cookie-based data sharing. This can be achieved with Semantic Web technologies such as ontologies, which represent a domain in a machine-interpretable format and can be used as a schema for knowledge graphs (KGs) to further interlink multiple domains [20]. KGs also support data interoperability, transparency and traceability [13,21] and are extendable by design, which makes them suitable for use in different ecosystems and across multiple use cases. In the security and privacy domains, KGs have been successfully utilised for privacy-enabled personalisation on the web [25], intelligent decision-making, fraud detection, prediction and tracing of cyber attacks (see [28,42,44,60]). Other domains such as manufacturing (e.g., [9]) and law (e.g., [10,54]) have also significantly benefited from utilising KGs to bridge knowledge silos, semantically enrich data, highlight data dependencies and discover insights and new knowledge. Multiple ontologies for data sharing such as Consent and Data Management Model (CDMM) [19], Data Protection Vocabulary (DPV)3
Motivated by this and by building upon the findings in [4,46] that highlight the need for greater online data sharing transparency and interpretability, we present the OntoCookie4
A novel, open-access documented OWL ontology for cookies (referred to as OntoCookie4).
Cookie insights derived from the analysis of the KG with cookies of 40 users.
A novel, open-access tool for personalised cookie visualisations that has been evaluated with 40 users. (Note: currently, the online tool is functional only when clicking “No. I disagree” on the consent banner.)
The rest of the paper is structured as follows. Section 2 presents an overview of related work relevant to our study. Section 3 outlines our approach and the followed methodology. The implementation of our work is presented in Section 4, while its evaluation and its results are presented in Sections 5.1 and 5.2 respectively. A summary and discussion on results is presented in Section 5.3. Conclusion and future work are presented in Section 6.
Related work
This section presents related work on cookies as an online medium for consent from the privacy, visualisation and Semantic Web fields that helped to motivate our work.
Cookies and privacy
For many years, cookies have been viewed as a privacy-preserving mechanism [30]. However, the enforcement of the GDPR and its requirements for the lawful processing of personal data have highlighted the numerous privacy risks associated with them. According to Article 4(1) and Recital 30 of the GDPR, cookies and specifically cookie identifiers are viewed as personal data, which needs to be handled in compliance with the law. This also includes the need to request informed consent for each cookie. Santos et al. [51] present an in-depth analysis of how data-sharing information is presented via cookie consent dialogues. Following the legal requirements of the ePrivacy Directive (ePD) [18] and the GDPR, around 400 cookie banners presented on the most popular English-speaking websites were manually annotated. 89% of the cookie banners violated the applicable laws. More specifically, 61% of the banners violated the purpose specificity requirement by mentioning vague purposes such as “user experience enhancement”, and 30% of the banners used positive framing, breaching the freely given and informed consent requirements. In a similar study, Soe et al. [52] analysed 300 data collection consent notices from news outlets, which were built to ensure GDPR compliance. The analysis uncovered the use of a variety of dark patterns (i.e., deceptive design practices aimed at manipulating users’ actions) [22,34].
Sanchez-Rola et al. [49] explore users’ perception and reaction to cookie dialogues and conclude that users view cookie dialogues as an annoyance during their browsing time rather than an informative source. Although the users claimed to have privacy concerns regarding cookies and how they collect data, the study showed that the cookie disclaimers did not play a significant role in the users’ decision to continue navigating the website. Greater importance was given to factors such as the reputation of the website, which can also affect the users’ trust in its services [12,23].
In a similar study, Bechmann [3] shows that there exists a non-informed consent culture among social media platform users and that, although none of the participants of the study had read the privacy policies, all had given consent. Joergensen et al. [29] further confirm that users rarely read the presented data-sharing terms and conditions before granting consent. Furthermore, statistics from countries within and outside the EU show that most users of social networking sites do not read the privacy policies of the sites or the third-party applications that use their data [3]. The studies confirm that for users, giving consent (in any form such as cookies) and being aware of what the action implies are often mutually exclusive [3,29,49].
Cookie visualisation
The lack of transparency about what accepting a cookie implies and the lack of accessible information about it further contribute to blindly given consent online. Ware [58], Rossi et al. [47] and Drozd et al. [16] highlight the importance of visualisation as a way to support the comprehension of the information that is being communicated to the end user. According to [58], the highest bandwidth channel of communication between humans and machines is provided by visual displays. The amount of information that can be transmitted makes data visualisation a highly appropriate method to communicate information to users.
Rossi et al. [47] emphasise the fact that the use of visualisation is explicitly suggested by the European Union (EU) in legislation such as the GDPR (Rec. 58, Art. 12(7)) as a way to improve comprehension of the information provided to data subjects. One can acknowledge that visual elements and visualisations in general play a crucial role in obtaining informed consent. In recent years there has been a rise in the attempts to build applications that provide more transparency regarding personal data processing by applying different visualisation approaches.
Steichen et al. [53] go deeper into the topic of information visualisation by taking into consideration the role of the individual cognitive style of the users in their ability to perceive the information being communicated in a visual form. Results show that the individual cognitive style plays a significant role in tasks related to information visualisation in general. Findings of the presented work also provide motivation for the development of personalised information visualisation systems based on the cognitive style of the individual users.
In this context, Drozd et al. [16] present the CoRe [15] and the Consent reqUest useR intErface (CURE) user interfaces (UIs), which have the main goal of easing the process of granting consent and providing more transparency into data sharing. The evaluation of the two UIs showed that, indeed, visualisations helped raise awareness of what consent is. However, issues such as information overload due to design complexity were still present. Similar solutions for consent visualisation include the work in [1] and [45], which focuses on raising data-sharing awareness with visualisations. All these studies show that there is a prominent need for consent solutions that support higher levels of transparency, focus on the needs of the users and on raising awareness regarding data sharing.
Cookies and the Semantic Web
One of the earliest and few studies on cookies through the lens of semantics is presented by Cox et al. [11]. The authors explore the application of the Semantic Web in the privacy field and propose an approach for enriching cookies with Resource Description Framework (RDF) fragments. The main goal of the created semantic cookies is to ease access to web services and give users full control over their data online while widening their participation in the Semantic Web. The study has shown promising results and highlights the benefits of machine-readable cookies for persistence stores to simplify access to services. Cox et al. are among the first to discuss the use of an ontology as a tool that can align the various representations of cookies online, which can also support legal compliance. However, we were not able to identify any such existing publicly available ontology. Through the years most of the work has focused predominantly on consent. A systematic analysis of semantic models for consent and semantic-based visualisation tools that support users’ comprehension of consent is presented in [33].
A more recent work that addressed cookies and online data-sharing privacy policies is presented in [2]. Audich et al. [2] propose an ontology for privacy policies, which also includes the concept of a cookie and combines it with natural language processing. The main goal of the approach is to improve the readability of online policies by identifying the key information in a policy for individuals to focus on. Cookies and instances such as do-not-track and web beacons (types of cookies) have been semantically represented as keywords that can be found within policy documents or cookie policies. The results of the study have proven the benefits of utilising an ontology (i.e., helping to align and simplify the complex and diverse legalese that is used) for text mining of privacy policies [2]. The study focused on privacy policies in general and legal terminology used by the Federal Trade Commission.7
Motivated by the lack of consent interoperability and transparency regarding data sharing online, Bless et al. [4] utilise both semantics (i.e., ontologies and KGs) and visualisations as key tools to support individuals’ comprehension of consent for data sharing. In comparison to the work in [15] and [16], the authors focus on visualising data-sharing flows after consent is given. The developed visualisation is based on a KG, which stores informed consent information in a GDPR-compliant manner and has helped to raise individuals’ awareness of data sharing significantly. Further, as shown in [10], the later version of the consent KG has also been successfully utilised for performing GDPR compliance verification and in supporting humans and machines in making sense of consent [32].
The use of ontologies for consent and GDPR compliance is also prominent in the work of Kirrane et al. [31], which shows that semantics can be successfully utilised to build more accurate models for detecting security issues. Moreover, the meaningful interpretation of personal data that is exchanged between users and other entities on the web can be used to empower users to have better control over these interactions and therefore improve the way they manage their online privacy. The semantic approach can also bring advantages to companies through automation, which is enabled by the semantic machine-readable and machine-processable representation of data-related privacy policies. The main trends for utilisation of KGs in the security and privacy domains are further discussed in [10,31]. The benefits of semantics in the legal domain, especially for improving consent interoperability, are also discussed in [8,20,33,59].
Rasmusen et al. [46] present a KG-based interface that visualises consent requests and utilises gamification to raise user engagement in data sharing. The main goal of the approach is to improve individuals’ awareness of consent and the implications that follow in the context of automotive data. The UI presented follows an ontology that models GDPR knowledge about consent. Results from the user study conducted with participants who interacted with the tool show that the UI helped raise individuals’ awareness and willingness to consent.
The GDPR has set out specific requirements for requesting consent in an informed way through any medium including cookie dialogues. However, research has shown that there is a lack of standardisation with regard to the design of cookie dialogues and the information presented on them. There is currently a misalignment between law, technology and design when it comes to cookies and the underlying personal data that is collected and shared via them. The proposed work in [11] and [2] has called attention to the benefits of semantics in the privacy field concerning cookies. To our knowledge, there is currently no publicly available vocabulary or ontology that can align the knowledge spread across these domains in the context of GDPR. Cookies have become the go-to tool for many service providers when it comes to personal data collection online. Although this has raised privacy concerns due to the lack of transparency of cookie-based data collection and sharing, there is a lack of user-centered tools that support the comprehension of what cookies are and the implications of giving consent for them. To summarise, based on our research of related work, two main challenges have become evident – the lack of shared vocabulary of the cookie domain that can support knowledge exchange and data interoperability for legal compliance and the lack of support for users in making sense of cookie-based data sharing.
Selected approach and methodology
We approach the issue of web cookie comprehension and cookie data sharing from both human and machine perspectives. However, both sides have different comprehension needs that need to be addressed. On the human side, we focus on utilising data visualisations in graphical and tabular forms. Our cookie visualisation tool provides individuals with an interface that takes as input cookie logs and displays personalised statistics that are aimed at providing more transparency into cookie-based data sharing. Consequently, this can help raise individuals’ awareness regarding the implications of granting consent for cookies. The OntoCookie4,5 ontology and the KG built with it are used to represent the cookie data in a meaningful machine-readable and interoperable way. Our approach is motivated by the increase of cookie and consent requests online after the acceptance of the GDPR and tries to bridge the gap between the Semantic Web, privacy and legal domains.
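As an illustration of the human-facing side of this approach, the following minimal sketch derives a few personalised statistics from an exported cookie log. It is not the tool's implementation: the JSON layout and the field names (`domain`, `name`, `session`, `expirationDate`) are assumptions about what a browser-extension export might contain, and `summarise_cookies` is a hypothetical helper.

```python
import json
from collections import Counter
from datetime import datetime, timezone

# Hypothetical cookie export: a JSON array of cookie objects.
# The field names below are assumptions, not a documented format.
SAMPLE_EXPORT = """
[
  {"domain": ".example.com", "name": "sid", "session": true},
  {"domain": ".example.com", "name": "prefs",
   "expirationDate": 1767225600, "session": false},
  {"domain": ".tracker.test", "name": "uid",
   "expirationDate": 1798761600, "session": false}
]
"""


def summarise_cookies(raw_export, now):
    """Return simple per-user statistics for an exported cookie log."""
    cookies = json.loads(raw_export)
    # Count cookies per domain, ignoring the leading dot of domain cookies.
    per_domain = Counter(c["domain"].lstrip(".") for c in cookies)
    # Session cookies have no expiry date and vanish with the browser session.
    session_count = sum(1 for c in cookies if c.get("session"))
    # Remaining lifetime (in days) of each persistent cookie.
    lifetimes_days = [
        (c["expirationDate"] - now.timestamp()) / 86400
        for c in cookies
        if not c.get("session") and "expirationDate" in c
    ]
    return {
        "total": len(cookies),
        "per_domain": dict(per_domain),
        "session": session_count,
        "longest_lifetime_days": max(lifetimes_days, default=0.0),
    }


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
summary = summarise_cookies(SAMPLE_EXPORT, now)
print(summary)
```

Statistics of this kind (amount, source and duration of cookies) correspond to the aspects that the evaluation later asks participants about.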

Methodology overview.
The methodology (Fig. 1) followed for the development of our cookie visualisation tool is inspired by the design thinking process [6], which is a solution-based approach to solving problems by considering human needs. The development process consisted of the following stages: empathise, define, ideate, prototype and test. The first stage was to understand the problem of cookies and consent comprehension. This included research on the privacy domain and, more specifically, on cookies and how data and consent are handled by browsers. Existing work on cookies (with and without the use of semantics) was also considered (see Section 2). During the second stage, the main research problem was defined and system requirements were derived. The third stage focused on analysing the requirements and generating ideas for the design of the tool. The fourth stage focused on prototyping the solution. This was done in several steps as well. We started with (i) building the OntoCookie ontology, (ii) building a prototype UI for cookie import, (iii) implementing functionalities such as cookie annotation, (iv) building the cookie KG and finally (v) visualising different cookie statistics on the UI. The fifth stage consisted of the usability and design evaluation of the tool with users, analysis of the results and the comparison to existing cookie solutions. Our cookie visualisation tool was built with the Flutter8
Implementation
This section presents details regarding the implementation of the proposed KG-based tool for cookie visualisations. Section 4.1 presents an overview of the OntoCookie ontology for cookies, which has been built and utilised during the study. Section 4.2 presents the two possible action flows of using our tool, while Section 4.3 presents the implementation details of the visualisation.
OntoCookie: A domain ontology for cookies
The OntoCookie4,5 ontology (Fig. 2) is a formal representation of the cookie domain in the context of GDPR. The ontology was built as a response to the lack of openly available semantic models for cookies and the need for cookie consent compliance (from a design and implementation perspective). By following a top-down ontology engineering approach (see [37]), the main classes, subclasses, the relationships between them and their data properties were defined. When defining the subclasses, an “isA” constraint was followed (e.g.,
The class

The OntoCookie ontology.
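To give a rough idea of what a machine-readable cookie description can look like, the sketch below serialises one cookie as N-Triples. It is only illustrative: the namespace `http://example.org/cookie-demo#` and the terms `Cookie`, `hasDomain` and `hasDurationDays` are hypothetical placeholders and are not taken from OntoCookie.

```python
# Placeholder namespace and terms; these are NOT the OntoCookie IRIs.
EX = "http://example.org/cookie-demo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def cookie_to_ntriples(cookie_id, domain, duration_days):
    """Describe a single cookie as a list of N-Triples statements.

    Assumes `domain` contains no characters that need escaping.
    """
    subject = f"<{EX}{cookie_id}>"
    return [
        # The cookie is an instance of the (hypothetical) class Cookie.
        f"{subject} <{RDF_TYPE}> <{EX}Cookie> .",
        # Data properties attach literal values to the cookie.
        f'{subject} <{EX}hasDomain> "{domain}" .',
        f'{subject} <{EX}hasDurationDays> "{duration_days}"^^<{XSD_INTEGER}> .',
    ]


triples = cookie_to_ntriples("cookie_1", "example.com", 365)
print("\n".join(triples))
```

Statements of this form can be loaded into any RDF store, which is what makes the representation interoperable across tools.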
In order to adhere to GDPR regulations, the users are asked for their informed consent (i.e., users are explicitly asked whether they want their cookie data saved in the KG via a consent dialog). If consent is given, the action flow consists of 11 steps (numbered from 1 to 11 in Fig. 3). In step

The action flow of the application.
In the case of consent, the action flow continues to step
In case the users do not consent to have the cookies stored in the KG, the data will not be annotated to the KG. Users can import the cookies collected into the designated text field created for this purpose. After deciding not to consent (i.e., users have decided not to save the cookie data consumed in the KG), the data will be locally processed and accordingly visualised. In this case, cookie data imported through the Cookie Editor extension will be deleted once the application window is closed. The non-consent action flow follows steps

Main input page of the cookie visualisation tool.
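The two action flows described above (with and without consent) can be sketched as a simple branch on the consent decision. This is a simplified model under stated assumptions, not the application's code: the `Session` class, its in-memory `kg_store` stand-in for the KG and its `close` method are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical model of one user session of the visualisation tool."""

    consent_given: bool
    kg_store: list = field(default_factory=list)      # stand-in for the KG
    local_buffer: list = field(default_factory=list)  # in-memory only

    def import_cookies(self, cookies):
        """Route imported cookies according to the consent decision."""
        if self.consent_given:
            # Consent flow: cookie data is annotated and saved to the KG.
            self.kg_store.extend(cookies)
        else:
            # Non-consent flow: data is only processed locally.
            self.local_buffer.extend(cookies)
        return self.visible_cookies()

    def visible_cookies(self):
        """Cookies available for visualisation in this session."""
        return list(self.kg_store if self.consent_given else self.local_buffer)

    def close(self):
        """Closing the window discards locally processed data."""
        self.local_buffer.clear()


session = Session(consent_given=False)
shown = session.import_cookies([{"name": "sid", "domain": "example.com"}])
session.close()
```

In both branches the user sees the same visualisation; the only difference is whether the data outlives the session.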
The UI is organised in two parts. The first part is a general guide of six steps on how to use the visualisation tool (Fig. 4). Step
Consequently, the second part of the UI displays all the cookie data, except the stored value, retrieved from the browser extension (Fig. 5). Here, the information is divided into four segments. Segment
To build the back-end, we used NodeJS, with Express for the routing, which made it straightforward to create our application programming interface. As mentioned in previous sections, we created a KG in order to store the information on cookies and their relations with each other. Our KG is stored in GraphDB, a graph database for KGs in RDF. The back-end is connected to GraphDB using the

Overview of the cookie visualisation statistics.
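Statistics such as the number of cookies per domain can be obtained from an RDF-based KG with an aggregate SPARQL query. The sketch below shows one possible query and a small helper for measuring execution time. The prefix and the terms `ex:Cookie` and `ex:hasDomain` are hypothetical placeholders rather than OntoCookie terms, and `fake_endpoint` merely stands in for a call to a real SPARQL endpoint.

```python
import time

# Hypothetical vocabulary; not the actual OntoCookie terms.
QUERY_COOKIES_PER_DOMAIN = """
PREFIX ex: <http://example.org/cookie-demo#>
SELECT ?domain (COUNT(?cookie) AS ?n)
WHERE {
  ?cookie a ex:Cookie ;
          ex:hasDomain ?domain .
}
GROUP BY ?domain
ORDER BY DESC(?n)
"""


def timed(run_query, query):
    """Run `run_query(query)` and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = run_query(query)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_endpoint(query):
    """Stand-in for a SPARQL endpoint; returns a fixed result set."""
    return [{"domain": "example.com", "n": 2}]


rows, seconds = timed(fake_endpoint, QUERY_COOKIES_PER_DOMAIN)
print(rows, f"{seconds:.6f}s")
```

Wrapping the call in a timer like this keeps the measurement independent of the client library used to reach the graph database.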
In order to get the time of execution of these SPARQL queries,17
Evaluation
This section presents details about the evaluation of the presented cookie visualisation tool, namely the evaluation set up (Section 5.1), evaluation results (Section 5.2) and a summary of these results (Section 5.3).
Evaluation set up
To evaluate our solution, its usability and design, three questionnaires (on demographics, expectation, and realisation) were created using Google Forms. The evaluation was done in seven stages (Fig. 6). First, the participants were asked to generate a unique ID using the SHA118

Evaluation flow.
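A pseudonymous participant ID of the kind mentioned above can be derived with a SHA-1 hash. The sketch below is a minimal illustration; what participants actually used as input is not stated here, so the `secret_phrase` argument is an assumption.

```python
import hashlib


def participant_id(secret_phrase: str) -> str:
    """Derive a 40-character hexadecimal ID from a participant-chosen phrase.

    The same phrase always yields the same ID, so answers from several
    questionnaires can be linked without storing a name.
    """
    return hashlib.sha1(secret_phrase.encode("utf-8")).hexdigest()


example_id = participant_id("abc")
print(example_id)
```

Because the ID is deterministic, a participant can regenerate it for each questionnaire instead of having to remember it.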
For the evaluation, 40 participants (25 male and 15 female) took part in the survey. The age of the participants varied between 18–35 years old where 92.5% were within the range of 18–30 years old and 7.5% were within the range of 30–35 years old. The participants were selected from different backgrounds (computer science students, non-computer science students, researchers, computer-science experts, non-computer science experts) and were based in different countries in Europe (namely, Austria, the Netherlands, Bulgaria and Albania). They were recruited via university network and personal connections. Out of the 40 participants, 20% acknowledged that their highest level of education completed was a high school degree. For 57.5%, the highest level of education obtained was a bachelor’s degree and for 22.5%, the highest level of education obtained was a master’s degree. 30% of the participants declared a very high-level Internet surfing competency, 47.5% a high level of competency, 20% declared an average Internet surfing level competency and 2.5% declared a low level of Internet surfing competency. 65% of our participants spend more than 4 hours per day on the Internet, 25% spend 3–4 hours per day and 10% spend 1–2 hours per day on the Internet.
Expectation vs. realisation
In order to measure the users’ level of comprehension with regard to the cookies collected while browsing the four websites, we first asked them about their expectations (i.e., what results the users expected before using the application) and compared these with the personalised data (i.e., the factual results) visualised by the application. For this purpose, questions related to the amount, duration and source of the cookies collected were asked.
More specifically, to the question:

Survey results for the amount of cookies collected by each website.
To the question:
When asked:

Survey results for the duration of cookies collected by each website.
To the question:
The claim that the users’ knowledge on cookie data is vague and insufficient was further strengthened by the significant differences detected between expectation and realisation, with respect to the total amount of cookies collected and their duration. More precisely, on average, the participants expected the total number of cookies collected during the two minutes of website browsing to be 267.4. Results from the realisation survey showed that, on average, a total amount of 70.8 cookies were collected during their surfing time, approximately 73% less than the users’ expectation (Fig. 9(a)). Regarding the duration of cookies collected, when asked:
The question:

Comparison of expectation vs. realisation averages.
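The gap between the expected and the observed number of cookies reported above can be reproduced with a short calculation. The helper name `relative_difference` is introduced here for illustration only; the two averages are the ones stated in the text.

```python
def relative_difference(expected, observed):
    """Fraction by which the observed value falls short of the expectation."""
    return (expected - observed) / expected


# Average expected vs. average collected number of cookies.
diff = relative_difference(267.4, 70.8)
print(f"{diff * 100:.1f}% fewer cookies than expected")
```

The result is consistent with the figure of approximately 73% quoted in the text.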

Comparison of results regarding the carefulness with which the participants read and will read the cookie notification banner before and after using the application.
Furthermore, participants answered a set of questions related to their general feeling about cookie data privacy after using the application and also how they will approach Internet cookies in the future. The participants had the perception that cookies were intrusive to their online privacy. Specifically, to the question:
Results showed that participants would embrace an overview tool to manage their cookies. Precisely, the question
Summary of the results
Evaluation results confirm that users (even proficient web surfers) lack detailed knowledge about cookies and the consequences of granting consent for them. For example, the duration for which cookies are stored, the amount of cookies collected during the browsing time and the practices of different websites with regard to the cookies they use commonly do not match the users’ expectations. The results also showed that the cookie visualisation tool presented helped to improve users’ comprehension of cookies and has raised awareness regarding data sharing on the web. More specifically, after being presented with the application, an increase of 47.5% in the users’ willingness to be more cautious when reading the cookie consent banner before giving consent was noticed. The outcome of the evaluation also confirms that users are ready to embrace an overview tool that helps them manage their cookies. 72.5% of the participants agreed that they would feel more confident about their privacy on the web if they were given such an overview tool and 95% of the users admitted that they would feel more knowledgeable about cookies if an overview tool to manage cookies was at their disposal. In addition, we believe that this work helps bridge the gap between the Semantic Web and the security and privacy domains.
Table 1 describes how our work compares with the existing work in this field in several aspects. It contains information on results, consent request medium, use of semantics, whether the work focuses on before or after consent is given, whether it includes usage of cookies, and lastly, limitations. Results from the table confirm that there is currently a lack of semantic approaches that describe online cookies in order to enhance the users’ understanding of the cookie data they consume on a personal level while surfing the Internet.
Comparison of existing solutions for online consent request
Conclusion
In this paper, we presented an ontology4,5 for cookies and a KG-based tool6 for cookie information visualisation. The main goal of our solution is to ease users’ comprehension of cookies and to raise awareness of cookie-based data sharing. The conducted user evaluation has shown that our approach to semantically representing and visualising cookies helps individuals understand the real nature of web cookies. In addition to empowering users with regard to their personal data sharing, we believe that this work helps to bridge the gap between the Semantic Web and privacy domains with the help of the proposed cookie ontology.
Although the challenges of preserving individuals’ privacy online and ensuring legally compliant cookie-based data sharing are far from being resolved, the rising interdisciplinary research between the legal, privacy and Semantic Web domains has already shown promising results. Ontologies such as ours can help to establish a reference model that eases domain experts’ collaboration and semantically enriches the privacy-enhancing technologies and machine learning-based GDPR violation detection tools such as [5] that are being developed. Technologies such as SOLID [48] have been built to give individuals control over their personal data sharing online. While focusing on decentralisation of data, SOLID can still benefit from semantically representing cookies as a medium to request and receive consent online and can utilise the visualisation approach presented in this paper to support individuals’ comprehension of data sharing. The results of the user evaluation have shown that individuals have a limited and unclear understanding of the personal data about them that is collected and shared through cookies. However, the evaluation has also shown that our knowledge graph-based visualisation approach improves users’ knowledge about cookies, privacy and online data sharing.
Currently, our cookie visualisation tool is dependent on the Cookie Editor1 extension and the information captured by it. Our future goal is to remove this dependency by extending the functionalities of our cookie visualisation tool (i.e., implementing a cookie capture functionality). Another possible future direction is to extend the use case of our application so that it not only serves as a tool to communicate information but also allows users to act on it by offering them the possibility to manage cookies. On the semantic side, we have presented a novel ontology for cookies that can be extended for different domains and use cases. We believe that its reuse and extension will inspire further collaboration between semantic and privacy experts. The use of the KG for detecting security breaches and data-sharing patterns (within the cookies) can be explored as well.
Acknowledgements
This research is supported by the CampaNeo project funded by FFG (grant 873839) as well as the smashHit EU project funded under Horizon 2020 (grant 871477). We would like to thank Harshvardhan J. Pandit for sharing helpful insights on cookies, consent and GDPR.
