Abstract
The Global Alliance for Genomics and Health is an international consortium that is developing the Data Use Ontology (DUO) as a standard providing machine-readable codes for automation in data discovery and responsible sharing of genomics data. DUO concepts, which are encoded using OWL, only contain the textual descriptions of the conditions for data use they represent, and do not specify the intended permissions, prohibitions, and obligations explicitly – which limits their usefulness. We present an exploration of how the Open Digital Rights Language (ODRL) can be used to explicitly represent the information inherent in DUO concepts to create policies that are then used to represent conditions under which datasets are available for use, conditions in requests to use them, and to generate agreements based on a compatibility matching between the two. We also address a current limitation of DUO regarding specifying information relevant to privacy and data protection law by using the Data Privacy Vocabulary (DPV) which supports expressing legal concepts in a jurisdiction-agnostic manner as well as for specific laws like the GDPR. Our work supports the existing socio-technical governance processes involving use of DUO by providing a complementary rather than replacement approach. To support this and improve DUO, we provide a description of how our system can be deployed with a proof of concept demonstration that uses ODRL rules for all DUO concepts, and uses them to generate agreements through matching of requests to data offers. All resources described in this article are available at:
Introduction
Background & motivation
The sharing of health-related data holds great promise for enhancing research and applying advanced computational and statistical techniques for progress in medicine. At the same time, such sharing and use of health-related data is required to be regulated at legal and institutional levels given its sensitive nature and the ability to have significant impacts. The current landscape consists of institutions such as hospitals assessing each data use request through a dedicated committee that is responsible for the evaluation and decision-making regarding the release of data under their custody. To assist in this process, the Global Alliance for Genomics and Health1
DUO is an OWL ontology based on (and part of) Open Biological and Biomedical Ontology3
DUO concepts specify the DULs as human-readable text within their description (using the
Our argument is that true machine-readability requires the information intended to be conveyed through DUO concepts about the specific permissions, prohibitions, constraints, requirements, and so on to be (also) represented as machine-readable rules that utilise semantic concepts. With this, the DULs inherent in the descriptions of each DUO concept are made explicit through formal representation as a set of rules that can be attached and used alongside the data as a sticky policy.
For assessing whether a data use request is compatible with the dataset DULs, both data provider’s and requestor’s conditions for data use are expressed as policies, and are compared to evaluate whether the intended use is permissible. While DUO is already being used in this manner, such as within the Data Use Oversight System5
More importantly, to automate this process, a set of requirements should be taken into consideration when choosing a vocabulary to express dataset usage conditions, such as: (i) the expressiveness for defining specifics of rules and policies, i.e., the specification of actions, purposes, or other constraints as concepts that can be independently expressed and assessed, and their combinations to represent different categories of policies; (ii) the ability to associate and check their conformance and compliance with legal requirements; and (iii) the ability to specify requirements in machine-readable form and use them to assess correctness and completeness of information. Such solutions have existed for a while now – for example, Answer Set Programming (ASP) and logic-based semantic reasoners have been utilised in a variety of domains – including for representing information and using it for checking legal compliance for GDPR (see Section 2.2).
With the above motivation, we present an approach for representing the inherent information and rules in DUO concepts explicitly in RDF through use of the Open Digital Rights Language6
Implementation of an ODRL validator using SHACL available at
ODRL Formal Semantics CG report available at
Note: The author (Beatriz Esteves) is a contributing member of the ODRL CG’s work on the development of a formal semantics specification.
ODRL Profile Best Practices CG report available at
In addition to these, we also consider ODRL the most suitable candidate for representing DUO concepts as it can be used without requiring any of the existing DUO-based data use or request governance processes to make radical and incompatible changes. That is, the existing practices and processes by which DUO codes are added as annotations to datasets and are used to request access to them can continue without hindrance, and DUO stakeholders can choose which aspects of our ODRL solution they want to adopt within their practices.
ODRL, by modelling terms regarding rights and licensing, also offers a compatible segue for DUO to be linked with relevant legal concepts, for which we use the Data Privacy Vocabulary11
Note: Both authors are active contributing members to DPV.
The contributions of this work are summarised through the following research objectives:
Specifying DUO concepts and conditions for data use as machine-readable policies using ODRL
Developing an algorithm for consolidating data use conditions into a single ODRL policy
Developing an algorithm for identifying compatible datasets with data use requests based on ODRL policies
Enabling expression of legal concepts and restrictions with(in) ODRL policies for DUO concepts using DPV
Elucidating relevance of DUO concepts and associated ODRL+DPV policies for GDPR obligations
In addition to these, a late contribution is the preliminary analysis of two articles providing improvements to DUO that were published while this article was under review. We provide a summary of these recent developments, compare it with the work presented in this paper, and discuss the continued relevance and benefits of our contributions.
The rest of this article presents: an overview of DUO and its applications in Section 2.1, relevant work in state of the art regarding machine-readable policies for GDPR in Section 2.2, our use of ODRL to represent DUO concepts and perform matching with requests in Section 3, expression of legal concepts using DPV in Section 4, a demonstration through proof-of-concept in Section 5, a discussion on integrating this work into existing DUO-based workflows in Section 6, the late contribution containing analysis of recent work in Section 7, and concluding statements in Section 8.
Data Use Ontology (DUO) and aligned efforts
DUO concepts are structured across three taxonomies. The Data Use Permission taxonomy, with base class
DUO is the result of earlier efforts to create codes regarding data use, and use them as machine-readable information towards automation. The first iteration was based on Consent Codes [6] which provided concepts representing permission to use data. The second iteration adopted some terms from the Automatable Discovery and Access Matrix16
The Data Use Oversight System18
Other uses of DUO include specification of informed consent for health and genomics research in Africa [17], along with ADA-M for representing consent for health data sharing in a blockchain [12], and in CTRL [10] – an online platform that uses DUO to provide dynamic consent interfaces and tools for large-scale genomics research programs. Potential uses of DUO are described in the Data Tags Suite (DATS) [1] where DUO is a candidate vocabulary in its framework for discovering data access based on metadata, and as part of a roadmap for accessing 1 million human genomes across EU infrastructures [25]. We found only one article that provided a machine-readable metadata representation of information using DUO – which used SWRL19
Of note in these identified articles and other resources is that we did not find a clear example or workflow for how the machine-readability of DUO should be associated with datasets, expressed as part of a request, or how the matching algorithm should function. The article presenting DATS [1] also refers to this difficulty in establishing the permissions and prohibitions when using DUO, and mentions ODRL as an alternative model providing clearer expression of permissions and prohibitions. The DUOS framework offers the best (available) description of how DUO can be applied, but does not offer much guidance on how the matching is performed between datasets and requests annotated with DUO concepts. From these, we establish the necessity of providing RO1, RO2, and RO3.
Given that health data is personal data, it is subject to regulations such as the GDPR as well as other domain and sector-specific laws such as Health Insurance Portability and Accountability Act20
In this, the state of the art consists of substantial research and development in modelling and using legal ontologies (see survey by Rodrigues et al. [24]). Of note regarding the matching of DUO dataset annotations is the policy checking algorithm for GDPR developed by SPECIAL H2020 project [3] which offers a fast matching algorithm based on subsumption between OWL2 concepts with logical consistency and correctness guarantees. In principle, this is similar to DUOS’s matching algorithm where the concepts to be matched in a policy are pre-determined.
While ODRL, being a standard for expressing policies, provides concepts with legal interpretation (e.g. Asset or Party), it deviates from or does not contain terms such as Controller or Legal Basis which carry important obligations under regulations such as the GDPR. Vos et al. address this by extending ODRL as a ‘Regulatory Compliance Profile’ which is used for expressing policies associated with GDPR [27]. In this, the relevant concepts in ODRL are extended with those from GDPR to construct ODRL rules reflecting GDPR’s compliance requirements. In approaches providing a vocabulary for use regarding GDPR, GDPRtEXT [19] provides a vocabulary of concepts, and GConsent [18] provides an OWL2 modelling of consent information. While these approaches are illuminating in how to describe GDPR’s requirements, their use would restrict the created policies to be operationally limited for application under GDPR.
In contrast to these, the Data Privacy Vocabulary (DPV) [20] provides a taxonomy of concepts which can be used as jurisdiction-agnostic terms with an extension for specific concepts from GDPR21
DPV-GDPR: GDPR Extension for DPV
Two recent surveys provide an overview of existing efforts that have utilised semantic web technologies to address GDPR compliance. The first, by Kurteva et al. [15], describes the approaches associated with consent, and the second, by Esteves and Rodrigues-Doncel [8], analyses ontologies and policy languages for modelling information flows. Both highlight the variety of approaches available, and offer opinionated suggestions regarding the use of ODRL and DPV – which we have incorporated in our choice of implementations.22
It would be prudent to point out that while both authors of this paper are also authors on the cited surveys, the justification offered here is that these prior efforts provide clear evidence on the strengths of choices made in our implementations.
As presented in Section 2.1, DUO concepts are structured across three taxonomies with textual descriptions of the DULs they represent. The goal of this work, in terms of research objective RO1, is to analyse this implicit information and express it explicitly using ODRL, with the additional goal of keeping compatibility with existing uses and workflows that use DUO so as to not cause large disruptions to GA4GH’s current and future activities.
We consider DUO’s primary attractiveness to be the ease with which its concepts can be constructed from input mechanisms (such as a form) and simply ‘tagged’ onto a dataset as an annotation. In this, the textual clauses used to describe the concepts are based on well-defined clauses from consent forms18. The role of ODRL, therefore, is not to replace DUO, but to provide additional machine-readable information for each DUO concept that makes explicit the conditions currently inherent in the textual clauses, i.e. as conditions that can be checked, verified, and consumed in an automated manner to perform tasks associated with validation of dataset policies, querying to discover suitable datasets, and to aid in matching requests with available data and its usage conditions. We also considered the contextual cases for representing additional conditions or information requirements such as record-keeping by institutions or for legal compliance, for which ODRL is also suitable as highlighted in our motivation.
Our methodology in this was to first analyse DUO concepts and textual information to identify their relevant representation as ODRL concepts. We then constructed rules expressing identified conditions and expressed them using ODRL along with identifying three categories of ‘policies’ from GA4GH’s cases reflecting data usage, data request, and an agreement based on compatibility between the two. We then constructed a matching algorithm that utilised developed ODRL policies to compare a request policy with a dataset’s policy to determine compatibility and to create an agreement where both were found to be compatible.
Identifying ODRL equivalents for DUO concepts
For each concept in DUO, we first sought to identify the constraints or conditions by interpreting the textual description and identifying whether it related to a permission, prohibition, or obligation, and the specific context of how those are to be applied. In doing this, we observed duplication and overlap between DUO’s data use permissions and modifiers as both contained purpose-based conditions without a clear distinction between their semantics and interpretation, and regarding permission or prohibition of that purpose as an indication of consent. For example,
We suggest restructuring the taxonomies in DUO to address this by considering a single purpose-based taxonomy specifying research concepts that either have variants for permission and prohibition (i.e. two distinct concepts), or to explicitly provide a data use modifier concept representing permission or prohibition that is applied over a specified research purpose. This is based on DUOS’s data collection input forms and ADA-M’s concepts where each research purpose can be individually consented (or restricted) to, with possible implications arising from lack of any permission or prohibition. For example, the DUO concept for code
After analysing DUO’s concepts and identifying inherent conditions, we formulated the relevant ODRL rules for expressing those conditions. Where this was not possible because of ODRL lacking the required concept, we created proposed extensions of its concepts to enable rule expressions. For each concept, we constructed an
We faced challenges in interpreting specific phrases such as “is limited to” which imply that usage is permitted only within that specific scope. If this interpretation is correct, then DUO should clarify how potential conflicts should be resolved, for example between rules expressing exclusive limitations and other permissive expressions (e.g. “is allowed for”). Our suggestion is to take advantage of ODRL’s ability to express these rules as code, which makes explicit the underlying concepts and how they are applied to create permissions, prohibitions, and obligations, and then to use existing methods for ODRL [21,27] and OWL [3] to reason over them.
Currently, DUO concepts are limited to representing conditions for data use, with suggestions referring to external ontologies for additional concepts required for expressing scope or restrictions. For example,
Interpretation of conditions inherent in DUO concept descriptions as ODRL rules
For our implementation, we identified and collected such ‘missing terms’ into an ad-hoc vocabulary to permit ODRL rules to be expressed correctly for each DUO concept. We recommend that DUO adopt these or create a similar vocabulary for explicitly providing the concepts and their descriptions separate from the data use conditions in which they are used. This also has the added advantage of providing better documentation of information represented by those concepts. For example, by modelling IRB as a concept representing Ethics Review Board approval, it is possible to add information about what processes and requirements are needed in such reviews. It also permits further rules pertaining to ethics approvals to be semantically associated with a base concept, e.g. to indicate it must be carried out prior to data use, or periodically, or before publishing any outcomes.
For data use requests (specified as investigations in DUO), we again found duplication with concepts in data use permission and data modifiers. For example,
Apart from the expression of conditions for data use and requests to use that data, DUO concepts also have applications in recording the outcomes of matching processes where access has been granted. This is an important and yet unexplored area in the currently identified uses of DUO, especially since any sharing of data would be expected to be accompanied by information about the entities involved, provenance associated with the grant process, and details regarding how the conditions have been met at the time or later in the future. We present how ODRL is useful in representing this information as instances of
Each DUO restriction is represented as an instance of
Interpreting the textual descriptions accompanying each DUO concept, we used
It was challenging for us to construct a valid policy which required specifying the resource (dataset), because DUO concepts only represent abstract conditions that don’t relate to a specific dataset. DUO also does not specify how to indicate or identify values associated with conditions such as specific diseases or temporal duration. To ensure ODRL policies are always valid, and to clearly indicate how to later apply or instantiate them for a dataset, we created the class
Another challenge we faced was for indication of scoped restrictions e.g. specifying the location when use is limited to a geographic location. DUO contains the property
We discussed possible solutions to this, and identified four potential avenues: (i) use of OWL class expressions;24
A short and informative summary provided by Protégé

An
When using DUO concepts to annotate datasets, each dataset can contain multiple DUO concepts that must be interpreted in combination as an offer for using that dataset. This is expressed in ODRL as an instance of
The construction of the
For a given dataset, retrieve all DUO data use permissions and modifier concepts it was tagged with.
For each DUO concept retrieved, fetch its relevant
If a retrieved policy uses an instance of a
Create an instance of
Note: each rule is still associated with DUO concepts using
Add provenance information or other additional documentation, e.g.
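The consolidation steps listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: policies are modelled as plain dictionaries rather than RDF, the lookup table `CONCEPT_POLICIES`, the concept keys, the rule fields, and the `source_concept` key are all hypothetical placeholders, and the step concerning template instances is omitted.

```python
from typing import Dict, List

# Placeholder lookup from a DUO concept to its pre-built rules (all values hypothetical).
CONCEPT_POLICIES: Dict[str, List[dict]] = {
    "duo:concept-a": [{"type": "permission", "action": "use", "purpose": "purpose-a"}],
    "duo:concept-b": [{"type": "duty", "action": "publish-results"}],
}


def consolidate_offer(dataset_id: str, duo_concepts: List[str]) -> dict:
    """Merge the rules of every tagged concept into one offer-like structure."""
    rules = []
    for concept in duo_concepts:
        for rule in CONCEPT_POLICIES[concept]:
            # Copy the rule, bind it to the dataset, and keep a link to its source concept.
            rules.append({**rule, "target": dataset_id, "source_concept": concept})
    return {"type": "offer", "target": dataset_id, "rules": rules}


offer = consolidate_offer("dataset-1", ["duo:concept-a", "duo:concept-b"])
print(len(offer["rules"]))
```

Copying each rule, rather than mutating the shared template, keeps the per-concept policies reusable across datasets.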

An example
To represent data use requests, termed as investigations within DUO, instances of

An
Instances of

An
The following algorithm is used to create the
Retrieve the
Match the
Record the result where
If the matching result shows a compatibility between the request and the offer, then access is expressed as permissible by using a permission with a constraint on the requested purpose for access, as well as any other additional constraints, e.g., spatial, temporal, or duties on the
Provenance and other relevant information, e.g.,
In the matching algorithm, we only considered the case where a request is matched with a dataset’s offer. In a practical situation, there may be a single broad request that could have potential matches with several datasets, and it may be undesirable to run the matching against all possible combinations of requests and datasets. To select relevant datasets, a filtering mechanism can be used, such as based on the request’s specified purpose, and the dataset’s policies could be indexed in a database to enable efficient retrieval and matching. We note these as candidates for future improvements in the progression of this work.
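As a rough illustration of such a filtering step, the following Python sketch builds an in-memory index from purposes to datasets and uses it to select candidates before any matching is run. The function names, dataset identifiers, and purpose labels are hypothetical, and the lookup uses exact purpose equality only; a hierarchy-aware filter would need the subsumption considerations discussed for the matching algorithm.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_purpose_index(dataset_purposes: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each purpose to the identifiers of datasets whose policies mention it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for dataset_id, purposes in dataset_purposes.items():
        for purpose in purposes:
            index[purpose].add(dataset_id)
    return dict(index)


def candidate_datasets(index: Dict[str, Set[str]], requested_purpose: str) -> List[str]:
    """Return the datasets worth matching against a request with this purpose."""
    return sorted(index.get(requested_purpose, set()))


# Hypothetical example data.
index = build_purpose_index({
    "dataset-1": ["health-research"],
    "dataset-2": ["health-research", "population-research"],
    "dataset-3": ["population-research"],
})
print(candidate_datasets(index, "health-research"))  # ['dataset-1', 'dataset-2']
```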
Note that ODRL defines
The matching algorithm in DUO is based on comparing and identifying compatibility between a dataset’s data use conditions with data use requests. In our ODRL implementation, this is done by comparing the dataset’s odrl:Offer with an odrl:Request. Given two sets of concepts representing an offer and a request, the matching algorithm can utilise two different and incompatible notions for how access is determined. The first, which is the more common semantic interpretation, is based on considering classes as sets and determining access based on set membership. For a class P and its subclass C, a request for accessing P would also permit use of C since a member of C is always a member of P. But a request for C would not permit use of P as not all members of P are members of C. This approach has been used in matching policies for GDPR compliance [3] and for granting access to resources in Solid [7].
The second approach, which is what DUO describes in its documentation, is based on identifying applicability of a concept based on its specificity. For a class P and its subclass C, a request for accessing P would not grant access to C since it is more specific, but a request for accessing C would grant use of P as it is less specific. Using subsumption as a criterion, the first approach grants access when the data policy subsumes the request policy, whereas the second approach grants access when the request policy subsumes the data policy. Thus, both of the aforementioned approaches (i.e., [3] and [7]) can be reused here by reversing the direction of subsumption.
Another consideration for the matching algorithm is the resolution of permissions and prohibitions in terms of their order of evaluation and conflicts. It is possible to interpret a policy in several incompatible ways, such as first checking for permissions and granting access at the first satisfied permission, i.e., a permissive model, and its opposite where prohibitions are checked first and access is denied at the first satisfied prohibition, i.e., a prohibitive model. When a conflict occurs between a permission and a prohibition over the same resource, the resolution is based on the precedence of one over the other. In DUO, the matching algorithm is prohibitive since prohibitions take precedence over permissions. This means that if a request either does not satisfy a permission or satisfies a prohibition, the request is denied. The policies are considered compatible only when all permissions are satisfied and all prohibitions remain unsatisfied.
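The P and C examples for the two notions can be made concrete with a short Python sketch. The toy hierarchy, the class labels, and the function names are hypothetical and are not DUO identifiers; treating a class as matching itself under both notions is also an assumption of the sketch.

```python
from typing import Dict, Optional

# Hypothetical toy hierarchy: each class maps to its direct parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Research": None,
    "HealthResearch": "Research",
    "CancerResearch": "HealthResearch",
}


def is_subclass_of(child: str, ancestor: str) -> bool:
    """Return True if `child` equals `ancestor` or lies below it in PARENT."""
    current: Optional[str] = child
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT[current]
    return False


def grants_set_based(requested: str, target: str) -> bool:
    """First notion: a request for P also permits use of its subclass C."""
    return is_subclass_of(target, requested)


def grants_specificity_based(requested: str, target: str) -> bool:
    """Second notion: a request for the more specific C grants use of the broader P."""
    return is_subclass_of(requested, target)


# With P = HealthResearch and C = CancerResearch:
print(grants_set_based("HealthResearch", "CancerResearch"))          # True
print(grants_specificity_based("HealthResearch", "CancerResearch"))  # False
print(grants_specificity_based("CancerResearch", "HealthResearch"))  # True
```

The two functions differ only in the order of the arguments passed to `is_subclass_of`, which is the reversal of direction that allows an existing subsumption-based matcher to be reused.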
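A minimal Python sketch of the prohibitive model follows, assuming each rule can be reduced to a predicate that returns True when the request satisfies it. The `Request` shape, the rule predicates, and the field names are hypothetical. As a simplification of the sketch, an offer with no permissions and no applicable prohibition is treated as compatible.

```python
from typing import Callable, Dict, List

Request = Dict[str, str]
Rule = Callable[[Request], bool]  # True when the request satisfies the rule


def is_compatible(request: Request, permissions: List[Rule], prohibitions: List[Rule]) -> bool:
    """Prohibitive evaluation: any satisfied prohibition denies the request."""
    # Prohibitions are evaluated first and take precedence over permissions.
    if any(prohibition(request) for prohibition in prohibitions):
        return False
    # Every permission must then be satisfied for the policies to be compatible.
    return all(permission(request) for permission in permissions)


# Hypothetical rules for illustration.
permissions = [lambda r: r["purpose"] == "health-research"]
prohibitions = [lambda r: r["requester_type"] == "commercial"]

print(is_compatible({"purpose": "health-research", "requester_type": "academic"},
                    permissions, prohibitions))  # True
print(is_compatible({"purpose": "health-research", "requester_type": "commercial"},
                    permissions, prohibitions))  # False
```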
Based on these considerations, our matching algorithm consists of checking for subsumption or satisfiability between

Pseudo-code of the matching algorithm
Algorithm 1 provides pseudo-code representing the steps to be performed for policy matching. Please note that the algorithm only represents a broad indication of actions and that the DUO documentation lacks specifics for correctly interpreting aspects of semantics. This interpretation has a significant impact on the decision-making within DUO’s processes, and DUO specifies the interpretation of hierarchical concepts only for purposes and not for others (e.g. location, users, projects). We therefore explicitly identify this as a topic that requires further consideration and investigation in terms of better understanding and expressing the interpretation of DUO’s conditions in approval decision-making processes. To remedy this lack of information, except for the purposes in DUO’s concepts, we followed existing implementations for legally relevant interpretation of hierarchical concepts [3] where a narrower concept or a sub-class cannot be considered compatible with a request for a broader concept or parent-class. For example, a permission for city as a location cannot be satisfied by a request for a region containing that city.
The algorithm reflects DUO’s prohibitive interpretation in matching where the offer’s prohibitions are checked, and confirmed not to apply, before any permissions are checked. The prohibition checking will deny the request if any of the following constraints in the offer are incompatible with the request:
offer assignee matches26
Permissions and prohibitions for complex legal structures such as subsidiaries or group of companies cannot be accurately represented using equality (=) or subset (⊆) relations. We, therefore, use the equivalence relation (≡) to indicate the request entity should satisfy the legal interpretation of equality – defining which is outside the scope of this article.
offer has a spatial constraint matching or not satisfying (
request has a project matching (
there is a moratorium with a date in the future; and
request has a purpose matching (
If no prohibitions are found, the permissions are checked next. The permission checking will deny the request if any of the following constraints in the offer are incompatible with the request:
offer assignee does not match (≢) the assignee of the request; offer time limit on use has lapsed; offer has a group-related research purpose, e.g., offer purpose does not match (⊈) request purpose, e.g., DUO’s general research use purpose
These steps are checked for all prohibitions and permissions of the dataset’s offer and if all permissions and prohibitions are satisfied without violations, access to the dataset can be granted. The proof-of-concept demonstration described in Section 5 uses these steps to match an offer policy with a request policy.
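Two of the checks listed above are temporal: a moratorium with a date in the future and a lapsed time limit on use. They can be sketched in Python as follows. The function and parameter names are hypothetical, and the boundary behaviour (a moratorium ending today no longer blocks, and a time limit ending today has not yet lapsed) is an assumption of the sketch rather than something DUO specifies.

```python
from datetime import date
from typing import Optional


def temporal_checks_pass(today: date,
                         moratorium_until: Optional[date],
                         use_allowed_until: Optional[date]) -> bool:
    """Return False if either temporal constraint of the offer denies the request."""
    # Prohibition: a moratorium whose date is still in the future denies the request.
    if moratorium_until is not None and moratorium_until > today:
        return False
    # Permission: a time limit on use that has already lapsed denies the request.
    if use_allowed_until is not None and use_allowed_until < today:
        return False
    return True


today = date(2024, 6, 1)
print(temporal_checks_pass(today, date(2025, 1, 1), None))  # False: moratorium still active
print(temporal_checks_pass(today, None, date(2024, 1, 1)))  # False: time limit lapsed
print(temporal_checks_pass(today, date(2024, 1, 1), date(2025, 1, 1)))  # True
```

Passing `today` explicitly, instead of reading the system clock inside the function, keeps the check reproducible when a decision needs to be recorded or re-evaluated later.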
Examples demonstrating the matching process
Table 2 presents examples of how the matching process works for permissions and prohibitions in offer with constraints for location and purpose. In a semantic web implementation, the processes for checking equivalence (≡), intersection (∩), and subset (⊆) require additional considerations beyond simply using
Therefore, an implementation of the matching process has to be cognisant of such cases and be careful when implementing the equivalence, intersection, and subset processes using conventional semantic web interpretations (e.g.
The DUO concepts and terms used are different from those used in legal compliance tasks. By using ODRL concepts, the terms involved are expressed in a language that has legal interpretation (e.g. Asset or Party). The ODRL vocabulary also contains additional terms which may be used with DUO for specific legal interpretations, such as ConsentingParty, InformedParty, and obtainConsent. While these terms are sufficient for a policy to have legal interpretations, they are insufficient to incorporate the specifics of laws such as GDPR which assign specific roles to parties and require use of a specific legal basis in processing of data. At the same time, if the terms are made specific only for a single law such as the GDPR, the usefulness and applicability of the resulting policies would be restricted to only that law without a clear recourse for adopting other laws and jurisdictions. To address this gap, we utilised the Data Privacy Vocabulary (DPV) which provides terms that are intended to be jurisdiction-agnostic and can be used without being restricted to a specific law.
To utilise DPV, we first performed an alignment between its concepts and ODRL where DPV concepts that have an overlap with ODRL concepts are defined as their subclasses (e.g.
We intentionally restricted the alignment to only concepts required for using DUO so as to not introduce additional external interpretations.
Alignment between DPV and ODRL for use in policies expressing DUO concepts
Using DPV enables modelling rules regarding restrictions on legal basis (e.g. consent), explicit acknowledgement of roles (e.g. data controllers), limitations on third-party recipients, and indicating the applicability of a specific law using
To explicitly specify GDPR as the applicable law and utilise its legal bases and rights, we utilised the DPV-GDPR28

Two
DUO states that the interpretation and applicability of GDPR’s requirements is the responsibility of the adopter. This follows from the complexities of determining their applicability before any request is known, or because of the differences between stakeholder jurisdictions. To assist with this process, we recommend adding or providing relevant methods that are necessary to identify the applicability of the GDPR (or other laws). For example, GDPR is applicable (to simplify the condition) when an organisation operates within the EU or processes the personal data of people in the EU. This translates to knowing the locations of people whose data is being offered for use as well as the location of the requesting entity.
Using DPV, both of these can be expressed using the appropriate Entity concepts and dpv:hasLocation. This enables further data use limitations to be expressed using ODRL, such as data being available only when the request acknowledges the applicability of the GDPR, or permitting use only within GDPR-governed jurisdictions. These can then be checked as permissions or prohibitions to be satisfied when matching a request with a dataset by using a matching algorithm as in Section 3.6 along with an encoding of GDPR’s requirements such as those from Vos et al. [27]. The DPV-LEGAL29
In this section, we describe the implementation of a User Interface to generate dataset policies and a prototype implementation of the matching algorithm, available at
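The simplified applicability condition can be written as a short Python check. This is only a sketch of the simplification stated above and not a legal determination: the function name is hypothetical, and `EU_COUNTRIES` is a deliberately incomplete, illustrative set of country codes.

```python
from typing import Iterable

# Hypothetical and deliberately incomplete set of EU country codes, for illustration only.
EU_COUNTRIES = {"IE", "ES", "DE", "FR"}


def gdpr_may_apply(requester_location: str, data_subject_locations: Iterable[str]) -> bool:
    """Simplified test: the requester is in the EU, or some data subjects are."""
    requester_in_eu = requester_location in EU_COUNTRIES
    subjects_in_eu = any(loc in EU_COUNTRIES for loc in data_subject_locations)
    return requester_in_eu or subjects_in_eu


print(gdpr_may_apply("US", ["IE", "US"]))  # True: some data subjects are in the EU
print(gdpr_may_apply("US", ["US"]))        # False under this simplified condition
```

The two inputs correspond to the location of the requesting entity and the locations of the people whose data is being offered for use.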
Figure 1 shows two examples of the developed UI to edit
Ad-hoc vocabulary available at
Upon selecting the relevant DUO concept in the UI, the application retrieves the associated

Proof-of-concept implementation showing generation of
For the matching process, the conditions represented in an
The data discovery algorithm starts by checking if there is a specific rule within a dataset’s policy for the purposes stated in the
In the event additional duties are imposed for dataset use, such as agreeing to collaborate with the primary study investigator or providing documentation of ethical approval, these are included in the
To record the result of the matching algorithm, an
DUO represents one facet of GA4GH’s ambition to facilitate responsible genomics data sharing for health and medicine-related research. It plays an important part given that its role is to increase automation in data discovery and assist in ensuring data use is permitted with accountability and oversight. Its use is thus part of a workflow consisting of different components, processes, and stakeholders who have differing requirements for how they use DUO. Any changes proposed to the way in which DUO is modelled, is applied for dataset discovery, or is used in automation for identifying compatibility with requests may have consequences on these existing workflows. While better design and performance are valid technological goals, they should be evaluated through the lens of the socio-technical applications they are a part of. This section therefore discusses the influence and impact of our work on existing DUO-based workflows and offers suggestions on how this work can be best utilised.
Design of DUO concepts
As we outlined in Section 3.1, the concepts within DUO have duplication in semantics, and do not present the conditions they represent as explicit machine-readable code. This limits the ability to use them for the expression of policies and the implementation of automation in dataset discovery and request matching processes, and prevents further use of this information in other processes such as keeping records and creating documentation. In addition, the structuring of concepts requires clarity on their intended role without overlap (i.e. permissions, modifiers, and investigations), and should have separation of concerns (i.e. purposes from modifiers). Through this, the use of concepts becomes clearer and more consistent, and provides the ability to introduce additional conditions and constraints without impact on existing concepts. We recommend following the ODRL model and concepts in terms of representing rules (permission, prohibition, duty), and constraints (purposes, scopes) separately from one another.
For further refinement of DUO terms and their interpretation, the textual descriptions provided should utilise controlled natural language (see the survey in [14] for a variety of approaches) that matches the expression of rules (as in ODRL) so as to provide a reduced level of ambiguity and a high degree of specificity in the terms used. Through these, the descriptions can be made self-sufficient in terms of describing how they should be applied, or when (i.e. before or after data has been released), which can benefit the non-technical processes and stakeholders in understanding and using them. In addition, the specificity of descriptions will also assist approaches such as ours in constructing machine-readable rules that match the exact intention of that concept.
By specifying policies in ODRL (or other similar policy-based semantic models), DUO gains additional potential where policies may encompass other requirements (e.g. legal), or have information about the provenance of the data access committees and other relevant processes. This would aid in maintaining documentation, using validation and other forms of automation to ensure it is complete and correct, and perform follow-up actions periodically or as contextually required. In all of these, the benefits do not require everyone to adopt a large amount of technical debt, and adopters of DUO can choose the extent of what and how they wish to utilise our suggestions – such as adopting just the ODRL rules, or its matching algorithm, or also the connection to legal compliance using DPV. Our primary contribution is in demonstrating their usefulness and providing a path for their development and adoption.
Integration into existing implementations
We acknowledge that some of our proposed changes may break compatibility with existing DUO-utilising systems, and therefore suggest that any adopter assess whether the gains obtained from such changes outweigh the cost of making them. In our opinion, our changes offer more advantages than disadvantages in the longer term, and therefore they should be adopted gradually if not immediately. We recommend the adoption of equivalent ODRL policies for DUO concepts and the (re-)structuring of existing taxonomies and concepts as the first steps. After this, systems such as DUOS can take advantage of the increased availability of machine-readable data to enhance their data discovery and matching algorithms.
We also acknowledge the value of DUO concepts in being simple for stakeholders to understand and utilise, and their basis in ‘textual clauses’ such as those offered in informed consent or data donation/release forms. With this in mind, our modelling of ODRL policies ensures that there is no immediate need to replace the use of DUO concepts, since the ODRL policies are complementary to them, i.e. the ODRL policy is linked to DUO concepts rather than replacing them entirely. Thus, stakeholders who lack or have limited technical expertise can continue to utilise DUO concepts as they have, with machine-based implementations taking advantage of the increased clarity and specificity of ODRL rules associated with those DUO concepts. An important advantage of this, which is not possible in current DUO-based implementations, is that the underlying constraints or conditions are made explicit. This opens an avenue for further research into automation and logic-based reasoning to scale the approach to larger and more diverse use-cases than is currently feasible with DUO.
It also offers the possibility to encode as machine-readable metadata what is currently external information, i.e. (i) who the data is about, who requested access, and who was granted access; (ii) follow-up duties once data has been released, such as checking whether they have been fulfilled and documenting fulfilment or violation; and (iii) legal obligations associated with data use. DUO-utilising systems such as DUOS already rely on all of this information and will continue to do so in any practical use-case in the future. By providing a clear path for adopters to express this information, the use of DUO can be made more systematic and consistent, thereby also increasing the potential for cooperation between adopters and facilitating cross-boundary data requests and access as envisioned by GA4GH as well as the EU’s Health Data Space ambitions.
Assisting with legal compliance
Currently, neither DUO nor GA4GH provides information on how the use of their efforts relates to legal interpretation and obligations, though there are ongoing discussions on this. This is a particularly challenging task given the global scope of the work, which encompasses different jurisdictions and their laws, and given that laws such as GDPR are fairly recent in terms of how their obligations are understood to be applied. We suggest the use of domain-agnostic vocabularies such as ODRL and DPV to first provide a clear indication of how DUO and DUO-based systems relate to specific concepts within legal terminology. By using these within ODRL policies, DUO can provide what is effectively a digital contract.
Further specific jurisdictional applications can then be introduced as an extension of these. For example, the DPV-GDPR extension provides a convenient way to specify GDPR’s legal bases and rights alongside DPV. This reduces the burden on adopters who do not want to express this information or do not want to express any jurisdiction-specific information. For example, a data depositor who only stipulates that use of data should be based on consent, without explicitly defining the conditions for that consent to be valid, can express this as a policy using ODRL and DPV. The oversight committee or an ethics board can then evaluate this further based on their knowledge of the requirements for valid consent, and use DPV-GDPR to add restrictions or obligations that follow a specific regulation such as the GDPR before permitting use of that data.
This freedom also offers benefits for systems like DUOS, which can explicitly denote datasets as requiring GDPR-level consent, or as being subject to GDPR, by adding relevant metadata to the dataset policy. Doing so allows the matching process to also check for legal obligations and compatibility, such as by requiring specific information about the requester (e.g. a Data Protection Officer), or requiring additional legal bases and safeguards for transfer of that data (e.g. outside the EU). Through this, DUO and its applications can gain a wider legal applicability across the globe and also have the means and mechanisms to address specific interpretations of the law. Given that all this information would be machine-readable and shareable with the dataset, both the provider and the requesting entity can use it, together with the existing state of the art, to automate the identification of legal obligations and the checking of their fulfilment.
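To make the idea of checking legal obligations during matching more concrete, the following is a minimal sketch in Python. It is not part of our implementation: the dictionary keys (`required_requester_info`, `allowed_locations`, `processing_location`, `transfer_safeguards`) and the function name are hypothetical and chosen only for illustration, and a real system would express these requirements as ODRL constraints using DPV concepts rather than plain strings.

```python
def unmet_legal_requirements(dataset_policy: dict, request: dict) -> list:
    """Return human-readable descriptions of requirements the request does not meet.

    All keys used here are hypothetical placeholders, not DUO, ODRL, or DPV terms.
    """
    unmet = []

    # Information the dataset policy requires about the requester,
    # e.g. the identity of a Data Protection Officer.
    for field in dataset_policy.get("required_requester_info", []):
        if not request.get(field):
            unmet.append(f"missing requester information: {field}")

    # Location restriction: processing outside the allowed locations
    # is only acceptable if the request declares transfer safeguards.
    allowed = dataset_policy.get("allowed_locations")
    location = request.get("processing_location")
    if allowed is not None and location not in allowed:
        if not request.get("transfer_safeguards"):
            unmet.append(
                f"transfer to {location} requires a legal basis or safeguards"
            )

    return unmet


policy = {
    "required_requester_info": ["data_protection_officer"],
    "allowed_locations": {"EU"},
}
incomplete_request = {"processing_location": "US"}
complete_request = {
    "data_protection_officer": "dpo@example.org",
    "processing_location": "US",
    "transfer_safeguards": "standard contractual clauses",
}

problems = unmet_legal_requirements(policy, incomplete_request)
no_problems = unmet_legal_requirements(policy, complete_request)
```

In this sketch the first request yields two unmet requirements and the second yields none, which indicates how such checks could be reported back to both the provider and the requester.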
Analysis of recent developments
Two articles [5,13] relevant to DUO were made available online while this article was under review. To better position our work, we provide an informative preliminary analysis of their contributions and discuss the (continued) relevance of our work given these new developments.
Summary of new articles
The two articles [5,13] together represent improvements to the way information is expressed and encoded as rules based on DUO terms. The first [5] presents Common Conditions of Use Elements (CCE) – a controlled vocabulary representing concepts for use in data sharing policies. The second [13] presents Digital Use Conditions (DUC) – a policy expression mechanism to specify rules regarding conditions for sharing and reuse of datasets.
Where DUO terms singularly represent both concepts and rules, CCE and DUC distinguish between information and rules, which provides flexibility in their use, enables granularity in their respective uses, and provides a mechanism to extend them via profiles to suit specific use-cases and requirements. An online tool demonstrating this is provided at
The CCE vocabulary consists of 20 concepts that were identified from an analysis of requirements and conducted user studies. Its motivation is to provide “flexible ontologies that can capture complex and conditional permissions in data in a manner that enables logical computer-based reasoning” [5]. The article describes four requirements for CCE concepts:
atomic i.e. each term should represent a single concept as opposed to representing a complex or combination of several concepts;
no directionality i.e. the term by itself should not specify whether its usage means data reuse or sharing is allowed, forbidden, or obligatory;
generalised i.e. the term should be a modular category without any customisation, conditionality, or dependencies; and
the term should be “widely applicable and relevant”.
The DUC specification [13] defines the expression of policies where each policy contains an optional header section providing metadata regarding the policy and a necessary core section containing one or more statements. The header section also provides references to the datasets associated with the policy, and information on the interpretation of ‘unstated conditions’ as either ‘forbidden’ or as ‘permitted’. Each DUC statement contains four components:
a condition term, which is a CCE concept;
a rule, which is one of Obligatory, Permitted, Forbidden, and No Requirement;
a scope, which is either ‘Whole of asset’ by default or ‘Part of asset’; and
optionally a condition parameter with an optional value.
An example mentioned in the article [13] of a DUC statement is Country (condition) with Permitted (rule) for Whole of Asset (scope) with UK (parameter). Another example mentioned is Time limit (condition) with Obligatory (rule) for Whole of Asset (scope) with Month (parameter) as 12 (value). Additional examples are available within the online tool documentation.31
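The structure of a DUC statement as summarised above can be sketched as a small data structure. This is our own illustrative reading of the description in [13], not code from the DUC specification: the class and field names are assumptions, as is the check that a value may only accompany a parameter.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class Rule(Enum):
    OBLIGATORY = "Obligatory"
    PERMITTED = "Permitted"
    FORBIDDEN = "Forbidden"
    NO_REQUIREMENT = "No Requirement"


class Scope(Enum):
    WHOLE_OF_ASSET = "Whole of asset"
    PART_OF_ASSET = "Part of asset"


@dataclass(frozen=True)
class DUCStatement:
    condition: str                        # label of a CCE concept
    rule: Rule
    scope: Scope = Scope.WHOLE_OF_ASSET   # 'Whole of asset' is the default
    parameter: Optional[str] = None       # optional condition parameter
    value: Optional[Union[int, str]] = None  # optional value for the parameter

    def __post_init__(self):
        # Assumption: a value is only meaningful together with a parameter.
        if self.value is not None and self.parameter is None:
            raise ValueError("a value requires a condition parameter")


# The two examples from [13] quoted above.
country = DUCStatement("Country", Rule.PERMITTED, parameter="UK")
time_limit = DUCStatement(
    "Time limit", Rule.OBLIGATORY, parameter="Month", value=12
)
```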
The article [5] provides a mapping between the 20 CCE concepts and DUO terms and indicates whether the CCE term is exactly equivalent to a DUO term, or requires additional use of rules (e.g. obligations using DUC) to be considered equivalent, or is a combination of multiple rules (e.g. permission with one concept and ‘forbidden’ with another) to match a DUO term, or has no corresponding term in DUO.
The advancements presented in [5,13] address similar motivations to those we discussed in Sections 1.2 and 2 regarding making the information inherent within DUO terms explicit and available in machine-readable form. The key difference in the approaches is that while we focused on the reuse of existing approaches (ODRL as a standard for rules and DPV for vocabulary), the CCE/DUC approach creates a new vocabulary (CCE) and rule expression mechanism (DUC). In this section, we compare the two approaches based primarily on the distinctions between DUC and ODRL for expressing rules and policies, and between CCE and DPV for providing vocabularies.
Without a formal specification, it is difficult to compare DUC with ODRL. The way DUC is described and used in the examples and within the tool gives the impression that DUC can be treated as a simplified subset of ODRL. This is because the structure of a DUC statement can be expressed in the form of RDF triples which can be grouped together within an ODRL rule. In this, the DUC concept is mapped to ODRL’s action, the DUC rule to ODRL’s Rule, the DUC scope to ODRL’s target, and the DUC parameter and value to ODRL’s leftOperand and rightOperand, respectively. For example, the two examples of DUC statements stated earlier are equivalent to the ODRL rules presented in Listing 6.

Listing 6. ODRL rules for DUC statements regarding country permission and time limits
In this mapping, treating DUC concepts as odrl:action does not accurately reflect their context, as some of these concepts do not represent actual actions. For example, while concepts such as Collaboration and Research Use can be considered actions, others such as Time Period and Regulatory Jurisdiction are not compatible with the definition of an action. This can be reconciled by treating these concepts as a constraint rather than an action, or by treating all concepts as a rightOperand in a constraint with the leftOperand being their defining context, such as Purpose for Research Use.
The DUC rules map to ODRL rules as follows: DUC Obligatory is an ODRL Obligation, DUC Permitted is an ODRL Permission, DUC Forbidden is an ODRL Prohibition, and DUC No Requirement does not have an ODRL equivalent. Of these, the mapping of ‘No Requirement’ is problematic since it is not possible to interpret “no requirement” in a deontic sense by itself. While the DUC header can specify a default interpretation that is equivalent to an ODRL permission or prohibition, ODRL does not support such interpretation frameworks, and the article also does not mention such use of the DUC header. Further, an example from the policies specified in [5] mentions “Collaboration with No Requirement” with the interpretation that “The collaboration is evaluated when appropriate”. We find this interpretation to be unclear as to whether collaboration is permitted, is prohibited, or has its interpretation deferred such that it cannot be clearly stated. In the last case, it may be possible to express this in ODRL as a Permission with a Duty to obtain prior approval, to express that collaboration is a possibility. In any case, we highlight the need to provide clarity regarding the implications of such rules, as this is necessary in request matching algorithms.
The CCE vocabulary, which contains 20 concepts, specifies the atomicity of its concepts as a core requirement. However, we find that the concepts can be further generalised and structured into a taxonomy based on their implied context, following DPV’s distinction between terms that have legal meaning, such as purposes, processing, or technical measures. For example, we utilised DPV concepts, which match the terminology of regulations such as GDPR, to express DUO concepts as purposes, processing operations over data, location of operations, entities, and technical measures to safeguard data. With DPV we provided a rich taxonomy for each of these concepts, and also gained the ability to express jurisdictional concepts such as GDPR-defined explicit consent instead of just ‘consent’.
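The correspondence described above can be sketched as a small translation function. This is an illustrative sketch only and does not reproduce Listing 6: plain strings stand in for IRIs, the function name and dictionary layout are our own, and the use of an equality operator in the constraint is an assumption. The sketch follows the literal mapping of concept to action, so it inherits the limitation noted above for concepts that are not actions.

```python
from typing import Optional

# DUC rule -> ODRL property attaching the rule to a policy.
# 'No Requirement' has no ODRL equivalent, so it maps to None.
RULE_TO_ODRL = {
    "Obligatory": "obligation",
    "Permitted": "permission",
    "Forbidden": "prohibition",
    "No Requirement": None,
}


def duc_to_odrl(condition, rule, scope="Whole of asset",
                parameter=None, value=None) -> Optional[dict]:
    """Translate one DUC statement into an ODRL-like dictionary.

    Returns None when the DUC rule cannot be expressed in ODRL.
    """
    odrl_property = RULE_TO_ODRL[rule]
    if odrl_property is None:
        return None

    # DUC concept -> action, DUC scope -> target (placeholder strings).
    odrl_rule = {"action": condition, "target": scope}

    # DUC parameter and value -> leftOperand and rightOperand.
    if parameter is not None:
        odrl_rule["constraint"] = [{
            "leftOperand": parameter,
            "operator": "eq",  # assumption: DUC does not state an operator
            "rightOperand": value,
        }]

    return {odrl_property: [odrl_rule]}


time_limit = duc_to_odrl("Time limit", "Obligatory",
                         parameter="Month", value=12)
collaboration = duc_to_odrl("Collaboration", "No Requirement")
```

Returning `None` for ‘No Requirement’ makes the gap explicit: a caller has to decide how such statements are handled rather than silently receiving a permission or prohibition.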
Table 4. Mapping of CCE terms to DPV/ODRL concepts
In comparison, CCE concepts are still similar to DUO concepts in that they contain hidden implicit information (e.g. Clinical Care Use and Research Use are both Purposes), are not ‘complete’ in terms of balancing the concepts (e.g. Commercial Entity is defined but Non-Commercial Entity is not), and have not incorporated GDPR requirements sufficiently (e.g. CCE concepts do not address information security). In contrast, our use of ODRL and DPV addresses each of these limitations. For example, Section 4 shows the use of GDPR terminology within an ODRL policy. A mapping of CCE terms to DUO concepts is provided in [5].
Comparing it with our mapping of DUO concepts to DPV/ODRL concepts in Table 1, there are similarities in the approach and analysis in that both express DUO concepts as a combination of Concept + Rule. In continuation of this, we provide a mapping of CCE terms to relevant DPV/ODRL concepts in Table 4. In this, we focused on identifying the core DPV concepts for each CCE term and how it is applied with an ODRL rule (where relevant). From this, we can state that the work presented in this article is compatible with the development of CCE terms, and that the use of DPV/ODRL terms to represent policies and matching processes based on DUO concepts is therefore also applicable to the use of CCE and DUC.
Given that ODRL is an established standard, is extensible through the profiles mechanism, and is also under active development, we posit that utilising ODRL and reorienting DUC as a “syntactic sugar” over ODRL can be a better alternative than the development and maintenance of a completely separate rules language. The benefits of basing DUC on ODRL also include access to ODRL’s distinction between policies representing offers, requests, and agreements, as demonstrated in Section 3, which is not mentioned in either of the CCE/DUC articles. The use of ODRL as we have demonstrated in this article also provides the necessary clarity and alignment with legal compliance (via DPV), which is necessary for health data reuse, and which the CCE/DUC approach only addresses at a superficial level. Further, we have demonstrated (in Section 3.6) how our approach leads to a matching process that supports all three stages of Offer, Request, and Agreement, in a manner that can be automated for checking completeness (e.g. using SHACL) and correctness (e.g. using reasoners). Neither of the CCE/DUC articles demonstrates how their developments improve (or even clarify) the matching process first mentioned by DUO. Finally, with CCE being the vocabulary for DUC concepts, we find it limited in comparison to DPV based on the lack of a structured taxonomy for organising the concepts, fewer terms, no representation of jurisdictional or GDPR-specific concepts, and lack of an extension mechanism.
Based on these, our preliminary analysis concludes that while the work presented in [5,13] is an advancement over DUO regarding separation of information from rule expression, several limitations still exist regarding the modelling of this information. Specifically, these concern how CCE concepts are structured and their limited vocabulary, the development of DUC as yet another rule language without compatibility with existing standards such as ODRL, and the difficulty of extending this language beyond its current capabilities, such as to other health use-cases. We find that the contributions of this article remain relevant and can be applied to further develop DUO, CCE, and DUC while maintaining the existing adoptions.
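The three stages of Offer, Request, and Agreement can be illustrated with a deliberately simplified sketch. This is not the matching algorithm of Section 3.6: it assumes that purposes are compared by exact string equality (a real implementation would use the DPV taxonomy), that prohibitions take precedence over permissions, and that the dictionary keys shown are hypothetical placeholders rather than ODRL terms.

```python
from typing import Optional


def match(offer: dict, request: dict) -> Optional[dict]:
    """Return an agreement if the request is compatible with the offer, else None."""
    purpose = request["purpose"]

    # Assumption: a prohibition always overrides a permission.
    if purpose in offer.get("prohibited_purposes", ()):
        return None
    if purpose not in offer.get("permitted_purposes", ()):
        return None

    # The agreement records both parties, the dataset, the granted
    # permission, and any duties carried over from the offer.
    return {
        "type": "Agreement",
        "assigner": offer["assigner"],
        "assignee": request["assignee"],
        "target": offer["target"],
        "permission": {"action": "use", "purpose": purpose},
        "duties": list(offer.get("duties", [])),
    }


offer = {
    "type": "Offer",
    "assigner": "ex:DataAccessCommittee",
    "target": "ex:Dataset1",
    "permitted_purposes": {"health research"},
    "prohibited_purposes": {"commercial use"},
    "duties": ["return results to resource"],
}
accepted = match(offer, {"assignee": "ex:Researcher",
                         "purpose": "health research"})
rejected = match(offer, {"assignee": "ex:Company",
                         "purpose": "commercial use"})
```

Keeping the duties of the offer in the resulting agreement is what allows their fulfilment to be checked and documented after the data has been released.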
Conclusion
The Data Use Ontology (DUO) is an important initiative to enable wider data sharing towards the goal of progressing health and medical research. Its design and application are driven by the workflows and use-cases present in a socio-technical system consisting of a data repository, utilising a data access committee or approval board, and maintaining compatibility with textual clauses and machine-readable metadata.
We provide an argument for why the design of DUO concepts should be enhanced by making their data use conditions explicit as machine-readable data, and by utilising these in the matching of data use policies and requests. For this, we have demonstrated the applicability, suitability, and potential of ODRL as a standardised language to express all facets of DUO’s applications. We provide: (i) ODRL rules for each DUO concept; (ii) an integration of DUO concepts into an ODRL policy for a dataset; (iii) an ODRL policy representing a data use request; and (iv) a demonstration of their use in checking for compatibility between dataset and request policies. Through these, we provide a better mechanism than the current DUO implementation for using machine-readable information in the automation of tasks such as matching requests with offers and creating documentation.
In addition to the above, we also demonstrated how the use of DPV within ODRL policies enables connection with privacy and data protection laws without making it specific to a particular jurisdiction. For cases where a specific law is needed, the DPV concepts can be easily extended, which we showed for GDPR. Along with the descriptions of our research, we also provided links to resources and a demonstration of its implementation to assist adopters of DUO in assessing and using our work.
Importantly, rather than suggesting a radical new method of doing things, we started with the goal of constructing a mechanism that complements DUO rather than replacing it. As we have shown, using ODRL and DPV alongside DUO is feasible, and can be done with minimal disruption. Through this, we hope that our work will influence and improve existing DUO-related efforts, and in doing so bring DUO and the GA4GH closer to implementing the EU’s Health Data Space vision.
Acknowledgements
Both authors have contributed equally to this work.
