
Abstract

The Description Logic $SROIQ (D)$ , as the logical core of the W3C standard Web Ontology Language (OWL 2), is a widely used formalism for ontologies in the life sciences. Bio-health applications including health-care and life science domains commonly have a need to represent temporal information such as medication frequency or stage-based development. Different classes of temporal phenomena may generate different sorts of requirements on $SROIQ (D)$ or extensions of $SROIQ (D)$ . In this paper, we deliver the first precise investigation into identifying exactly what kinds of temporal requirements are most important for bio-health ontologies. We conduct an empirical investigation of the OBO Foundry using a bespoke methodological approach by searching each of its ontologies for specific temporal features and go on to calculate the importance of these features using a sophisticated set of measures. By doing so, we derive a formal set of Temporal Requirements that act as a set of guidelines which a language or logical extension to OWL 2 would need to satisfy in order to meet the temporal requirements of bio-health ontologies.

Keywords

Ontology, bio-health, temporal, OWL, description logics, requirement

1. Introduction

The Web Ontology Language (OWL), as standardised by the World Wide Web Consortium (W3C), is a collection of knowledge representation languages designed for use in many application scenarios, providing the means to model information in a precise and structured way to enable the semantic web. An OWL Ontology is a set of axioms describing the classes and properties of a domain of interest. OWL 2 [8] is the current iteration (and successor) of OWL, and has two levels of expressivity: OWL 2 DL and OWL 2 Full, the former having a Description Logic (DL) as its logical basis. DLs [3,5,13] are decidable fragments of First Order Logic that support reasoning with information in a meaningful way. Two of the main purposes of DLs are (1) to provide ways to model relations between three kinds of entities in the domain of interest, namely concept descriptions, roles and individual names, and (2) to build complex terms, usually called concept expressions, as well as axioms, assertions and even knowledge bases (or ontologies). There are many varieties of DLs and they often differ in which constructors, axioms and operators are allowed, which in turn offers different levels of expressivity. The DL underlying OWL 2 DL is $SROIQ (D)$ [11]. Using DLs as the underlying formalism for OWL ontologies comes with many advantages. Due to the precise syntax and semantics of DLs, new information can be inferred without having to be stated explicitly. OWL (or DL) Reasoners are computational systems that can compute and infer new information from ontologies. Many reasoning services exist depending on what information needs to be deduced. Although many DLs such as $SROIQ (D)$ are of high complexity (N2ExpTime-Complete [12]), many optimisations have led to efficient implementations of reasoners that have become usable in practice. 
Lightweight DLs exist with lower complexity levels such as the DL $EL$ , which has polynomial time complexity whilst remaining expressive enough for many ontology applications [2,6].

The importance of ontologies has increased over the past decade, particularly with applications within the semantic web and life science domain. If we focus solely on applications within life science, particularly those centred on the bio-health domain, we see a plethora of current ontologies serving different purposes, ranging from describing the development of biological entities, classification of diseases, anatomy descriptions, life cycle stage sequencing and many more. Take the OBO Foundry [19] as an example, an active ontology corpus which has been developed over the past 10 years, containing over 130 actively maintained bio-medical ontologies. The corpus contains ontologies such as the Drosophila Gross Anatomy Ontology [7] which describes the anatomy and development of the common fruit fly, as well as medical terminological systems such as the National Cancer Institute Thesaurus (NCIT) [18].

Many applications in life science include concepts involving time. Take for example an ontology describing the development of some biological entity. Any development inherently involves time: statements made in the ontology could include descriptions of elements developing, an entity occurring during a particular time or an event occurring before, after or during another event. It is clear that time information would be essential in such examples. From a different viewpoint, for instance in a clinical setting, other temporal information may be needed such as disease progression or medication frequency. Evidently, different application domains embed various types of temporal features.

As expressive as ontologies and their underlying DLs are, there are still limiting factors over what they can and cannot express. OWL 2 does offer a way to encode some temporal information, for example, through time stamping (data types), but offers no way to describe any real type of change since it is still a static logic (being a fragment of First Order Logic). It could be beneficial to both ontology authors and users of ontologies to have some sense of time encoded into the underlying formalism, allowing better representation of temporal aspects and the ability to query knowledge in the past, present or future. Clearly, if temporal information is needed but cannot be represented, then many ontologies may currently be misrepresented, or at least OWL does not have the required expressivity to meet the temporal requirements of these ontologies. The temporal requirements of bio-health ontologies could range from the accurate modelling of a specific type of temporal entity, such as a biological entity developing through time, to the modelling of a suitable time-line along which the temporal entities could develop. Currently, it is not clear exactly what kind of temporal expressivity is necessary to meet the temporal needs of bio-health ontologies, simply because the temporal requirements of these ontologies are rather diverse and not precisely described.

Many efforts have been made in an attempt to overcome the general problem. Temporal extensions to DLs have been given a lot of attention in recent years. Many proposals exist, ranging from: combining classical temporal logics such as LTL, CTL or ${CTL}^{*}$ with DLs such as $EL$ or $ALC$ [10,16] where the result can be seen as a two-dimensional Temporal Description Logic (TDL); adding temporal information by extending DLs with a concrete domain [4] to act as a temporal referencing scheme [17]; or even internalising temporal information by embedding it into standard OWL via means of temporal ontologies, for example, a Fluent Ontology [21], or a dedicated OWL Time Ontology [14] which has recently become a W3C recommendation.

Very few of these temporal extensions have been investigated for a specific application area, and those that have are not transferable to other applications. In recent years, research on two-dimensional TDLs has been focused solely on complexity results rather than capturing the needs of some temporal domain [16], similarly for DLs extended with concrete domains [15]. We believe this is because both have fascinating complexity results [10,15,16]: it is very easy for these logics to enter into the undecidability realm, which is undesirable for DLs and ontologies. It may be the case that some of the proposed extensions may, in fact, be suitable for modelling the temporal requirements of bio-health ontologies, but since the temporal requirements of bio-health ontologies are yet to be discovered, an evaluation of these logics has yet to be accomplished. If the requirements were known, we could evaluate the current proposals, to see which were most suited, and if none were, we could set out to define a new logic based on these requirements in an attempt to solve this problem.

In this paper, we provide a foundation for defining a suitable temporal extension to OWL, in particular, to cover the temporal requirements of bio-health ontologies. We produce an empirically validated set of temporal requirements based on a survey of an up to date and actively maintained corpus of bio-health ontologies: the OBO Foundry ontology repository corpus, alongside one of its popular upper level ontologies – the Relation Ontology [20]. We characterise the corpus with respect to a rich set of temporal features and survey their coverage and impact. We then compile a list of Temporal Requirement Sets, based on the weighted temporal features. These requirement sets can then be used as either an evaluation mechanism for existing temporal extensions to test their suitability or as a mechanism to drive the definitions of new temporal extensions.

The contributions of this paper are: (1) an encoding scheme used to annotate temporal aspects of the Relation Ontology, acting as a seed to our survey, (2) a generalisable entity importance measuring system, which can measure the importance of entities used throughout the temporally encoded Relation Ontology over a corpus of ontologies and (3) sets of empirically validated temporal requirements acting as guidelines to temporal extensions to OWL.

2. Temporal patterns in bio-health ontologies

The background and motivation of this paper are presented via examples of how temporal information is currently represented in bio-health ontologies. To be able to do so, we introduce several key biological notions and terms crucial to understand the presented examples. We also introduce key aspects that are relevant to our survey that go hand in hand with temporal modelling. From this point onwards we assume the reader to be familiar with OWL and have a basic understanding of Description Logics (DLs), including their syntax and semantics.

2.1. The OBO foundry

The OBO Foundry (http://www.obofoundry.org) [19], first founded in 2007, contains a corpus of ontologies in the biomedical domain. It originally included only 16 ontologies and is to this day a collaborative experiment to establish a set of standards for ontology development, so that its ontologies can be used as reference ontologies in the biomedical domain. The corpus now contains over 130 ontologies. The OBO Foundry is home to popular ontologies that range from describing anatomies and developments of organisms such as the Zebrafish, Xenopus, Cephalopod and Drosophila ontologies to those that describe cellular and molecular structures such as the Cell or Gene ontologies. As well as those ontologies that intend to describe some particular domain area, there are those that intend to act as a shared resource, or a formal structure, designed to act as a referencing scheme for domain ontologies to reuse or derive their terms. These ontologies are often referred to as upper level ontologies. Two very popular ontologies that are present in the OBO Foundry which fall under this category are the Basic Formal Ontology [9] and the Relation Ontology [20].

The basic formal ontology. The Basic Formal Ontology (BFO) is a formal upper-level ontology based on tested conventions for ontology creation. The ontology is built upon a collection of sub-ontologies: the SNAP ontology and the SPAN ontology. The former defines entities known as continuants (or endurants) and the latter defines entities known as occurrents (or processes).

In general, continuants are known to be objects that endure or persist through time. They can undergo changes, inhere in objects, be physical objects themselves, but must persist during the times they exist. Examples of continuants are you, your clothes, a pen, a phone, etc. From a biological viewpoint, continuants could include cells, your heart, your blood, your blood type, etc. BFO divides continuants into three separate categories, namely: independent continuants, generically dependent continuants, and specifically dependent continuants. Independent continuants are those continuants that can stand alone and continue to persist, i.e., they do not rely solely on something else for their existence. Dependent continuants do rely on something else for their existence to persist. The difference between specifically dependent continuants and generically dependent continuants is that the former rely on exactly one independent continuant (the bearer) for their existence (and cease to exist once the bearer does), whereas the latter can have multiple bearers. An example of a specifically dependent continuant is the shape of a ball (round). An example of a generically dependent continuant is an entry in a database (it relies on each value in the entry).

Occurrents, on the other hand, are disjoint from continuants. Occurrents are those entities that unfold through time in temporal phases. They are often referred to as events or processes. If a continuant were subject to an event occurring, such as a heart (the continuant) beating (the event), the occurrent would be the event itself. Therefore, occurrents are not physical objects themselves; they are the events that unfold around the objects, subject to time. The occurrent class is also partitioned into several subclasses, namely: process, process boundary, spatiotemporal region and temporal region. A process is an occurrent that has temporal parts and depends on some material entity for some time. For example, consider a person over the course of their life, starting in childhood and ending in late adulthood. The process experienced by this individual would be the process of ageing, and it would depend on that person. Process boundaries are temporal parts of processes that themselves have no other temporal parts. The example given by BFO of a process boundary is “the boundary between the 2nd and 3rd year of your life”. Temporal regions are simply occurrents that have references to some notion of time (instants or intervals). Examples include the time right now, the span of time from your birth until your eventual death, the time that covered the year 1990, etc. Finally, spatiotemporal regions are defined as occurrents that are part of space-time. Examples are the region occupied by the life of a biological entity and the region occupied by the development of a disease.

It is clear that both continuants and occurrents are objects that require time to be defined and understood. Many of the ontologies in the OBO Foundry have incorporated the BFO’s class hierarchies into their structures (adhering to OBO’s principles), inheriting their properties and definitions. Having a unified and well-defined structure leads to less ambiguity in their understanding and helps to make integration easier.

Fig. 1.
Left: an OWL model of a development fragment of the drosophila ontology. Right: a temporalised OWL model of the same development fragment. $\circ =$ element of the DL domain, $\leftarrow =$ develops from, ⇢ = part of, $⇠ ⇢ =$ identity relation, s = spermatid cyst, a = agglomeration, co = coalescence, cl = clew, o = onion and l = leafblade.

The relation ontology. The Relation Ontology (RO; available for download at http://www.obofoundry.org/ontology/ro.html) acts as a means for standardisation across ontologies in the OBO Foundry and the wider OBO library. Its main focus is the classification of relations between instances of classes that exist in the bio-medical domain, but more importantly, it covers relations used in OBO Foundry ontologies. First introduced in 2007, the ontology was host to only ten relations, including primitive biological relations such as part of, derives from and preceded by, where each was equipped with a precise definition to avoid any ambiguity of their correct usage. The current version of RO is now host to 497 relations (as of 5th December 2016), where similar levels of detail are used in the definitions for many of the relations. As well as modelling relations, it also comes equipped with a class hierarchy that intends to classify the domains and ranges of the relationships, most importantly, between continuants and occurrents. Specifically, it aligns these classes with those from BFO. As stated, many of the relations in RO come with definitions to avoid ambiguity in their meanings. Some also come with temporal additions in their definitions. Take for example the definition and additional clarificatory comments provided for the mereotopological relation part of:
“a core relation that holds between a part and its whole”

“Parthood requires the part and the whole to have compatible classes: only an occurrent can be part of an occurrent; only a process can be part of a process; only a continuant can be part of a continuant; only an independent continuant can be part of an independent continuant; only an immaterial entity can be part of an immaterial entity; only a specifically dependent continuant can be part of a specifically dependent continuant; only a generically dependent continuant can be part of a generically dependent continuant. (This list is not exhaustive.)”

“Occurrents are not subject to change and so parthood between occurrents holds for all the times that the part exists. Many continuants are subject to change, so parthood between continuants will only hold at certain times, but this is difficult to specify in OWL.”
The definitions are explained well enough for terms not to be taken ambiguously. More importantly, they give information on how they should be interpreted with respect to time (not only by what we can infer from the respective domain and range types) and also show the lack of temporal support from OWL itself.

RO relations cover the vast majority of pairings over the classes they define. For example, relational hierarchies present in RO cover relationships between independent continuants and processes, outlined in the hierarchy relation between structure and stage, which include relations such as existence starts during and existence ends during. Other branches of the hierarchy include relations between independent continuants and specifically dependent continuants such as the relation bearer of.

Both occurrents and continuants are crucial to the relations of RO, and thus to all of the ontologies in the OBO Foundry that use RO. As with the BFO, many terms in RO have temporal information present and require this information to be correctly interpreted.

2.2. Temporal modelling in the OBO foundry

We now present an example of temporal modelling present in an OBO Foundry ontology. The example will use relations from RO and entities that correspond to those described in BFO and will illustrate the temporal weakness of OWL and show support for our survey.

The Drosophila Gross Anatomy Ontology describes the anatomy and developmental stages of the life cycle of Drosophila melanogaster (the common fruit fly). We present a small fragment of the ontology describing the development of the spermatid cell, a part of the male germline cell of the fly itself. The fragment shows temporal patterns through two of its most used properties, develops from and part of, and can be broken down into 5 stages, shown in the following axioms:

$\begin{array}{l} LeafbladeS ⊑ \exists dF . OnionS \\ OnionS ⊑ \exists dF . ClewS \\ ClewS ⊑ \exists dF . CoalescenceS \\ CoalescenceS ⊑ \exists dF . AgglomerationS \\ LeafbladeS ⊑ S, \quad OnionS ⊑ S \\ ClewS ⊑ S, \quad AgglomerationS ⊑ S \\ CoalescenceS ⊑ S \\ S ⊑ \exists partOf . SCyst \\ \text{(S = Spermatid, dF = developsFrom)} \end{array}$

The first nine axioms express a Spermatid cell going through 5 stages of development (for now we will assume that this short example encodes the entire developmental pattern and nothing occurs before or after the first and last stage). The tenth and final axiom expresses that every Spermatid is part of a Spermatid Cyst. We choose to interpret the identity of the Spermatid cell as the same cell over each developmental stage. Of course, each cell is a distinct element, representing a changed version of its predecessor continuously developing its morphology over time, but when a Coalescence Spermatid develops from an Agglomeration Spermatid, the Agglomeration Spermatid ceases to exist as an entity. In this example at least, we take develops from to represent a specific type of change, which is also apparent in the definition of develops from. Again, specific to this example (and others), the develops from relation could also be seen to describe both the pre- and post-conditions of an element's development. For example, in the fourth axiom, the class Agglomeration Spermatid could describe the precondition and the class Coalescence Spermatid could describe the post-condition of the same element developing. Finally, since the same Spermatid is continuously changing, each type of Spermatid should belong to the same Spermatid Cyst during its development.

We identify two major temporal aspects of this development sequence. The first is that there is a single entity developing (the spermatid, a continuant) and the second is that there is a continuous partonomy between the two entities (the other element being the spermatid cyst, also a continuant) whilst they are developing. Due to the way the ontology is modelled, neither of these temporal constraints can truly be enforced in OWL. Consider Fig. 1. The use of the existential restriction ‘∃’ in the axioms may refer to distinct elements for each possible Spermatid, immediately losing any possible identity constraints. This could lead to errors involving the duplication of properties. For example, the Spermatid could have constraints on itself, and thus each Spermatid in the example model would also be subject to these constraints. Then, if a change were to occur in one Spermatid, it would not necessarily appear in another Spermatid since they could all be distinct. A knock-on effect is that the Spermatid Cysts that the Spermatids are part of do not have to be the same Spermatid Cyst, which can again lead to similar problems. In an ideal setting, the identity between the Spermatids must be maintained, as should the partonomy between the same elements. A more faithful model is also presented in Fig. 1. In this model, we imagine OWL to have an embedded time-line, where we can view normal OWL worlds (or models) at different time points, like the two-dimensional semantics seen in ${LTL}_{DL}$ combinations such as ${LTL}_{ALC}$ [10,16]. They are called two-dimensional since they extend the standard DL domain (the first dimension) with a time-line (the second dimension), and models can be viewed as sequences of standard DL models that can share the same domain. We adopt a similar approach as it suits this example well. 
In this temporal setting, there are 5 OWL worlds that are set along a time-line, and each world shares the same 2 domain elements which represent the Spermatid and the Spermatid Cyst. At each time the Spermatid element belongs to a different Spermatid class, for example, at time t the element is an instance of Agglomeration Spermatid class and at $t + 1$ it is an instance of Coalescence Spermatid Class. During each time point, the domain element has a part of relation to a Spermatid Cyst, which is the same Spermatid Cyst throughout the development. Such a model seems to capture more faithfully what was intended for the biological modelling, yet this type of modelling is beyond OWL. There is only one single world of evaluation, no time-line, and no identity constraints between distinct entities.
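
The temporalised model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `World` structure, the element names `"s"` and `"cyst"`, and the helper functions are hypothetical, and the stage order is taken from the order given in the caption of Fig. 1 together with the time points t and t + 1 named above. It is not an OWL reasoner and not a formal semantics.

```python
from dataclasses import dataclass, field

@dataclass
class World:
    """One standard DL interpretation at a single time point (hypothetical structure)."""
    classes: dict = field(default_factory=dict)  # class name -> set of domain elements
    part_of: set = field(default_factory=set)    # set of (part, whole) pairs

# Shared domain: the same two elements occur in every world (assumed names).
SPERMATID, CYST = "s", "cyst"
# Assumed stage order, following the Fig. 1 caption.
STAGES = ["Agglomeration", "Coalescence", "Clew", "Onion", "Leafblade"]

# Five worlds along the time-line; only the class membership of the spermatid changes.
timeline = [
    World(classes={stage: {SPERMATID}, "SpermatidCyst": {CYST}},
          part_of={(SPERMATID, CYST)})
    for stage in STAGES
]

def stage_at(t):
    """Return the stage class the spermatid element belongs to at time point t."""
    return next(c for c in STAGES if SPERMATID in timeline[t].classes.get(c, set()))

def part_of_holds_throughout(part, whole):
    """True if the same (part, whole) pair is related by part_of at every time point."""
    return all((part, whole) in world.part_of for world in timeline)
```

Because every world reuses the same element `"s"`, identity is preserved by construction, which is exactly what the existential restrictions in the static axioms cannot guarantee.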

This example shows yet another clear-cut case of OWL’s lack of temporal expressivity, and more importantly shows a significant amount of temporal information loss for only two relations and a small number of axioms. The motivation of this paper is driven by examples such as these; develops from and part of alone seem to be important relations for the Drosophila Ontology. Together, they are used in roughly one-third of the total logical axioms in the ontology, which could imply that one-third of the ontology is unfaithfully modelled. It would also be useful to know how often they are used in other bio-health ontologies. If they are only used in the Drosophila Ontology and no other, then it would be an overstatement to say that both of the relations were of crucial importance to the temporal modelling of bio-health ontologies. Yet, if they were also used in one-third of axioms in all bio-health ontologies, it would not be unfair to say they were important relations. It would also not be unfair to state that, for example, independent continuants were important for modelling in bio-health ontologies, since the domain and range of develops from are restricted to this specific class, which would mean that one-third of the axioms in those ontologies require independent continuants.
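
The "one-third of the axioms" style of argument can be made concrete with a small calculation. The sketch below is an assumption-laden illustration only: `relation_share` is a hypothetical helper, the toy data is invented, and this simple proportion is not the importance measure used in the survey.

```python
def relation_share(axioms, relations):
    """Fraction of axioms that mention at least one of the given relations.

    Each axiom is represented only by the set of relation names it uses.
    """
    if not axioms:
        return 0.0
    hits = sum(1 for used in axioms if used & relations)
    return hits / len(axioms)

# Toy ontology (invented data): six axioms, three of which use the relations of interest.
toy_ontology = [
    {"developsFrom"},
    {"partOf"},
    set(),
    set(),
    {"partOf", "developsFrom"},
    set(),
]

share = relation_share(toy_ontology, {"developsFrom", "partOf"})
```

Computing such a share per ontology, and then comparing it across a corpus, is the kind of evidence the paragraph above asks for before calling a relation important.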

The relations develops from and part of encode specific temporal information: develops from relates entities over two different time points (a past time relation), whereas part of relates entities in a single time point (a same time relation). Moreover, develops from relates two independent continuants, whereas part of can be used for continuants or occurrents, provided both types are compatible. We call these attributes of relations temporal attributes. Using the same reasoning as above, all of these attributes could be seen as important for temporal modelling of bio-health ontologies. If there was another relation in the Drosophila ontology that had the same temporal attributes as develops from that was also considered important, then it would make sense to also focus on the importance of the attributes themselves rather than just the individual relations.

Our survey intends to empirically and systematically rank the importance of these types of temporal features. We propose to annotate all relations in RO that are used across the OBO Foundry with their temporal attributes and then use carefully designed metrics to define their importance using their logical axiom counts and more. Such analysis will give rise to a set of temporal requirements of those bio-health ontologies.

We now go on to explain how the temporal attributes are derived and present the definitions of the metrics used to define importance.

3. Materials & methods

In the following, we distinguish three types of temporal features: (1) Temporal relations are those RO relationships that encode information that is temporally relevant; (2) Temporal attributes are types of temporal information that represent temporal phenomena described by temporal relations, and (3) Temporal annotations are sets of temporal attributes used to annotate a temporal relation with its relevant temporal information. (2) and (3) are defined in detail in the following section.

A temporal requirement corresponds to a temporal annotation. For example, if annotation A is used in an axiom of an ontology, A is said to be a temporal requirement of that ontology. Lastly, a temporal requirement set is a set of temporal requirements, typically one where the temporal requirements are likely to co-occur, defined in more detail in the following.

3.1. Overview

The goal of our study of temporal requirements of bio-health ontologies is two-fold. First, we will study the importance of temporal features across OBO Foundry ontologies. Second, we will suggest an empirically validated, ordered list of temporal requirement sets. In order to achieve our goal, we:

(1) Define a set of temporal attributes based on relations from the RO that are used across the OBO Foundry.

(2) Match axioms across the OBO Foundry ontologies which exhibit these attributes using a smart matching technique.

(3) Analyse the resulting data with respect to the importance of these attributes and their corresponding temporal annotations.

(4) Derive a ranked list of temporal requirements based on the importance, coverage and necessity score of temporal annotations across the OBO Foundry corpus.

3.2. Defining and identifying temporal attributes

We use the relationships defined in the Relation Ontology (RO) as a source for defining and extracting temporal attributes. We define temporal attributes as types of temporal information that represent temporal phenomena described by RO relations, such as the past time relation phenomenon found in the develops from relation. For each relationship, the temporal information is gathered from (a) its definitions or other annotations, (b) its domain and range constraints, (c) related relationships, exploiting OWL’s precise semantics, and, when the first three are lacking, (d) general biological knowledge and the way in which ontologies use the relationship.

To illustrate this procedure, recall the RO relationship part of. As well as the annotations (including definitions) presented in Section 2.1, take as well the annotation

“axiom holds for all times”

As an example, consider the axiom

$Nucleus ⊑ \exists partOf . Cell$

which states that every Nucleus is part of some Cell. In this instance, the annotation would be interpreted as “At any time t, for any instance n of Nucleus at time t, there exists an instance c of Cell at time t, such that n is part of c at time t”. By parsing the definitions and annotations of the part of relation, we can extract the following temporal information: (i) partonomy relationships take place during single time points, i.e. they are same-time relations, (ii) the classes must be compatible, (iii) partonomy will hold eternally true (while the elements exist) and (iv) the partonomy may hold between the same elements over time. (iv) is also derived from the fact that in many temporal modelling scenarios, it may be important that the same elements are related over time. For example, if a particular cell were to have a nucleus as a part at some time point, it would not make sense for this cell to have another nucleus at another time point in usual cell development patterns (this is often referred to as a rigid relation in the temporal logic realm). Each temporal feature (i)–(iv) is then categorised into the following respective temporal attributes: (i) Time:same, indicating the relation takes place over a single time point, (ii) Domain:X-Range:X, indicating the domain and range must share the same type X (where X is either a type of continuant or a type of occurrent), (iii) AHFAT (Axiom Holds For All Times) and (iv) Rigid, indicating the relation follows a rigid-like pattern.
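
The quoted "holds for all times" reading of this axiom can also be written as a first-order formula with an explicit time argument. Treating $Nucleus$, $Cell$ and $partOf$ as predicates with an extra argument t is a notational assumption made here for illustration; it is not a definition taken from RO:

$\forall t \, \forall n \, \bigl( Nucleus(n, t) \rightarrow \exists c \, ( Cell(c, t) \wedge partOf(n, c, t) ) \bigr)$

The same time variable t appears in all three atoms, which corresponds to the Time:same attribute, while the outer quantifier over t corresponds to AHFAT.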

We performed this temporal attribute derivation procedure for every RO relationship used amongst ontologies in the OBO Foundry. We acquired 56 distinct temporal attributes which we categorised into the following 6 sets: (1) Domain & Range, (2) Time, (3) States, (4) Identity, (5) Rigid, (6) AHFAT.

Domain & range contains the set of all pairings of domain and range constraints that occurred in RO relationships. The set contains 23 attributes involving the four types of continuants, namely continuants (C), independent continuants (IC), specifically dependent continuants (SDC) and generically dependent continuants (GDC), as well as general occurrents (O) and processes (P). Eight of the attributes are between different types of continuants and occurrents (e.g., Domain:C-Range:O or Domain:O-Range:C), 11 are between only continuants (e.g., Domain:IC-Range:IC), two are between only occurrents (e.g., Domain:O-Range:O), one is between any element and a continuant (Domain:X-Range:C, where X is a placeholder for any element type) and one is between any two elements of the same kind (Domain:X-Range:X).

Time contains attributes describing how each relationship relates its entities in time. Due to the fundamental temporal differences between continuants and occurrents, the set can be partitioned into three subsets, those being time attributes of relations between two continuants, two occurrents, or between continuants and occurrents. Overall this set consists of 19 attributes. The continuant time attributes account for seven of these, consisting of Time:same, Time:diff, Time:past, Time:pastImmediate, Time:same/past, Time:future and Time:same/future. Time:same indicates that the domain element of a relationship is related to the range element at the same moment in time. Time:past indicates that the domain element of a relationship is related to the range element present at a past moment in time. Time:pastImmediate indicates that the domain element of a relationship is related to the range element present at the previous moment in time. Time:same/past indicates that the domain element of a relationship is related to the range element present at either a previous moment in time or the same moment in time, and so on for the future variants. Time:diff is the opposite of Time:same, indicating that the domain and range element are in different time points. The occurrent time attributes adopt Allen’s time relations on intervals [1]. 13 attributes make up this subgroup, including Time:before, Time:before/during, Time:beforeInverse, Time:during, Time:during/overlaps, Time:during/overlapsInverse, Time:finishes, Time:finishesInverse, Time:isEqualTo, Time:meets and Time:meetsInverse. Time:before indicates that the domain element of the relationship happens entirely before the range element, where before is to be interpreted as in Allen’s interval relations, i.e., the domain ends before the range starts. Time:during/overlaps indicates that the domain element either happens during the range element or overlaps the range element, and so on. 
Relations between continuants and occurrents are simply a subset of those between continuants. The set consists of the following four attributes: Time:same, Time:same/future, Time:future and Time:same/past, interpreted in the obvious way.
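
The occurrent time attributes can be read as predicates over pairs of intervals. The following is a minimal sketch, assuming that an occurrent's temporal extent is an interval with numeric start and end points; the `Interval` class and the function names are illustrative and are not part of RO or of the survey tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Temporal extent of an occurrent; assumes start < end (illustrative)."""
    start: float
    end: float


def before(a, b):
    # Time:before: the domain ends strictly before the range starts.
    return a.end < b.start


def meets(a, b):
    # Time:meets: the domain ends exactly where the range starts.
    return a.end == b.start


def during(a, b):
    # Time:during: the domain lies strictly inside the range.
    return b.start < a.start and a.end < b.end


def overlaps(a, b):
    # Allen's overlaps: the domain starts first and ends inside the range.
    return a.start < b.start < a.end < b.end


def during_or_overlaps(a, b):
    # Time:during/overlaps is read as the disjunction of the two relations.
    return during(a, b) or overlaps(a, b)


def inverse(relation):
    # An inverse attribute (e.g. Time:beforeInverse) swaps domain and range.
    return lambda a, b: relation(b, a)


development = Interval(0, 10)
early_stage = Interval(2, 4)
print(during(early_stage, development))
print(inverse(before)(Interval(5, 6), Interval(1, 2)))
```

Combined attributes such as Time:before/during can be built the same way, as a disjunction of the underlying predicates.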

States contains attributes describing possible state changes of the domain or range of a relationship. Six attributes are contained within this category. Domain-related attributes include Domain:Birth, Domain:Changed and Domain:Death, and range-related attributes include Range:Birth, Range:Changed and Range:Death. Domain:Birth indicates that the relationship specifies the start of the domain element’s existence. Domain:Changed indicates that the domain element goes through some type of change (such as a change in class or other properties) compared to what it was previously. Domain:Death indicates that the relationship specifies the end of the domain element’s existence. The same holds for the Range:X attributes in relation to the range elements.

Identity consists of only a single attribute Identity:same which indicates that both the domain and range element of the relationship share the same identity, i.e., they represent the same temporal entity.

Rigid consists of only a single attribute Rigid which indicates that the relationship follows a rigid pattern, where both the domain and range elements of the relationship are required to be consecutively related through time for some required duration.

AHFAT consists of only a single attribute AHFAT which indicates that the relationship’s domain element is required to have a relation to a compatible range element at all times (during its existence).

Each attribute may also be paired with a tag Necessary:No which indicates that it is not necessary for the corresponding relationship to hold that particular attribute, although in some scenarios it can. For example, the attribute Rigid-Necessary:No is interpreted as “it is not necessary in all cases for the relation R to be interpreted rigidly, but in some cases, a rigid interpretation holds for R”. An example of when this may be the case is where an ontology specifically describes atemporal information.

Fig. 2.

Hierarchies of temporal attributes grouped by their category and ordered based upon a subsumption relation. C = continuant, IC = independent continuant, SDC = specifically dependent continuant, GDC = generically dependent continuant, O = occurrent and P = process.

Hierarchical relationships exist between many of the temporal attributes, since some of the attributes imply others in a way that is similar to OWL’s subClassOf relation. For example, Time:past implies Time:diff since a past relation is a relation between two different time points. Figure 2 shows how each attribute type is positioned in its corresponding hierarchy. The Domain & Range attributes are ordered depending on their ontological constraints according to the RO class hierarchy. The remaining attributes are ordered based on their inherent implications.

Temporal attribute examples. To further demonstrate the meaning of several temporal attributes, we present examples illustrating their usage. Since we cannot provide examples for all attributes due to space considerations, the attributes chosen for demonstration are a representative set of all attributes and will provide a sufficient level of knowledge to determine the remaining attributes. As with the developmental sequence example from Fig. 1, the examples imagine OWL to have an embedded time-line, where we view a distinct OWL world at every point on the time-line. Also, when necessary, OWL axioms are used to describe examples and are displayed in DL syntax. We begin by describing the different types of entities (continuants and occurrents) before moving onto the temporal attributes of relations.

Fig. 3.

An independent continuant, persisting through time.

Figure 3 displays an independent continuant persisting through time. It exists alone, without being dependent on another entity, displayed by the fact that no other elements exist in each world. It also maintains its identity throughout time, displayed by having the same element in each world.

Figure 4 shows an example of a specifically dependent continuant SDC, and an independent continuant IC, existing at times t and $t + 1$ . The dependency is presented using the inheresIn relation which is defined as: “a relation between a specifically dependent continuant (the dependent) and an independent continuant (the bearer), in which the dependent specifically depends on the bearer for its existence”. Such a relationship is usually represented using the OWL axiom $SDC ⊑ \exists inheresIn . IC$ , however when considering the temporal aspects of continuants and the relation inheresIn, more constraints are necessary. We identify 4 temporal attributes for the inheresIn relation which are Domain:SDC-Range:IC, Time:Same, Rigid and AHFAT. The first attribute is used to simply specify the domain and range constraints of the relation which are present in the relation’s definition and its logical constraints in RO. The second attribute is used to state that the relation holds between elements at a single point in time, illustrated in Fig. 4 by the fact that the inheresIn relation connects the elements SDC and IC in a single world at a time. This information was gathered from an annotation on the relation itself: “axiom holds for all times” which specifies that the relation holds between the two elements at the same time, and also by observing its usage in ontologies in the bio-health community. The third attribute is used to show that the relation could be rigid between its elements, i.e., if the relation holds for multiple time points, it must be between the same elements unless otherwise specified. This attribute was inferred in the same way as the previous attribute: through one of the relation’s annotations and usage throughout ontologies. Finally, the fourth attribute states that SDCs must always inhere in some IC, which was also extracted from the annotation which states “axiom holds for all time”. This is illustrated by the fact that when SDC is in existence (at times t and $t + 1$ ) it always has an inheresIn relation to IC. Regarding the existence constraints on continuants, when IC ceases to exist at time ( $t + 2$ ), SDC should also cease to exist, since its existence was dependent on IC, which is displayed in Fig. 4 by the disappearance of both elements.

Fig. 4.

A specifically dependent continuant, persisting through time and depending on another continuant for its existence.

Fig. 5.

A process, having different temporal parts over time whilst occurring in a material entity.

Figure 5 demonstrates a relation between a process and its temporal parts, and their dependency on a continuant for their existence. The main process, $P 1$ , has three distinct temporal parts ( $P 2$ – $P 4$ ) at times t, $t + 1$ and $t + 2$ , all of which are related to it via the $partOf$ relation. Notice that each temporal part of the process only exists in a single world, whilst the main process exists throughout, demonstrating the temporal phases of the process. Since processes rely on a material entity for their existence, the main process is related to an independent continuant during its temporal phases via the occursIn relation, used in RO to relate a process and an independent continuant to express their dependencies and spatio-temporal properties. In OWL, this knowledge is usually represented using the following axioms:

$P 1 ⊑ \exists occursIn . IC$

$P 2 ⊑ \exists partOf . P 1$

$P 3 ⊑ \exists partOf . P 1$

$P 4 ⊑ \exists partOf . P 1$

where $P 1$ is the main process that occurs in some independent continuant $IC$ , and each of $P 2$ – $P 4$ is a temporal part of $P 1$ . The occursIn relation is defined as “a relation between a process and an independent continuant, in which the process takes place entirely within the independent continuant”. We tagged the relation with the temporal attributes Domain:P-Range:IC, Time:Same and Rigid. The first attribute is used to show that the domain of the relation is a process (P) and the range is an independent continuant (IC), extracted from the definition of the relation itself, and confirmed through its restrictions in RO. The second attribute is used to describe the fact that the relation is between two entities ( $P 1$ and $IC$ ) at single points in time. This is derived from its usage in ontologies, and its formal definition in RO which specifies the exact time point where the relation holds. The final attribute is used to show that the relation needs to be rigid, i.e., if the relation holds for multiple time points, then the same elements must be related throughout these time points. This attribute was again extracted from information in the relation’s annotations; the process needs to occur entirely within the same continuant whilst it is unfolding through time. Regarding the partOf relation (described previously in Section 3.2), the Rigid property is not necessary in this example since each partOf relation only lasts for a single time point. However, if one of the parts of $P 1$ lasted for more than one time point, it would be necessary to enforce that it remains part of $P 1$ throughout these time points, and not some other process.

Fig. 6.

An independent continuant being derived from another independent continuant at the previous time point.

Figure 6 illustrates the derivesFrom relation. It is defined in RO as: “a relation between two distinct material entities, the new entity and the old entity, in which the new entity begins to exist when the old entity ceases to exist, and the new entity inherits the significant portion of the matter of the old entity”. The relation is tagged with the temporal attributes Domain:IC-Range:IC, Time:Past, Domain:Birth and Range:Death. The first attribute is used since the definition describes both the domain and range of the relation as a material entity, which is a subclass of the Independent Continuant class in RO. The second attribute is used since the relation relates two entities at two separate time points, specifically a present and past time point, which is directional from the former to the latter, hence the usage of the Time:Past attribute. This is displayed in the direction of the derivesFrom arrow in Fig. 6. This information was extracted from the relation’s definition which implies the domain element exists after the range element ceases to exist, and that the two entities do not exist at the same time and therefore cannot be related at the same point in time. The third and fourth attributes were again extracted from the relation’s definition and are used to show that the domain element, $IC 1$ , comes into existence (it is born: Birth) when the range element $IC 2$ ceases to exist (Death). This is shown in Fig. 6 where $IC 2$ is no longer present at time $t + 1$ , and conversely, the same holds for $IC 1$ at time t.

Fig. 7.

An occurrent that starts with another occurrent.

Figure 7 demonstrates temporal relations between occurrents. The relation startsWith is used where $O 1$ startsWith $O 2$ , which would be expressed using the OWL axiom $O 1 ⊑ \exists startsWith . O 2$ . startsWith is defined in RO as “x starts with y if and only if x has part y and the time point at which x starts is equivalent to the time point at which y starts. Formally: $α (y) = α (x) \land ω (y) < ω (x)$ , where α is a function that maps a process to a start point, and ω is a function that maps a process to an end point.” This relation is annotated with the temporal attributes Domain:O-Range:O and Time:StartsInverse, extracted directly from the relation’s definition and constraints in RO. The first is used since both elements of the relation are defined as being occurrents (O), and the second is used since the definition of the relation is intended to describe Allen’s ${starts}^{'}$ . The time point at which $O 1$ starts must be the same as the start point of $O 2$ , and $O 2$ must end before $O 1$ ends. This is displayed in Fig. 7 since both occurrents come into existence at time $t + 1$ and $O 2$ ends at time $t + 1$ , before $O 1$ ends, illustrated by their appearance and disappearance in each world.

Fig. 8.

An independent continuant which is an immediate transformation of another independent continuant.

Figure 8 demonstrates the immediateTransformationOf temporal relation between two independent continuants. The relation is defined as “x immediate transformation of y iff x immediately succeeds y temporally at a time boundary t, and all of the matter present in x at t is present in y at t, and all the matter in y at t is present in x at t” and can be used in OWL as follows:

$IC 1 ⊑ \exists immediateTransformationOf. IC 2$ .

This relation is annotated with the temporal attributes Domain:IC-Range:IC, Time:PastImmediate, Identity:Same and Domain:Changed. The first attribute is based on domain and range constraints extracted from RO. The second attribute was extracted from the relation’s definition and indicates that the domain element of the relationship is related to the range element present at the previous moment in time. The third attribute was also extracted from the definition and indicates that both the domain and range element are in fact the same entity, derived from the statement that they share exactly the same matter, i.e., the same entity instantiates different classes over time. This is illustrated in Fig. 8 by having the same single element $IC 1$ present at each time point, but being an instance of different classes at time t and $t + 1$ when the relation takes place, indicated by a darker shade simulating a change in class.

3.3. Temporal annotations

With the resulting temporal attributes, we developed a coding scheme to then annotate each RO relationship with what we call a temporal annotation which consists of its temporal attributes, defined as follows:

Definition 1 (Temporal Annotation).

Let R be a relation from RO, and $Y = \{Domain \& Range, Time, States, Identity, Rigid, AHFAT\}$ be the sets of temporal attributes described above. A temporal annotation for R is a set $A \subset \bigcup Y$ where A contains

a single domain and range attribute

0 or 1 identity attributes

a single time attribute

0 or 1 rigidity attribute

0 or more state attributes

0 or 1 AHFAT attributes

To allow for full comparisons of temporal attributes and annotations, we also include the upward closure of attributes for a given annotation according to the temporal attribute hierarchies in Fig. 2, in what we call a temporal inferred annotation, defined as follows:

Definition 2.
Let R be a relation from RO with an existing temporal annotation A. Let $(Y, ⩽)$ be the poset shown in Fig. 2. The temporal inferred annotation for R, represented as the closure $cl$ of A, is defined as follows: $\begin{matrix} cl (A) = \{y \mid \exists x : x \in A \land x ⩽ y\} \end{matrix}$

The Necessary:No (Nec:No) tags do not necessarily have to appear on the inferred attributes. As an example, the temporal annotation $A_{1}$ for part of is {Domain:X-Range:X, Time:same, AHFAT, Rigid-Nec:No}. Its temporal inferred annotation $A_{1}^{I}$ is equal to $A_{1}$ . The temporal annotation $A_{2}$ for develops from is {Domain:IC-Range:IC, Time:past, Identity:Same-Nec:No, Domain:Birth-Nec:No, Domain:Changed}. Its temporal inferred annotation $A_{2}^{I}$ is defined as $A_{2}^{I} = A_{2} \cup \{Domain : IC -Range : C, Domain : C -Range : IC, Domain : C -Range : C, Time : diff\}$ .
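
The closure for develops from can be reproduced with a short sketch. The `PARENTS` map below is an assumed fragment of the Fig. 2 hierarchies, containing only the edges needed for this example (their exact shape in the figure may differ), and the Nec:No attributes are left out for brevity.

```python
# Assumed fragment of the Fig. 2 hierarchies: each attribute maps to the
# attributes directly above it. Only edges needed for the example are listed.
PARENTS = {
    "Time:past": {"Time:diff"},
    "Domain:IC-Range:IC": {"Domain:IC-Range:C", "Domain:C-Range:IC"},
    "Domain:IC-Range:C": {"Domain:C-Range:C"},
    "Domain:C-Range:IC": {"Domain:C-Range:C"},
}


def closure(annotation):
    """Upward closure: the annotation plus every attribute above its members."""
    result = set(annotation)
    stack = list(annotation)
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


# Annotation of "develops from" without its Nec:No attributes.
develops_from = {"Domain:IC-Range:IC", "Time:past", "Domain:Changed"}
inferred = closure(develops_from)
print(sorted(inferred - develops_from))
```

The attributes added by the closure are exactly the four listed for $A_{2}^{I}$ above; an attribute with no entry in `PARENTS`, such as Domain:Changed, contributes nothing further.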
3.4. Matching temporal features across OBO Foundry ontologies

Although the rules of the OBO Foundry enforce that terms, such as relationships, be used consistently throughout (at least) OBO Foundry ontologies, there are instances where this is not the case. Ideally, to check for a relationship’s usage in an ontology, one should be able to simply search the ontology’s signature for an occurrence of the relationship’s IRI. However, this relies heavily on ontology developers correctly using terms from other vocabularies, i.e., importing vocabularies. This is often not the case since importing ontologies could result in negative side effects such as an increase in size or a jump in complexity. RO itself illustrates the problem. Its expressivity is very high due to its complex modelling of relations (role hierarchies, role chains, size, etc.) and importing RO will most likely have a direct negative effect on performance and reasoning time. If the ontology is not imported, then at the least the IRI of any relation used should be adopted in order to indicate the intention that the relationship is the same relationship from RO. Unfortunately, this is not always the case. Instead, developers may (and do) create their own entity with a similar name. For this reason, we cannot simply rely on checking for exactly matching IRIs in an ontology’s signature. Therefore, we adopt a smart matching approach, where we define that a relationship outside RO smartly matches an RO relation if they share the same IRI, name (rdfs:label), alternative term (IAO_0000118), OBO foundry unique label (IAO_0000589) or exact synonym (hasExactSynonym); this avoids potential misses. These annotation properties were chosen due to the information encoded in each: they are clear and unambiguous in their meaning, and ontologies that define their own relationship would be likely to use values from these annotations. Manual inspection of the annotation properties’ values and self-defined relations in the RO confirms this. Exact matches occur when a relationship inside an OBO ontology has the same IRI as a relation from RO (i.e., exact matches refer to the correct usage of RO relations in external ontologies, as specified by the OBO Foundry’s rules).
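
The matching rule can be sketched as follows. This is an illustration only, not the survey's implementation: relations are represented as plain dictionaries with hypothetical keys, names are compared across all four annotation properties, and case and surrounding whitespace are ignored. Each of these choices is an assumption.

```python
# Hypothetical keys standing in for rdfs:label, IAO_0000118, IAO_0000589
# and hasExactSynonym respectively.
NAME_KEYS = ("label", "alternative_term", "obo_unique_label", "exact_synonym")


def names(entity):
    """Collect all name-like annotation values of an entity, lower-cased."""
    return {
        value.strip().lower()
        for key in NAME_KEYS
        for value in entity.get(key, [])
    }


def match_kind(candidate, ro_relation):
    """Return 'exact', 'smart' or None for a candidate object property."""
    if candidate["iri"] == ro_relation["iri"]:
        return "exact"  # same IRI: correct reuse of the RO relation
    if names(candidate) & names(ro_relation):
        return "smart"  # different IRI, but a shared name or synonym
    return None


ro_part_of = {
    "iri": "http://purl.obolibrary.org/obo/BFO_0000050",
    "label": ["part of"],
}
# A locally defined property with a made-up IRI and a similar label.
local_part_of = {"iri": "http://example.org/onto#part_of", "label": ["Part of"]}

print(match_kind(local_part_of, ro_part_of))
print(match_kind(ro_part_of, ro_part_of))
```

Under these assumptions every exact match is also a smart match, so the function reports the stronger of the two kinds.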

3.5. Usage of temporal features

We present a notion of usage that defines if and how an ontology in OBO uses a temporal attribute, annotation or relationship from the relation ontology.

When considering usage throughout the corpus, we shift our attention towards the terminological aspects of the ontologies in the corpus. That is, we choose to investigate the explicitly asserted terminological knowledge, specifically TBox axioms. Our notion of usage is defined as follows:

Definition 3.
Let f be a temporal attribute, F a temporal annotation, P an RO relationship, $O$ an ontology occurring in the OBO Foundry and let α be a terminological axiom in $O$ . We say that
F uses f if $f \in F$

P uses F if P is annotated with F

α uses P if P occurs in α

$O$ uses P if P occurs in $O$
where uses is transitive.
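
Because uses is transitive, the annotations and attributes an ontology uses follow from the relations it uses. A minimal sketch, assuming a toy ontology named `O1` and the two annotations given as examples in Section 3.3 (the dictionary layout is illustrative):

```python
# Toy data: O1 is a hypothetical ontology; the annotations are the examples
# given for "part of" and "develops from" in Section 3.3.
ONTOLOGY_USES_RELATION = {"O1": {"part of", "develops from"}}
RELATION_ANNOTATION = {
    "part of": frozenset(
        {"Domain:X-Range:X", "Time:same", "AHFAT", "Rigid-Nec:No"}
    ),
    "develops from": frozenset(
        {
            "Domain:IC-Range:IC",
            "Time:past",
            "Identity:Same-Nec:No",
            "Domain:Birth-Nec:No",
            "Domain:Changed",
        }
    ),
}


def used_annotations(ontology):
    """O uses P and P uses F, hence O uses F."""
    return {RELATION_ANNOTATION[p] for p in ONTOLOGY_USES_RELATION[ontology]}


def used_attributes(ontology):
    """O uses F and F uses f, hence O uses f."""
    return set().union(*used_annotations(ontology))


print(len(used_annotations("O1")))
print(sorted(used_attributes("O1")))
```

The same lifting applies at the level of single axioms, which is what the coverage and impact measures rely on.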

3.6. Analysing the importance of temporal features

Our goal is to determine the importance of temporal features, i.e., attributes, relations and annotations.

Although temporal relations are annotated with temporal annotations, which are in turn made up of temporal attributes, we choose to initially focus on all three features individually since they all produce different analyses for different audiences. For example, analysing temporal relations could benefit ontology authors as they could determine on a high level, which relations were considered most important, independent of what temporal attributes they are made up of. On the contrary, analysing individual temporal attributes could be useful for logic developers in determining what different types of modelling features are required for a logic, and more importantly, the importance of how attributes co-occur in annotations to determine what combinations are logically possible.

To date, no agreed-upon measure exists to quantify the importance of a particular entity $E$ , such as a relation or a class, neither in the context of a single ontology nor across an entire corpus. Entities in an ontology can be used in a variety of ways: they can be used to define the logical content of an ontology, for example in the definition of classes or other logical axioms, or even non-logical expressions such as annotations. As we are interested in determining the requirements for temporal extensions to a knowledge representation formalism, we care only about how entities are used across logical axioms (Definition 3). Whilst temporal modelling intentions could be captured in non-logical content, for example, in annotations, it could not be easily extracted. Parsing annotations for temporal content amongst a corpus of ontologies would require complex natural language techniques and would be out of the scope of an automated systematic survey.

To quantify the importance of a particular temporal feature, we decided to rely on coverage and axiom usage, which we refer to as impact for brevity. We define both metrics for temporal features as follows:

Definition 4.
Let e be either an attribute or annotation and $C$ be a set of ontologies. $\begin{matrix} Coverage (e) = \frac{| \{O \in C \mid O \text{ uses } e\} |}{| C |} \\ Impact (e) = \frac{\sum_{O \in C} \left( \frac{| \{α \in O \mid α \text{ uses } e\} |}{| \{α \in O\} |} \right)}{| C |} \end{matrix}$
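
As a worked illustration of Definition 4, the sketch below computes both measures over a toy corpus. It assumes that each ontology is a non-empty list of axioms and that each axiom has already been reduced to the set of features it uses; the corpus contents are made up.

```python
def coverage(feature, corpus):
    """Fraction of ontologies with at least one axiom using the feature."""
    using = [
        axioms for axioms in corpus.values()
        if any(feature in axiom for axiom in axioms)
    ]
    return len(using) / len(corpus)


def impact(feature, corpus):
    """Mean, over ontologies, of the fraction of axioms using the feature."""
    fractions = [
        sum(feature in axiom for axiom in axioms) / len(axioms)
        for axioms in corpus.values()
    ]
    return sum(fractions) / len(corpus)


# Hypothetical corpus: ontology name -> list of axioms (as feature sets).
corpus = {
    "O1": [{"part of"}, {"part of", "has part"}, set(), set()],
    "O2": [{"has part"}, set()],
}

print(coverage("part of", corpus))  # 1 of 2 ontologies
print(impact("part of", corpus))    # (2/4 + 0/2) / 2
print(impact("has part", corpus))   # (1/4 + 1/2) / 2
```

Here part of has the lower coverage but is not dominated on impact, which is why the two measures are reported side by side.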

The coverage measures how many ontologies each feature is used in at least once. The impact describes the percentage of axioms a feature occurs in per ontology (note that we present both metrics as proportions over the whole corpus). Neither measure can perfectly quantify importance alone, therefore, we use both in our analysis where appropriate. In our survey, we will determine the impact and coverage of all temporal relations identified through smart matching, as well as the impact and coverage of their temporal features across the OBO Foundry ontologies. We also define a score to quantify the overall importance of a feature, which takes into account both the coverage and the impact, defined as follows:
Definition 5.
Let e be a temporal feature and $C$ be a set of ontologies. $\begin{matrix} Importance (e) = \frac{n (Coverage (e)) + n (Impact (e))}{2} \end{matrix}$ where $n ()$ is a normalisation function that linearly rescales the data values to a range between 0 and 1.

The normalisation $n ()$ is applied to give both coverage and impact equal weight towards the importance score.
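
The importance score can be sketched as follows, assuming that $n ()$ is min-max normalisation (the text only states that it rescales linearly to the range 0 to 1). The values are the coverage and impact percentages of three relations from Tables 3 and 4; normalising over just these three is for illustration and will not reproduce the scores of the full survey.

```python
def min_max(values):
    """Linearly rescale a dict of scores to [0, 1] (assumed form of n())."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        return {key: 0.0 for key in values}
    return {key: (v - low) / (high - low) for key, v in values.items()}


def importance(coverage_scores, impact_scores):
    """Average of normalised coverage and normalised impact per feature."""
    n_cov, n_imp = min_max(coverage_scores), min_max(impact_scores)
    return {key: (n_cov[key] + n_imp[key]) / 2 for key in coverage_scores}


# Percentages taken from Tables 3 and 4.
coverage_pct = {"part of": 79.59, "has part": 48.98, "inheres in": 29.59}
impact_pct = {"part of": 11.52, "has part": 3.03, "inheres in": 2.07}

scores = importance(coverage_pct, impact_pct)
for relation, score in scores.items():
    print(f"{relation}: {score:.3f}")
```

Without the rescaling, coverage (up to about 80) would swamp impact (up to about 12) in the average.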
3.7. Ranked list of temporal requirement sets

Our goal is to produce an ordered list of temporal language requirements based on the results of our survey. We define a temporal requirement set, denoted $R$ , as a set of temporal annotations. For example, the temporal knowledge in $O$ requires $R$ if $O$ uses every annotation $A$ in $R$ . In order to quantify the Importance of $R$ , we make use of the following three metrics: (1) Coverage (Cov), (2) Necessity (Nec) and (3) Mean-Annotation-Importance (MAI).

(1) Coverage indicates the number of ontologies for which a requirements set is sufficient; it corresponds to the number of ontologies that can be fully expressed if the temporal requirements in $R$ are met (i.e., the set of all temporal annotations used in $O$ is a subset of $R$ ): $\begin{matrix} Cov (R) = | \{O \in C \mid \forall A : O \text{ uses } A \implies A \in R\} | \end{matrix}$ This metric is of particular interest to language developers whose goal is to enable as many knowledge engineers as possible to express the full set of their temporal requirements. The disadvantage is that covering requirement sets are often large, i.e., contain a large number of temporal annotations and attributes, and may, therefore, be difficult to realise.

(2) The necessity score corresponds to the number of ontologies that need a particular set of temporal requirements to be met, i.e., $R$ is a subset of the set of all temporal annotations $A$ used in $O$ : $\begin{matrix} Nec (R) = | \{O \in C \mid \forall A \in R : O \text{ uses } A\} | \end{matrix}$ The advantage of using this metric as the basis for language design is that requirements with a high necessity score are typically small, and may benefit a wider group of users. The disadvantage is that there is no guarantee that any user will have all of their temporal requirements satisfied (or indeed a significant proportion).

(3) The third metric, mean annotation importance, is the mean importance score (see Definition 5) of all annotations in the requirement set: $\begin{matrix} MAI (R) = \frac{\sum_{A \in R} Importance (A)}{| R |} \end{matrix}$

To quantify the overall importance of a requirement set, we use the following formula: $\begin{matrix} Importance (R) = \frac{n (Cov (R)) + n (Nec (R)) + n (MAI (R))}{3} \end{matrix}$ The normalisation function $n ()$ is used for the same reason as in Definition 5. As the total requirements space is in the worst case exponential in the number of distinct annotations (it is the powerset of all possible annotations), we decided to consider only full sets of temporal annotations that occur in some OBO Foundry ontology. For example, if the full set of annotations used in an ontology $O_{1}$ was $A_{1}, A_{2}$ and $A_{3}$ , and the full set of annotations used in another ontology $O_{2}$ was $A_{1}, A_{2}$ and $A_{4}$ , we considered only the requirements $R_{1} = \{A_{1}, A_{2}, A_{3}\}$ and $R_{2} = \{A_{1}, A_{2}, A_{4}\}$ for our analysis, and not $R_{3} = \{A_{1}, A_{2}\}$ even though it is a subset of both $R_{1}$ and $R_{2}$ . This reduces the space of possible requirements drastically (to, in the worst case, the number of OBO Foundry ontologies). The advantage is that we do not have to concern ourselves with combinations of annotations that might be practically useless (because of annotations that would never co-occur in real ontologies). On the flip-side, the converse is true: we might miss small, almost covering requirement sets that could be potentially very useful. We do believe however that it is, when in doubt, best to be guided by the empirical distribution of co-occurring temporal annotations, so we chose to restrict our attention to “used” annotation combinations. Following this procedure resulted in a total of 75 requirements.
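
The three metrics can be illustrated on the two-ontology example above. This is a sketch under stated assumptions: annotations are represented by their names, and the importance scores passed to `mai` are made-up placeholders rather than survey results.

```python
# Full annotation sets of the two ontologies from the example in the text.
USED = {
    "O1": frozenset({"A1", "A2", "A3"}),
    "O2": frozenset({"A1", "A2", "A4"}),
}


def cov(requirement_set):
    """Number of ontologies whose used annotations all lie in the set."""
    return sum(used <= requirement_set for used in USED.values())


def nec(requirement_set):
    """Number of ontologies that use every annotation in the set."""
    return sum(requirement_set <= used for used in USED.values())


def mai(requirement_set, importance):
    """Mean importance score of the annotations in the set."""
    return sum(importance[a] for a in requirement_set) / len(requirement_set)


# Candidate requirement sets: the distinct full annotation sets in the corpus.
candidates = set(USED.values())

r1 = USED["O1"]
r3 = frozenset({"A1", "A2"})  # the subset that is deliberately not considered
hypothetical_importance = {"A1": 0.9, "A2": 0.5, "A3": 0.1, "A4": 0.3}

print(cov(r1), nec(r1))
print(cov(r3), nec(r3))
print(mai(r1, hypothetical_importance))
```

The excluded set $R_{3}$ shows the trade-off described above: it is needed by both ontologies but sufficient for neither, whereas $R_{1}$ is sufficient for $O_{1}$ and needed only by $O_{1}$.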
4. Results

A full account of the analysis (scripts and all results) can be found on rpubs (http://rpubs.com/matentzn/obo-tdl-v3). Although our main focus is on determining the importance of temporal requirements, we first discuss the findings of matchings, relations and attributes.

4.1. Smart & exact matching

For each ontology, we iterated through each terminological axiom and recorded whether or not the axiom contained an exact match, or otherwise a smart match, of an RO relation. We repeated this for every axiom in every ontology, for every relation in RO.

Out of 140 downloadable ontologies (December 2016) of the OBO Foundry Repository, 11 were not parseable. While 31 ontologies contained no RO relations according to our matching approach, 98 ontologies contained smart matches. It is noteworthy that, if we had relied on exact matches alone, only 68 ontologies would have matched RO relations. This means that we would have underestimated the need for temporal modelling significantly (30% of the OBO Foundry ontologies would have been ignored).

In terms of the axioms the relations are used in, if we were to ignore axioms that only had smart matches, we would be ignoring, again, 30% of all axioms in the OBO Foundry. Of course, it could be the case that all of the smart matches were incorrect matches (they were not meant to simulate RO relations), but we did investigate a reasonably sized random selection of the matches, and it seemed obvious that the relations were matched correctly. For example, some of the matched relations investigated were used in the same way (even temporally) as the way they are defined in the RO. Table 1 shows, for the top 10 elements, by how much the coverage would be underestimated when considering only exact matches.

Table 1
The top 10 RO relations showing their smart matching and corresponding exact matching metrics in terms of the percentage of ontologies they were matched in. % Diff is the percentage difference between the exact and smart matches

Relation	Exact	Smart	% Diff
Part of	52.04	79.59	52.94
Has part	40.82	48.98	19.99
Inheres in	24.49	29.59	20.82
Has participant	17.35	27.55	58.79
Has role	16.33	26.53	62.46
Realizes	21.43	24.49	14.28
Located in	18.37	21.43	16.66
Has quality	12.24	20.41	66.75
Bearer of	15.31	19.39	26.65
Develops from	16.33	19.39	18.74
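
The % Diff column appears to be the increase of the smart-match percentage relative to the exact-match percentage. The check below, a small sketch using three rows of Table 1, reproduces the published values under that reading.

```python
def percent_diff(exact, smart):
    """Increase from exact-match to smart-match coverage, relative to exact."""
    return (smart - exact) / exact * 100


# (exact %, smart %) for three rows of Table 1.
rows = {
    "Part of": (52.04, 79.59),
    "Has part": (40.82, 48.98),
    "Has role": (16.33, 26.53),
}

for relation, (exact, smart) in rows.items():
    print(f"{relation}: {percent_diff(exact, smart):.2f}")
```

Read this way, considering only exact matches would miss roughly a third of the ontologies in which part of is actually used.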

4.2. Importance of temporal features

The temporal features are categorised based on their domain and range type, and analyses are performed within these categories. This decision was made because each feature contains different combinations of temporal attributes, which cannot be meaningfully evaluated against attributes contained in features with different domain and range types. This way, the analyses are rendered more comprehensible, and comparisons may be drawn against similar temporal phenomena. The domain-range categories used are Continuant-Continuant (CC), Occurrent-Occurrent (OO), Occurrent-Continuant or Continuant-Occurrent (OC-CO) and Other (OT) that includes features that contain the attribute (Domain:X-Range:X). Where appropriate, we use CAT as an abbreviation for domain-range categories.

4.2.1. Temporal relations

We begin by providing a short analysis of temporal relations used across OBO Foundry ontologies. The full tables that display the impact and coverage for every matched relation can be seen in Appendix A. A total of 145 relations were used across the OBO Foundry, of which 98 were CC (68%), 24 were OC-CO (17%), 18 were OO (12%) and 5 were OT (3%).

Fig. 9.

Distribution of the proportion of axioms with smart matches across ontologies.

Fig. 10.

Distribution of RO relation usage across ontologies.

Table 2

Metrics of relations ( $n = 145$ ) in each domain and range category

Type (n)	μ-cov (σ)	μ-imp (σ)	Correl	Min-cov	Max-cov	Min-imp	Max-imp
CC (98)	4.41 (5.8)	0.11 (0.31)	0.76	1.02	29.59	0	2.07
OO (18)	5.56 (5.48)	0.24 (0.53)	0.55	1.02	18.37	0	2.24
OC-CO (24)	6.85 (7.54)	0.13 (0.16)	0.76	1.02	27.55	0	0.56
OT (5)	26.73 (35.95)	2.97 (4.95)	0.94	1.02	79.59	0.01	11.52

Figures 9 and 10 show two histograms illustrating the prevalence and diversity of relations used. Figure 9 shows the distribution of ontologies by smart match prevalence, i.e., the proportion of axioms that use at least one RO or RO-like relation compared to the total number of axioms in the ontology. For example, the microRNA ontology (MIRNAO) has 764 axioms, with 79 axioms using at least one RO(-like) relation, resulting in a proportion of $79 / 764 = 10.34 \%$ . As can be seen, there are 2 ontologies that have near $100 \%$ relation usage in their axioms. Most have relation prevalence in the range of $0 \%$ – $75 \%$ , gradually declining towards the high proportion end. There is a large peak around the $0 \%$ region. Some of the ontologies responsible for this peak have large axiom counts but low RO relation usage.

Figure 10 illustrates the diversity of RO relations as the total number of different RO relations that were used in an ontology. For example, MIRNAO makes use of 8 different RO relations (which is close to the empirical mean of 8.3 different relations per ontology). Only 8 ontologies contain more than 20 different RO relations, and, perhaps apart from UBERON (78) and OVAE (51), even these contain only a fraction of all existing RO relations. This indicates an overall low diversity of RO relations within single ontologies. However, we believe this to be expected: for an ontology to have a high diversity of relations, the domain that it covers would have to be considerably large. The majority of ontologies in the OBO Foundry cover specific areas of interest, ignoring the few upper-level ontologies that intend to classify general knowledge. This can explain both the high coverage across the corpus and the comparatively low within-ontology relation diversity.

Table 3

Top 10 temporal relations ordered by coverage

Relation	#O	Coverage	CAT
Part of	78	79.59	OT
Has part	48	48.98	OT
Inheres in	29	29.59	CC
Has participant	27	27.55	OC-CO
Has role	26	26.53	CC
Realizes	24	24.49	OC-CO
Located in	21	21.43	CC
Has quality	20	20.41	CC
Bearer of	19	19.39	CC
Develops from	19	19.39	CC

Table 4

Top 10 temporal relations ordered by impact

Relation	Impact	CAT
Part of	11.52	OT
Has part	3.03	OT
Immediately preceded by	2.24	OO
Inheres in	2.07	CC
Has quality	1.52	CC
Bearer of	1.30	CC
Develops from	0.99	CC
Has modifier	0.65	CC
Derives from	0.57	CC
Preceded by	0.56	OO

Summary metrics of impact and coverage can be seen in Table 2. Tables 3 and 4 show the top ten relations amongst all categories, ordered by their coverage and impact respectively. As can be seen in Tables 3 and 4, two OT relations have the highest impact and coverage. The remaining top ten relations for coverage and impact are mostly CC relations, with only 3 relations being OC-CO or OO.

As can be seen in Table 2, the average coverage and impact for CC, OO and OC-CO relations are roughly the same, whereas they are considerably higher for OT. The OT category dominates the relation results. This is due to the relation partOf, which has the highest impact and coverage scores of all relations by a considerable margin. Its inverse, hasPart, also contributes to the high scores of the OT category, outscoring every relation from any other category. The remaining relations in OT have low scores. Although the CC category has the highest number of used relations (98), only 12 have a coverage above 10, with the remaining relations’ coverage gradually declining towards 1.02 (1 ontology). Only 3 CC relations have impact above 1. OO and OC-CO show similar trends: few relations have relatively high coverage scores, with the remainder declining steadily towards the minimum, and even fewer have notable impact scores. There is an overall strong correlation between coverage and impact for the CC, OC-CO and OT categories, each above 0.7, whereas the OO correlation was only 0.55.
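
The coverage values in Table 3 are consistent with coverage being the percentage of ontologies in the corpus that use a relation. The sketch below assumes a corpus of 98 ontologies, which is inferred from the statement that a coverage of 1.02 corresponds to 1 ontology rather than taken from an explicit definition; the names `N_ONTOLOGIES` and `coverage` are illustrative.

```python
# Assumption: the corpus holds 98 ontologies, inferred from 1.02 ≈ 100 * 1 / 98.
N_ONTOLOGIES = 98


def coverage(n_using: int, n_total: int = N_ONTOLOGIES) -> float:
    """Percentage of ontologies that use a relation, rounded to two decimals."""
    return round(100.0 * n_using / n_total, 2)


# Number of ontologies (#O) per relation, copied from Table 3.
table3_counts = {
    "Part of": 78,
    "Has part": 48,
    "Inheres in": 29,
    "Has participant": 27,
}

for relation, n_using in table3_counts.items():
    print(f"{relation}: {coverage(n_using)}")
```

Under this assumption the computed values agree with the Coverage column of Table 3 for the rows shown.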

Table 5

Top 10 temporal attributes by coverage

Attribute	#O	Coverage	CAT
OT-Dom:X-Ran:X	84	85.71	OT
OT-Rig:Yes-Nec:No	84	85.71	OT
OT-TI:AHFAT	84	85.71	OT
OT-Time:Same	84	85.71	OT
CC-Dom:C-Ran:C	68	69.39	CC
CC-Dom:IC-Ran:C	62	63.27	CC
CC-Time:Same	60	61.22	CC
CC-Rig:Yes-Nec:No	59	60.20	CC
CC-TI:AHFAT	53	54.08	CC
CC-Dom:C-Ran:IC	46	46.94	CC

Table 6

Top 10 temporal attributes by impact

Attribute	Impact	CAT
OT-Time:Same	14.85	OT
OT-Dom:X-Ran:X	14.55	OT
OT-Rig:Yes-Nec:No	14.55	OT
OT-TI:AHFAT	14.55	OT
CC-Dom:C-Ran:C	10.40	CC
CC-Time:Same	8.49	CC
CC-Rig:Yes-Nec:No	8.22	CC
CC-Dom:IC-Ran:C	6.72	CC
CC-TI:AHFAT	4.83	CC
OO-Dom:O-Ran:O	4.39	OO

Table 7

Metrics of attributes ( $n = 73$ ) in each domain and range category

Type (n)	μ-cov (σ)	μ-imp (σ)	Correl	Min-cov	Max-cov	Min-imp	Max-imp
CC (31)	25.31 (21.23)	2.09 (2.83)	0.92	1.02	69.39	0.001	10.39
OO (16)	11.54 (13.34)	0.83 (1.49)	0.91	1.02	41.84	0	4.39
OC-CO (21)	18.90 (14.03)	0.67 (0.63)	0.76	3.06	46.94	0.07	2.38
OT (5)	69.18 (36.96)	11.76 (6.41)	1	3.06	85.71	0.29	14.85

4.2.2. Temporal attributes

Coverage & impact The coverage and impact of all temporal attributes can be found in Appendix B. Summary metrics of their impact and coverage can be seen in Table 7. The top ten attributes for both coverage and impact can be seen in Tables 5 and 6 respectively. OT attributes followed by CC attributes dominate the top ten scores, with only one other attribute from the OO category appearing in the top ten for either metric. The average coverages and impacts for each category have more variation than in the relation case.

73 attributes were used across all domain and range categories with 31 (42%) belonging to CC, 16 (22%) to OO, 21 (29%) to OC-CO and 5 (7%) to OT. The correlation between coverage and impact for each category is high ( $μ = 0.898$ ).
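
As a quick check, the category shares and the mean correlation can be recomputed from the rounded values in Table 7. This is a sketch using only the published, rounded figures, so the mean it produces (about 0.8975) may differ in the last digit from the reported 0.898, which was presumably computed from unrounded correlations.

```python
from statistics import mean

# (number of attributes, coverage-impact correlation) per category, from Table 7.
table7 = {
    "CC": (31, 0.92),
    "OO": (16, 0.91),
    "OC-CO": (21, 0.76),
    "OT": (5, 1.0),
}

total_attributes = sum(n for n, _ in table7.values())
shares = {cat: round(100 * n / total_attributes) for cat, (n, _) in table7.items()}
mean_correlation = mean(r for _, r in table7.values())

print(total_attributes)
print(shares)
print(f"{mean_correlation:.4f}")
```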

When considering CC attributes, it is clear that the most popular domain and range combinations were those between ICs (domain) and Cs (range). Other combinations involving SDCs are also prominent, whereas relations involving GDCs are less frequent. The Time:Same attribute, which indicates that elements involved in the relation are related at the same time point, has both higher coverage and impact than the Time:Diff attribute, which indicates that the elements are related at different time points (e.g., developsFrom). There is a considerable difference between the two (and for each of Time:Diff’s subtypes), although the coverage of Time:Diff is not low enough to ignore. Attributes from the States set are less frequent, with notable coverages but low impacts. Finally, the attribute Rig:Yes scores in the top 3 attributes for coverage and impact, indicating that the majority of used CC relations require this feature.

OO relations only differ by their Time and Domain & Range attributes. Only 4 Time attributes have coverage above 10, and only one of these, Time:MeetsInverse, has an impact score above 1. The overall impact average was particularly low for OO attributes. OO relations that were specifically declared to be between processes (identified by the attribute Dom:P-Ran:P) have a coverage of 10.20, roughly 25% of overall OO attribute coverage, but their impact is significantly lower at only 0.157, around 3% of the total impact for OO attributes.

Only 5 OC-CO attributes have impact over 1, with 3 coming from the Domain & Range set, 1 from the Rigid set and 1 from the State set. These attributes also appear in the top scoring coverage attributes. There is no significant Domain & Range type attribute that stands out above others. Two noteworthy findings are that (1) the Time:Same attribute has both higher coverage and impact than Time:Diff, and (2) the Rig:Yes attribute plays a key role.

The majority of OT attributes, namely those contained within the annotation for the hasPart and partOf relations, have the highest scores amongst all attributes. Interestingly, the attribute Rig:Yes is one of the most used attributes in terms of both coverage and impact.

Table 8
Metrics of annotations ( $n = 68$ ) in each domain and range category

Type (n)	μ-cov (σ)	μ-imp (σ)	Correl	Min-cov	Max-cov	Min-imp	Max-imp
CC (32)	8.83 (9.95)	0.325 (0.60)	0.85	1.02	34.69	0	2.23
OO (14)	6.78 (6.39)	0.31 (0.59)	0.51	1.02	21.43	0	2.24
OC-CO (19)	8.22 (8.09)	0.16 (0.16)	0.73	1.02	27.55	0	0.57
OT (3)	29.93 (48.32)	4.95 (8.31)	1	1.02	85.71	0.01	14.55

4.3. Temporal annotations and temporal requirements

The coverage and impact scores of all annotations can be seen in Appendix C, with summary metrics in Table 8. A list of all annotations can be seen in Table 16 (Appendix C). Tables 9 and 10 show the top ten annotations amongst all categories, ordered by their coverage and impact respectively.

The coverage of annotations in each category follows a similar trend: a fraction of the annotations have coverage above 10, with the remainder gradually declining towards the minimum (1.02). Very few annotations have notable impact scores in any category: only 6 annotations have impact over 1 across the CC, OO and OT categories, and none have impact over 1 in OC-CO.
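
The count of annotations with impact over 1 can be checked against Table 10. Because the tenth entry there already has an impact below 1, every annotation with impact over 1 must appear in that table. The snippet is a small sketch using the published values; the variable names are illustrative.

```python
from collections import Counter

# (annotation, impact, category) rows copied from Table 10.
table10 = [
    ("A68", 14.55, "OT"),
    ("A51", 2.24, "OO"),
    ("A38", 2.23, "CC"),
    ("A63", 2.19, "CC"),
    ("A39", 1.30, "CC"),
    ("A26", 1.04, "CC"),
    ("A32", 0.81, "CC"),
    ("A23", 0.76, "CC"),
    ("A65", 0.65, "CC"),
    ("A43", 0.63, "OO"),
]

# Keep only annotations whose impact exceeds 1, then count them per category.
high_impact = [(name, cat) for name, impact, cat in table10 if impact > 1]
per_category = Counter(cat for _, cat in high_impact)

print(len(high_impact))
print(dict(per_category))
```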

Table 9
Top 10 temporal annotations by coverage

Annotation	#O	Coverage	CAT
A68	84	85.71	OT
A32	34	34.69	CC
A38	34	34.69	CC
A63	29	29.59	CC
A57	27	27.55	OC-CO
A59	24	24.49	OC-CO
A43	21	21.43	OO
A2	19	19.39	CC
A26	19	19.39	CC
A39	19	19.39	CC

Table 10

Top 10 temporal annotations by impact

Annotation	Impact	CAT
A68	14.55	OT
A51	2.24	OO
A38	2.23	CC
A63	2.19	CC
A39	1.30	CC
A26	1.04	CC
A32	0.81	CC
A23	0.76	CC
A65	0.65	CC
A43	0.63	OO

4.3.1. Analysis of temporal requirements

Requirement sets are complete sets of temporal annotations that occur in at least one ontology. To quantify the importance of requirement sets, we take a two-step approach. First, we compute an overall importance score, introduced in Section 3.7. Second, we compute the Pareto frontier.

Ideally, we would like to order the set of requirements in a way that allows users to understand which are the most relevant. However, if we consider importance, coverage and necessity equally important, there cannot be such an order: there is always a trade-off (if we increase coverage, we often need to decrease necessity). The Pareto frontier is the set of requirements that are Pareto-optimal. A Pareto-optimal requirement is a requirement for which there is no other requirement that has a higher value for one of the three metrics, without at the same time having a lower value for another. This way, the Pareto frontier gives us a natural set of requirements, that as a whole are strictly better than the set of requirements not on the Pareto frontier. Note that this selection of requirements satisfies a user only if they consider all three metrics equally important.
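
The Pareto-optimality test described above can be sketched as follows. This is a minimal illustration, not the authors’ implementation: it assumes the three metrics are PON, POC and MAI as reported in Table 11, and it uses only five rows from that table, so the frontier it computes is relative to this small sample rather than to all requirement sets. The function names `dominates` and `pareto_frontier` are illustrative.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, float, float]  # assumed order: (PON, POC, MAI)


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every metric and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(requirements: Dict[str, Scores]) -> List[str]:
    """Names of requirement sets that no other requirement set dominates."""
    return [
        name
        for name, scores in requirements.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in requirements.items()
            if other != name
        )
    ]


# Five rows taken from Table 11 (PON, POC, MAI); a sample, not the full data set.
sample = {
    "R75": (0.86, 0.17, 1.00),
    "R58": (0.31, 0.18, 0.64),
    "R46": (0.19, 0.21, 0.57),
    "R49": (0.14, 0.19, 0.54),
    "R18": (0.01, 0.50, 0.18),
}

frontier = pareto_frontier(sample)
print(frontier)
```

Within this sample, R49 is excluded because R46 is at least as good on all three metrics and better on each of them.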

All requirement sets and their importance scores can be seen in Appendix D, in Tables 19 and 20.

Table 11
The top 15 requirement sets ordered by their importance (IMP). ON: number of ontologies for which the requirement set is necessary. PON: ON as a proportion. OC: number of ontologies which are completely covered by the requirement set. POC: OC as a proportion. MAI: mean importance of annotations in the requirement set. IMP: overall importance of the requirement set. Shaded in grey are those requirements which are on the Pareto frontier w.r.t. PON, POC and MAI

R	ON	PON	OC	POC	MAI	IMP
R75	84	0.86	17	0.17	1.00	0.78
R58	30	0.31	18	0.18	0.64	0.44
R51	31	0.32	18	0.18	0.61	0.44
R74	26	0.27	18	0.18	0.62	0.42
R46	19	0.19	21	0.21	0.57	0.39
R18	1	0.01	49	0.50	0.18	0.38
R67	13	0.13	21	0.21	0.58	0.37
R49	14	0.14	19	0.19	0.54	0.35
R65	8	0.08	24	0.24	0.43	0.32
R73	10	0.10	18	0.18	0.53	0.32
R50	11	0.11	19	0.19	0.46	0.31
R48	8	0.08	25	0.26	0.37	0.31
R27	1	0.01	38	0.39	0.17	0.30
R62	10	0.10	19	0.19	0.44	0.30
R7	2	0.02	36	0.37	0.20	0.30

75 temporal requirements were identified, of which the top 15 (according to their importance score) can be seen in Table 11. Requirements on the Pareto frontier (12 in total) are shaded in grey (there are no requirement sets that are strictly better than them). For example, R49 is not on the Pareto frontier, although it ranks eighth according to our importance score. This is because, taking into account all three metrics, it scores strictly worse than R46, while their overall importance scores are roughly similar.

The average number of annotations per requirement is 7.733 ( $σ = 6.831$ ), and ranges from 1 to 39. The top 15 requirements (w.r.t. importance) have an average of 5.3 ( $σ = 6.298$ ) annotations per requirement, slightly lower than the average for all requirements. The necessity scores range from 1 to 84, and on average, each requirement set is necessary for 7 ontologies. The coverage scores range from 1 to 49, and on average, 21 ontologies are completely covered per requirement.

When considering the diversity of annotations within each requirement set, on average, 44.3% of annotations are from the CC category (relations between continuants, e.g., contains), 15.3% from the OO category (relations between occurrents, e.g., precedes), 23.4% from the OC-CO category (relations between occurrents and continuants, e.g., existenceStartsDuring) and 16.3% are from the OT category (e.g., partOf). The annotation that occurs most often is A68, which occurs in 61 out of 75 (81%) requirements and annotates relations such as partOf and hasPart. A68 is the only annotation to occur in R75 – the requirement with the largest necessity, mean annotation importance and overall importance scores. A68 also appears in every requirement on the Pareto frontier.

The diversity of the 12 Pareto optimal requirements is as follows: on average, 41.8% of the requirement sets’ annotations are from the CC category, 14.6% from the OO category, 6.5% from the OC-CO category and 32.9% are from the OT category.

Considering only the top 5 requirement sets, the diversity of annotations along with their attributes is relatively low. Only 5 annotations, made up of only 19 attributes, are used within the top 5 requirement sets. 4 of the annotations belong to the CC category, 0 to OO, 0 to OC-CO and 1 to OT. 15 of their attributes belong to the CC category and 4 to the OT category. The diversity within each domain category is relatively low. For example, regarding the CC category which contains 15 attributes, 2 of these attributes come from the States set, 3 from the Time set, 7 from the Domain & Range set, 1 from the Identity set, 1 from the Rigid set and 1 from the AHFAT set. Only 9 requirements (R2, R75, R42, R66, R5, R69, R68, R72, R63) have annotations from only one domain and range category. 20 requirement sets have annotations from 2 categories, another 23 have annotations from 3 categories and the remaining 23 requirement sets contain annotations from all 4 categories.

This demonstrates the level of coverage needed by a suitable temporal language extension to OWL. Based on all requirement sets, it would not be enough for a language extension to only focus on one type of temporal phenomenon (for example, the modelling of continuants) as the majority of requirements contain more than just one type of domain entity.

However, based on the overall importance scores, it could be argued that the most important requirements, for example, the top 5 requirements, could almost be fully modelled by a language extension that focuses on only one type of temporal entity (continuants), since 90% of the annotations for these requirements only require the modelling of continuants.

To demonstrate the necessary modelling capabilities of a suitable temporal extension $L$ to OWL, consider only the top 5 requirements. With regard to the 5 annotations used throughout these requirements, 4 of them are associated with continuants, and contain the following 15 attributes: (i) Domain:SDC-Range:IC; (ii) Domain:SDC-Range:C; (iii) Domain:C-Range:IC; (iv) Domain:IC-Range:IC; (v) Domain:IC-Range:SDC; (vi) Domain:IC-Range:C; (vii) Domain:C-Range:C; (viii) Time:Diff; (ix) Time:Past; (x) Time:Same; (xi) Domain:Changed-Nec:No; (xii) Domain:Birth-Nec:No; (xiii) Identity:Same-Nec:No; (xiv) Rigid-Nec:No; (xv) AHFAT. First and foremost, the language extension $L$ would need to at least be able to model continuants in general, and in particular, independent and specifically dependent continuants (i)–(vii). However, that is not to say that other types of continuants, such as generically dependent continuants, do not need to be modelled. Furthermore, two types of time constraints on relations are necessary: same time relations and different time relations (viii)–(x). Additionally, existence, changing states and identity would need to be expressible in $L$ to allow for continuants to be born and change state at specific time points and also to allow for identity to be enforced over multiple time points (xi)–(xiii). Lastly, relation rigidity needs to be expressible, as well as the ability to state that axioms hold for all times (xiv)–(xv). The remaining annotation, A68, contains the attributes: (i) AHFAT; (ii) Time:Same; (iii) Domain:X-Range:X; (iv) Rigid-Nec:No. To express these additional attributes, $L$ would not only need expressivity similar to that described previously, but would also need to be able to model occurrents (iii).

When excluding A68 from $L$ ’s requirements, its expressivity demands are still diverse, despite the fact that the focus is only on one type of temporal entity: continuants. This shows that any temporal language candidate requires high expressivity to even be able to model only a few of the most important requirements.

5. Discussion

To the best of our knowledge, this is the first study to systematically assess and report on a set of requirements for ontologies in a particular domain. By using a temporally annotated data set that is used widely across the ontology corpus, we were able to determine which individual temporal features in the data set are most important, as well as their co-occurrence with other temporal features, both in terms of their usage in each ontology, and their coverage.

When considering the individual temporal features, due to the extent of diversity between the features, they were analysed in groups, categorised by their occurrence with the different domain and range features. We found that certain attributes were more prominent in the corpus than others. For example, when considering temporal features belonging to the CC category (those features used in relations whose domain and range type were both continuants), same-time relations were more common than both past-time and future-time relations. Due to the nature of the encoding scheme, we were also able to compare relation categories against each other. OT relations were overall the most prominent amongst the corpus (in terms of coverage and impact), followed by CC relations. OO and OC-CO relations had roughly the same usage.

The analysis of the defined requirements showed that there is high diversity amongst ontologies w.r.t. the different categories of temporal phenomena. On average, we found that just under half of the attributes in a requirement are CC attributes, followed by a quarter that are OC-CO attributes, with the rest made up of OT and OO attributes. However, when focusing on the Pareto optimal requirements, OT attributes become more prevalent. This is an important result since it shows that in order to meet the requirements, a language would have to be able to model a diverse set of temporal attributes. This may be difficult due to how different the attributes are in nature. For example, being able to model both continuants and occurrents may be difficult, due to how temporally different these entities are.

Amongst all stages of analysis, the relations part of and has part, along with their annotation, attributes and presence in requirements, were considered the most important. These relations were the most used relations, both in terms of coverage and impact. Their attributes and annotation had the highest scores for coverage and impact, and their annotation was used in 81% of all requirements, 100% of the top 15 requirements, and 100% of the requirements on the Pareto frontier. Arguably, the most interesting feature of these relations was the rigid attribute. It is well known that having the ability to model rigidity in temporal logics is a computationally hard problem [15,16], which often leads to undecidability. If this is considered to be one of the most important features, many potential temporal language candidates may be deemed unsuitable.

Although not studied in detail in this paper, the analyses of the data and the definition of the requirements are intended to aid in the identification of a suitable temporal extension of OWL (or its underlying logic) to better aid in the modelling of the temporal features found. We showed that the level of coverage needed for even single requirements was very high. Language designers can use the requirement sets to determine how effective their languages are and to determine how best to extend their language if it is not suitable. They could also be used to drive the development of new language extensions based solely on the requirements found in this study. Languages could also be compared based on how many temporal requirements are met.
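
One way such a comparison could be carried out is a simple subset check: a language meets a requirement set if it can express every annotation in that set. The sketch below is hypothetical throughout. Only the fact that R75 consists solely of A68 comes from the text; the other requirement sets, the language names and the annotations each language is assumed to support are placeholders for illustration.

```python
from typing import Dict, FrozenSet, List, Set

# R75 consists only of annotation A68 (stated in the text). The two other
# requirement sets are hypothetical placeholders, not taken from Appendix D.
REQUIREMENTS: Dict[str, FrozenSet[str]] = {
    "R75": frozenset({"A68"}),
    "R_hyp1": frozenset({"A68", "A38"}),
    "R_hyp2": frozenset({"A68", "A38", "A51"}),
}


def requirements_met(
    supported: Set[str], requirements: Dict[str, FrozenSet[str]]
) -> List[str]:
    """Requirement sets whose annotations are all expressible (subset check)."""
    return sorted(name for name, anns in requirements.items() if anns <= supported)


# Hypothetical languages and the annotations each is assumed to express.
LANGUAGES: Dict[str, Set[str]] = {
    "L_a": {"A68"},
    "L_b": {"A68", "A38"},
}

for language, supported in LANGUAGES.items():
    met = requirements_met(supported, REQUIREMENTS)
    print(f"{language}: {len(met)} of {len(REQUIREMENTS)} requirement sets met {met}")
```

A finer-grained variant could perform the same check at the level of attributes rather than annotations, at the cost of needing the full annotation-to-attribute mapping.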

5.1. Limitations

Although we identified a large number of temporal features present in the corpus of ontologies, they do not represent an exhaustive set of features. All features used were derived only from the relations used in RO. Ontologies may exhibit other types of temporal phenomena outside of the relation space which were not covered by this survey. For example, the temporal features extracted from the relations did not inform on the type of timeline that was needed to express the feature, such as a linear timeline compared to a branching timeline. Therefore, we can only claim to have defined a subset of the temporal requirements of the ontologies. At the present time, it is not clear how additional data could be extracted in a systematic or automated way, not only due to the size of ontologies and the additional time needed for manual inspection, but also because there is no other known shared resource, such as the Relation Ontology or the Basic Formal Ontology, that would allow data to be easily analysed.

When running our survey, we relied heavily on the notion of smart matching: a way to match relations across terminologies that look similar, but use different IRIs. Although our matching technique was sensible, it is possible that some of the matches were incorrect, or that other matches were missed. Manual inspection of a sample of the matched relations did not reveal incorrect matches; however, some matches could still have been missed.
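
To illustrate the general idea of matching relations that look similar but use different IRIs, the sketch below groups relations by a normalised label. It is not the smart matching procedure used in the survey, only one simple heuristic under stated assumptions: the IRIs are made-up `example.org` placeholders, and the normalisation rules (splitting camelCase, treating underscores and hyphens as spaces, lower-casing) are illustrative choices.

```python
import re
from collections import defaultdict
from typing import Dict, List


def normalise(label: str) -> str:
    """Split camelCase, turn underscores and hyphens into spaces, lower-case."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    return re.sub(r"[\s_-]+", " ", spaced).strip().lower()


def group_by_label(relations: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each normalised label to the IRIs whose labels normalise to it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for iri, label in relations.items():
        groups[normalise(label)].append(iri)
    return dict(groups)


# Hypothetical IRIs and labels, for illustration only.
relations = {
    "http://example.org/a#innervated_by": "Innervated_by",
    "http://example.org/b/innervatedBy": "innervatedBy",
    "http://example.org/c#part_of": "part of",
}

groups = group_by_label(relations)
# Labels shared by more than one IRI are candidate matches for manual review.
candidates = {label: iris for label, iris in groups.items() if len(iris) > 1}
print(candidates)
```

A heuristic of this kind can both over-match and under-match, which is why a manual inspection step of the sort described above remains useful.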

Table 12
Temporal relations, grouped by temporal category and ordered by coverage (COV)

Relation	#O	COV	CAT
Inheres in	29	29.59	CC
Has role	26	26.53	CC
Located in	21	21.43	CC
Has quality	20	20.41	CC
Bearer of	19	19.39	CC
Develops from	19	19.39	CC
Derives from	16	16.33	CC
Adjacent to	15	15.31	CC
Concretizes	15	15.31	CC
Has function	10	10.20	CC
Has member	10	10.20	CC
Towards	10	10.20	CC
Overlaps	9	9.18	CC
Continuous with	8	8.16	CC
Composed primarily of	7	7.14	CC
Has component	7	7.14	CC
Location of	7	7.14	CC
Member of	7	7.14	CC
Surrounded by	7	7.14	CC
Function of	6	6.12	CC
Is concretized as	6	6.12	CC
Produces	6	6.12	CC
Role of	6	6.12	CC
Surrounds	6	6.12	CC
Attached to	5	5.10	CC
Inheres in part of	5	5.10	CC
Connected to	4	4.08	CC
Connects	4	4.08	CC
Has developmental contribution from	4	4.08	CC
Innervates	4	4.08	CC
Produced by	4	4.08	CC
Bounding layer of	3	3.06	CC
Contains	3	3.06	CC
Develops into	3	3.06	CC
Directly develops from	3	3.06	CC
Has potential to develop into	3	3.06	CC
Innervated_by	3	3.06	CC
Conduit for	2	2.04	CC
Contributes to morphology of	2	2.04	CC
Developmentally induced by	2	2.04	CC
Developmentally replaces	2	2.04	CC
Develops in	2	2.04	CC
Has 2D boundary	2	2.04	CC
Has habitat	2	2.04	CC
Has modifier	2	2.04	CC
Has plasma membrane part	2	2.04	CC
Has potential to developmentally contribute to	2	2.04	CC
Has skeleton	2	2.04	CC
Has soma location	2	2.04	CC
Has synaptic terminal in	2	2.04	CC
Immediate transformation of	2	2.04	CC
Interacts with	2	2.04	CC
Luminal space of	2	2.04	CC
Quality of	2	2.04	CC
Skeleton of	2	2.04	CC
Supplies	2	2.04	CC
Synapsed by	2	2.04	CC
Synapsed to	2	2.04	CC
Transformation of	2	2.04	CC
Tributary of	2	2.04	CC
Attached to part of	1	1.02	CC
Branching part of	1	1.02	CC
Child nucleus of	1	1.02	CC
Child nucleus of in hermaphrodite	1	1.02	CC
Child nucleus of in male	1	1.02	CC
Confers advantage in	1	1.02	CC
Contained in	1	1.02	CC
Determined by	1	1.02	CC
Determined by part of	1	1.02	CC
Develops from part of	1	1.02	CC
Distributary of	1	1.02	CC
Drains	1	1.02	CC
Electrically_synapsed_to	1	1.02	CC
Expresses	1	1.02	CC
Fasciculates with	1	1.02	CC
Gene product of	1	1.02	CC
Has disposition	1	1.02	CC
Has fused element	1	1.02	CC
Has host	1	1.02	CC
Has muscle antagonist	1	1.02	CC
Has muscle insertion	1	1.02	CC
Has muscle origin	1	1.02	CC
Has postsynaptic terminal in	1	1.02	CC
Has presynaptic terminal in	1	1.02	CC
Has synaptic terminal of	1	1.02	CC
Has vector	1	1.02	CC
In homology relationship with	1	1.02	CC
Lumen of	1	1.02	CC
Molecularly interacts with	1	1.02	CC
Partially overlaps	1	1.02	CC
Serially homologous to	1	1.02	CC
Spatially disjoint from	1	1.02	CC
Synapsed_via_type_Ib_bouton_to	1	1.02	CC
Synapsed_via_type_II_bouton_to	1	1.02	CC
Synapsed_via_type_III_bouton_to	1	1.02	CC
Synapsed_via_type_Is_bouton_to	1	1.02	CC
Transcribed from	1	1.02	CC
Transcribed to	1	1.02	CC
Has participant	27	27.55	OC-CO
Realizes	24	24.49	OC-CO
Realized in	17	17.35	OC-CO
Participates in	15	15.31	OC-CO
Occurs in	14	14.29	OC-CO
Capable of	10	10.20	OC-CO
Has output	8	8.16	OC-CO
Output of	6	6.12	OC-CO
Has input	5	5.10	OC-CO
Existence starts during	4	4.08	OC-CO
Existence starts during or after	4	4.08	OC-CO
Capable of part of	3	3.06	OC-CO
Existence ends during	3	3.06	OC-CO
Existence ends during or before	3	3.06	OC-CO
Existence starts and ends during	3	3.06	OC-CO
Actively participates in	2	2.04	OC-CO
Existence ends with	2	2.04	OC-CO
Existence starts with	2	2.04	OC-CO
Formed as result of	2	2.04	OC-CO
Has active participant	2	2.04	OC-CO
Results in formation of	2	2.04	OC-CO
Contains process	1	1.02	OC-CO
Functionally related to	1	1.02	OC-CO
Has intermediate	1	1.02	OC-CO
Preceded by	18	18.37	OO
Immediately preceded by	15	15.31	OO
Precedes	15	15.31	OO
Regulates	10	10.20	OO
Negatively regulates	6	6.12	OO
Starts	6	6.12	OO
Ends during	4	4.08	OO
Positively regulates	4	4.08	OO
Ends	3	3.06	OO
Happens during	3	3.06	OO
Obsolete preceded by	3	3.06	OO
Ends with	2	2.04	OO
Immediately precedes	2	2.04	OO
Starts during	2	2.04	OO
Starts with	2	2.04	OO
Causally downstream of	1	1.02	OO
Causally upstream of or within	1	1.02	OO
Simultaneous with	1	1.02	OO
Part of	78	79.59	OT
Has part	48	48.98	OT
In taxon	2	2.04	OT
Only in taxon	2	2.04	OT
Depends on	1	1.02	OT

5.2. Outlook

Before beginning to evaluate temporal language extensions, our next steps include further verification of our requirement results. We hope to achieve this by contacting ontology authors and confirming (1) whether our interpretation of their ontology’s requirements was correct, (2) whether our smart matching results were valid, and (3) whether our temporal interpretations of relations coincide with their own interpretations. This would reinforce the validity of our results and possibly make them more fine-grained: determining how relations are intended to be interpreted on an individual ontology level would allow us to eliminate the Necessary attributes (e.g., Rigid:Yes-Necessary:No), which would eliminate uncertainty in the requirements.

The system we created for defining the importance of certain features used throughout ontologies could be used in other application domains to determine importance of entities, not necessarily temporal. We intend to further generalise this procedure and apply it to other application domains to test its efficacy as an entity importance measuring system for ontologies.

6. Conclusion

Our study produced an empirically validated set of requirements that describe the temporal content of ontologies in the bio-health domain. The results showed that the temporal requirements are diverse and cover a wide range of different phenomena. These results aim to provide a mechanism to show which temporal language extensions are most suitable for the temporal modelling of bio-health ontologies and can also drive the creation of new language extensions, specifically tailored to the requirements and the temporal nature of bio-health ontologies.

References

1. J.F. Allen, Maintaining knowledge about temporal intervals, Communications of the ACM 26(11) (1983), 832–843. doi:10.1145/182.358434.

2. Baader, Brand and Lutz, Pushing the EL envelope, in: Proc. of IJCAI 2005, Morgan-Kaufmann Publishers, 2005, pp. 364–369.

3. Baader, Calvanese, D.L. McGuinness, Nardi and P.F. Patel-Schneider (eds), The Description Logic Handbook: Theory, Implementation, and Applications, in: Description Logic Handbook, Cambridge University Press. ISBN 0-521-78176-0.

4. Baader and Hanschke, A scheme for integrating concrete domains into concept languages, in: IJCAI, 1991, pp. 452–457.

5. Baader and Sattler, An overview of tableau algorithms for description logics, Studia Logica 69(1) (2001), 5–40. doi:10.1023/A:1013882326814.

6. Brandt, Reasoning in ELH w.r.t. General Concept Inclusion Axioms, Technical Report, TU Dresden, 2004.

7. Costa, Reeve, Grumbling and Osumi-Sutherland, The drosophila anatomy ontology, Journal of Biomedical Semantics 4(1) (2013), 32. http://www.jbiomedsem.com/content/4/1/32. doi:10.1186/2041-1480-4-32.

8. B.C. Grau, Horrocks, Motik, Parsia, P.F. Patel-Schneider and Sattler, OWL 2: The next step for OWL, Journal of Web Semantics 6(4) (2008), 309–322. doi:10.1016/j.websem.2008.05.001.

9. Grenon, Smith and L.J. Goldberg, Biodynamic ontology: Applying BFO in the biomedical domain, Studies in Health Technology and Informatics 102 (2004), 20–38.

10. Gutiérrez-Basulto, J.C. Jung and Schneider, The complexity of temporal description logics with rigid roles and restricted TBoxes: In quest of saving a troublesome marriage, in: Proceedings of the 28th International Workshop on Description Logics, Athens, Greece, June 7–10, 2015, 2015. http://ceur-ws.org/Vol-1350/paper-23.pdf.

11. Horrocks, Kutz and Sattler, The even more irresistible SROIQ, in: KR, 2006, pp. 57–67.

12. Kazakov, SRIQ and SROIQ are harder than SHOIQ, in: Proceedings of the 21st International Workshop on Description Logics (DL2008), Dresden, Germany, May 13–16, 2008, 2008. http://ceur-ws.org/Vol-353/Kazakov.pdf.

13. Krötzsch, Simancik and Horrocks, A Description Logic Primer, CoRR. arXiv:1201.4089, 2012.

14. Little and Cox, Time Ontology in OWL, 2017. https://www.w3.org/TR/2017/REC-owl-time-20171019/.

15. Lutz, Description logics with concrete domains – a survey, in: Advances in Modal Logic, 2002, pp. 265–296.

16. Lutz, Wolter and Zakharyaschev, Temporal description logics: A survey, in: TIME, 2008, pp. 3–14.

17. Milea, Frasincar and Kaymak, tOWL: A temporal web ontology language, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42(1) (2012), 268–281. doi:10.1109/TSMCB.2011.2162582.

18. Sioutos, de Coronado, M.W. Haber, F.W. Hartel, Shaiu and L.W. Wright, NCI thesaurus: A semantic model integrating cancer-related clinical and molecular information, Journal of Biomedical Informatics 40(1) (2007), 30–43. doi:10.1016/j.jbi.2006.02.013.

19. Smith, Ashburner, Rosse, Bard, Bug, Ceusters, L.J. Goldberg, Eilbeck, Ireland, C.J. Mungall, OBI Consortium, Leontis, Rocca-Serra, Ruttenberg, S.-A.A. Sansone, R.H. Scheuermann, Shah, P.L. Whetzel and Lewis, The OBO foundry: Coordinated evolution of ontologies to support biomedical data integration, Nature Biotechnology 25(11) (2007), 1251–1255. doi:10.1038/nbt1346.

20. Smith, Ceusters, Klagges, Köhler, Kuma, Lomax, Mungall, Neuhaus, Rector and Rosse, Relations in Biomedical Ontologies, Genome Biology 6(5) (2005), R46.

21. C.A. Welty and Fikes, A reusable ontology for fluents in OWL, in: Formal Ontology in Information Systems, Proceedings of the Fourth International Conference, FOIS 2006, Baltimore, Maryland, USA, November 9–11, 2006, 2006, pp. 226–236. http://www.booksonline.iospress.nl/Content/View.aspx?piid=2209.

A systematic survey of temporal requirements of bio-health ontologies
