Abstract
The Description Logic
Introduction
The Web Ontology Language (OWL), as standardised by the World Wide Web Consortium (W3C), is a collection of knowledge representation languages designed for use in many application scenarios, providing the means to model information in a precise and structured way to enable the semantic web. An OWL ontology is a set of axioms describing the classes and properties of a domain of interest. OWL 2 [8] is the current iteration (and successor) of OWL, and has two levels of expressivity: OWL 2 DL and OWL 2 Full, the former having a Description Logic (DL) as its logical basis. DLs [3,5,13] are decidable fragments of First Order Logic that support meaningful reasoning over information. Two of the main aspects of DLs are to (1) provide ways to model relations between three kinds of entities in the domain of interest, namely concept descriptions, roles and individual names, and (2) build complex terms, usually called concept expressions, as well as axioms, assertions and even knowledge bases (or ontologies). There are many varieties of DLs, and they often differ in which constructors, axioms and operators are allowed, which in turn yields different levels of expressivity. The DL underlying OWL 2 DL is
The importance of ontologies has increased over the past decade, particularly with applications within the semantic web and the life science domain. If we shift our attention solely to applications within life science, particularly those focused on the bio-health domain, we see a plethora of current ontologies serving different purposes, ranging from describing the development of biological entities to the classification of diseases, anatomy descriptions, life cycle stage sequencing and many more. Take the OBO Foundry [19] as an example, an active ontology corpus which has been developed over the past 10 years, containing over 130 actively maintained bio-medical ontologies. The corpus contains ontologies such as the Drosophila Gross Anatomy Ontology [7], which describes the anatomy and development of the common fruit fly, as well as medical terminological systems such as the National Cancer Institute Thesaurus (NCIT) [18].
Many applications in life science include concepts involving time. Take for example an ontology describing the development of some biological entity. Any development inherently involves time: statements made in the ontology could include descriptions of elements developing, an entity occurring during a particular time or an event occurring before, after or during another event. It is clear that time information would be essential in such examples. From a different viewpoint, for instance in a clinical setting, other temporal information may be needed, such as disease progression or medical frequency. Evidently, different application domains embed various types of temporal features.
As expressive as ontologies and their underlying DLs are, there are still limits on what they can and cannot express. OWL 2 does offer a way to encode some temporal information, for example through time stamping (data types), but offers no way to describe any real type of change, since it is still a static logic (being a fragment of First Order Logic). It could be beneficial to both ontology authors and users of ontologies to have some sense of time encoded into the underlying rationale, allowing better representation of temporal aspects and the ability to query knowledge in the past, present or future. Clearly, if temporal information is needed but cannot be represented, then many ontologies may currently be misrepresented, or at least OWL does not have the required expressivity to meet the temporal requirements of these ontologies. The temporal requirements of bio-health ontologies could range from the accurate modelling of a specific type of temporal entity, such as a biological entity developing through time, to the modelling of a suitable time-line along which the temporal entities could develop. Currently, it is not clear exactly what kind of temporal expressivity is necessary to meet the temporal needs of bio-health ontologies, simply because the temporal requirements of these ontologies are rather diverse and not precisely described.
Many efforts have been made in an attempt to overcome the general problem. Temporal extensions to DLs have been given a lot of attention in recent years. Many proposals exist, ranging from: combining classical temporal logics such as
Very few of these temporal extensions have been investigated for a specific application area, and those that have are not transferable to other applications. In recent years, research on two-dimensional TDLs has been focused solely on complexity results rather than capturing the needs of some temporal domain [16], similarly for DLs extended with concrete domains [15]. We believe this is because both have fascinating complexity results [10,15,16]: it is very easy for these logics to enter into the undecidability realm, which is undesirable for DLs and ontologies. It may be the case that some of the proposed extensions may, in fact, be suitable for modelling the temporal requirements of bio-health ontologies, but since the temporal requirements of bio-health ontologies are yet to be discovered, an evaluation of these logics has yet to be accomplished. If the requirements were known, we could evaluate the current proposals, to see which were most suited, and if none were, we could set out to define a new logic based on these requirements in an attempt to solve this problem.
In this paper, we provide a foundation for defining a suitable temporal extension to OWL, in particular, to cover the temporal requirements of bio-health ontologies. We produce an empirically validated set of temporal requirements based on a survey of an up-to-date and actively maintained corpus of bio-health ontologies: the OBO Foundry ontology repository corpus, alongside one of its popular upper-level ontologies, the Relation Ontology [20]. We characterise the corpus with respect to a rich set of temporal features and survey their coverage and impact. We then compile a list of Temporal Requirement Sets, based on the weighted temporal features. These requirement sets can then be used either as an evaluation mechanism for testing the suitability of existing temporal extensions or as a mechanism to drive the definition of new temporal extensions.
The contributions of this paper are: (1) an encoding scheme used to annotate temporal aspects of the Relation Ontology, acting as a seed to our survey, (2) a generalisable entity importance measuring system, which can measure the importance of entities used throughout the temporally encoded Relation Ontology over a corpus of ontologies and (3) sets of empirically validated temporal requirements acting as guidelines to temporal extensions to OWL.
Temporal patterns in bio-health ontologies
The background and motivation of this paper are presented via examples of how temporal information is currently represented in bio-health ontologies. To do so, we introduce several key biological notions and terms crucial to understanding the presented examples. We also introduce key aspects that are relevant to our survey and go hand in hand with temporal modelling. From this point onwards we assume the reader to be familiar with OWL and to have a basic understanding of Description Logics (DLs), including their syntax and semantics.
The OBO foundry
The OBO Foundry1
The basic formal ontology
The Basic Formal Ontology (BFO) is a formal upper-level ontology based on tested conventions for ontology creation. The ontology is built upon a collection of sub-ontologies: the SNAP ontology and the SPAN ontology. The former defines entities known as continuants (or endurants) and the latter defines entities known as occurrents (or processes).
In general, continuants are objects that endure or persist through time. They can undergo changes, inhere in objects, be physical objects themselves, but must persist during the times they exist. Examples of continuants are you, your clothes, a pen, a phone, etc. From a biological viewpoint, continuants could include cells, your heart, your blood, your blood type, etc. BFO divides continuants into three separate categories, namely: independent continuants, generically dependent continuants, and specifically dependent continuants. Independent continuants are those continuants that can stand alone and continue to persist, i.e., they do not rely solely on something else for their existence. Dependent continuants do rely on something else for their existence to persist. The difference between specifically dependent continuants and generically dependent continuants is that the former rely on exactly one independent continuant (their bearer) for their existence (and cease to exist once their bearer does), whereas the latter can have multiple bearers. An example of a specifically dependent continuant is the shape of a ball (round). An example of a generically dependent continuant is an entry in a database (it relies on each value in the entry).
Occurrents, on the other hand, are disjoint from continuants. Occurrents are those entities that unfold through time in temporal phases. They are often referred to as events or processes. If a continuant were subject to an event occurring, such as a heart (the continuant) beating (the event), the occurrent would be the event itself. Therefore, occurrents are not physical objects themselves; they are the events that unfold around the objects, subject to time. The occurrent class is also partitioned into several subclasses, namely: process, process boundary, spatiotemporal region and temporal region. A process is an occurrent that has temporal parts and depends on some material entity for some time. For example, consider a person over the course of their life, starting in childhood and ending in late adulthood. The process experienced by this individual would be the process of ageing, and it would depend on that person. Process boundaries are temporal parts of processes that themselves have no other temporal parts. The example given by BFO of a process boundary is “the boundary between the 2nd and 3rd year of your life”. Temporal regions are simply occurrents that have references to some notion of time (instants or intervals). Examples include the time right now, the span of time from your birth until your eventual death, the time that covered the year 1990, etc. Finally, spatiotemporal regions are defined as occurrents that are part of space-time. Examples are the region occupied by the life of a biological entity and the region occupied by the development of a disease.
It is clear that both continuants and occurrents are objects that require time to be defined and understood. Many of the ontologies in the OBO Foundry have incorporated the BFO’s class hierarchies into their structures (adhering to OBO’s principles), inheriting their properties and definitions. Having a unified and well-defined structure leads to less ambiguity in their understanding and helps to make integration easier.

Fig. 1. Left: an OWL model of a development fragment of the Drosophila ontology. Right: a temporalised OWL model of the same development fragment.
The relation ontology The Relation Ontology2
Available for download at
“a core relation that holds between a part and its whole”
“Parthood requires the part and the whole to have compatible classes: only an occurrent can be part of an occurrent; only a process can be part of a process; only a continuant can be part of a continuant; only an independent continuant can be part of an independent continuant; only an immaterial entity can be part of an immaterial entity; only a specifically dependent continuant can be part of a specifically dependent continuant; only a generically dependent continuant can be part of a generically dependent continuant. (This list is not exhaustive.)”
“Occurrents are not subject to change and so parthood between occurrents holds for all the times that the part exists. Many continuants are subject to change, so parthood between continuants will only hold at certain times, but this is difficult to specify in OWL.”
RO relations cover the vast majority of pairings over the classes they define. For example, relational hierarchies present in RO cover relationships between independent continuants and processes, outlined in the hierarchy relation between structure and stage, which include relations such as existence starts during and existence ends during. Other branches of the hierarchy include relations between independent continuants and specifically dependent continuants such as the relation bearer of.
Both occurrents and continuants are crucial to the relations of RO, and thus to all of the ontologies in the OBO Foundry that use RO. As with the BFO, many terms in RO have temporal information present and require this information to be correctly interpreted.
We now present an example of temporal modelling present in an OBO Foundry ontology. The example will use relations from RO and entities that correspond to those described in BFO and will illustrate the temporal weakness of OWL and show support for our survey.
The Drosophila Gross Anatomy Ontology describes the anatomy and developmental stages of the life cycle of Drosophila melanogaster (the common fruit fly). We present a small fragment of the ontology describing the development of the spermatid cell, a part of the male germline cell of the fly. The fragment shows temporal patterns through two of its most used properties, develops from and part of, and can be broken down into four stages, shown in the following axioms:
We identify two major temporal aspects of this development sequence. The first is that there is a single entity developing (the spermatid, a continuant) and the second is that there is a continuous partonomy between the two entities (the other being the spermatid cyst, also a continuant) whilst they are developing. Due to the way the ontology is modelled, none of these temporal constraints can truly be enforced in OWL. Consider Fig. 1. The existential restriction ‘∃’ in the axioms may refer to a distinct element for each possible Spermatid, immediately losing any possible identity constraints. This could lead to problems involving the duplication of properties. For example, the Spermatid could have constraints on itself, and thus each Spermatid in the example model would also be subject to these constraints. Then, if a change were to occur in one Spermatid, it would not necessarily appear in another Spermatid, since they could all be distinct. A knock-on effect is that the Spermatid Cysts that the Spermatids are part of do not have to be the same Spermatid Cyst, which can again lead to similar problems. In an ideal setting, the identity between the Spermatids must be maintained, as should the partonomy between the same elements. A more faithful model is also presented in Fig. 1. In this model, we imagine OWL to have an embedded time-line, where we can view normal OWL worlds (or models) at different time points, like the two-dimensional semantics seen in
This example shows yet another clear-cut case of OWL’s lack of temporal expressivity, and more importantly shows a significant amount of temporal information loss for only two relations and a small number of axioms. The motivation of this paper is driven by examples such as these; develops from and part of alone seem to be important relations for the Drosophila Ontology. Together, they are used in roughly one-third of the total logical axioms in the ontology, which could imply that one-third of the ontology is unfaithfully modelled. It would also be useful to know how often they are used in other bio-health ontologies. If they are used only in the Drosophila Ontology, then it would be an overstatement to say that both relations are of crucial importance to the temporal modelling of bio-health ontologies. Yet, if they were also used in one-third of the axioms in all bio-health ontologies, it would not be unfair to say they were important relations. It would also not be unfair to state that, for example, independent continuants were important for modelling in bio-health ontologies, since the domain and range of develops from are restricted to this specific class, which would mean that one-third of the axioms in those ontologies require independent continuants.
The relations develops from and part of encode specific temporal information: develops from relates entities over two different time points (a past time relation), whereas part of relates entities in a single time point (a same time relation). Moreover, develops from relates two independent continuants, whereas part of can be used for continuants or occurrents, provided both types are compatible. We call these attributes of relations temporal attributes. Using the same reasoning as above, all of these attributes could be seen as important for temporal modelling of bio-health ontologies. If there was another relation in the Drosophila ontology that had the same temporal attributes as develops from that was also considered important, then it would make sense to also focus on the importance of the attributes themselves rather than just the individual relations.
Our survey intends to empirically and systematically rank the importance of these types of temporal features. We propose to annotate all relations in RO that are used across the OBO Foundry with their temporal attributes and then use carefully designed metrics to define their importance, based on their logical axiom counts among other measures. Such an analysis will give rise to a set of temporal requirements of those bio-health ontologies.
We now go on to explain how the temporal attributes are derived and present the definitions of the metrics used to define importance.
Materials & methods
In the following, we distinguish three types of temporal features: (1) Temporal relations are those RO relationships that encode information that is temporally relevant; (2) Temporal attributes are types of temporal information that represent temporal phenomena described by temporal relations, and (3) Temporal annotations are sets of temporal attributes used to annotate a temporal relation with its relevant temporal information. (2) and (3) are defined in detail in the following section.
A temporal requirement corresponds to a temporal annotation. For example, if annotation A is used in an axiom of an ontology, A is said to be a temporal requirement of that ontology. Lastly, a temporal requirement set is a set of temporal requirements, typically one where the temporal requirements are likely to co-occur, defined in more detail in the following.
Overview
The goal of our study of temporal requirements of bio-health ontologies is two-fold. First, we will study the importance of temporal features across OBO Foundry ontologies. Second, we will suggest an empirically validated, ordered list of temporal requirement sets. In order to achieve our goal, we:
(1) Define a set of temporal attributes based on relations from the RO that are used across the OBO Foundry.
(2) Match axioms across the OBO Foundry ontologies which exhibit these attributes using a smart matching technique.
(3) Analyse the resulting data with respect to the importance of these attributes and their corresponding temporal annotations.
(4) Derive a ranked list of temporal requirements based on the importance, coverage and necessity score of temporal annotations across the OBO Foundry corpus.
Defining and identifying temporal attributes
We use the relationships defined in the Relation Ontology (RO) as a source for defining and extracting temporal attributes. We define temporal attributes as types of temporal information that represent temporal phenomena described by RO relations, such as the past time relation phenomenon found in the develops from relation. For each relationship, the temporal information is gathered from (1) its definitions or other annotations, (2) its domain and range constraints, (3) related relationships, whose connection follows from OWL’s precise semantics, and (4) where the first three are lacking, general biological knowledge and the way in which ontologies use the relationship.
To illustrate this procedure, recall the RO relationship part of. As well as the annotations (including definitions) presented in Section 2.1, take as well the annotation
“axiom holds for all times”
We performed this temporal attribute derivation procedure for every RO relationship used amongst ontologies in the OBO Foundry. We acquired 56 distinct temporal attributes which we categorised into the following 6 sets: (1)
Domain & range contains the set of all pairings of domain and range constraints that occurred in RO relationships. The set contains 23 attributes involving the four types of continuants, namely continuants (C), independent continuants (IC), specifically dependent continuants (SDC) and generically dependent continuants (GDC), together with general occurrents (O) and processes (P). Eight of the attributes are between different types of continuants and occurrents (e.g., Domain:C-Range:O or Domain:O-Range:C), 11 are between only continuants (e.g., Domain:IC-Range:IC), two are between only occurrents (e.g., Domain:O-Range:O), one is between any element and a continuant (e.g., Domain:X-Range:C, where X is a placeholder for any element type) and one is between any two elements of the same kind (e.g., Domain:X-Range:X).
Time contains attributes describing how each relationship relates its entities in time. Due to the fundamental temporal differences between continuants and occurrents, the set can be partitioned into three subsets: time attributes of relations between two continuants, between two occurrents, and between continuants and occurrents. Overall this set consists of 19 attributes. The continuant time attributes account for seven of these, consisting of Time:same, Time:diff, Time:past, Time:pastImmediate, Time:same/past, Time:future and Time:same/future. Time:same indicates that the domain element of a relationship is related to the range element at the same moment in time. Time:past indicates that the domain element of a relationship is related to the range element present at a past moment in time. Time:pastImmediate indicates that the domain element of a relationship is related to the range element present at the previous moment in time. Time:same/past indicates that the domain element of a relationship is related to the range element present at either a previous moment in time or the same moment in time, and so on. Time:diff is the opposite of Time:same, indicating that the domain and range elements are at different time points. The occurrent time attributes adopt Allen’s relations on time intervals [1]. 13 attributes make up this subgroup, consisting of Time:before, Time:before/during, Time:beforeInverse, Time:during, Time:during/overlaps, Time:during/overlapsInverse, Time:finishes, Time:finishesInverse, Time:isEqualTo, Time:meets and Time:meetsInverse. Time:before indicates that the domain element of the relationship happens entirely before the range element, where “before” is to be interpreted as in Allen’s interval relations, i.e., the domain ends before the range starts. Time:during/overlaps indicates that the domain element either happens during the range element or overlaps the range element, and so on.
Relations between continuants and occurrents are simply a subset of those between continuants. The set consists of the following four attributes: Time:same, Time:same/future, Time:future and Time:same/past, interpreted in the obvious way.
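To make the interval-based reading of the occurrent time attributes concrete, the following Python sketch encodes a few of them as predicates over intervals. It is an illustration only: the `Interval` type, the integer time points, the helper names and the reading of compound attributes such as Time:before/during as a disjunction are assumptions made here, not part of RO or of the survey's tooling. Only a subset of the attributes is covered.

```python
from typing import Callable, Dict, NamedTuple


class Interval(NamedTuple):
    """A time interval over integer time points; assumed to satisfy start < end."""
    start: int
    end: int


Relation = Callable[[Interval, Interval], bool]


def before(a: Interval, b: Interval) -> bool:
    # The domain ends strictly before the range starts.
    return a.end < b.start


def meets(a: Interval, b: Interval) -> bool:
    # The domain ends exactly where the range starts.
    return a.end == b.start


def overlaps(a: Interval, b: Interval) -> bool:
    # The domain starts first and ends inside the range.
    return a.start < b.start < a.end < b.end


def during(a: Interval, b: Interval) -> bool:
    # The domain lies strictly inside the range.
    return b.start < a.start and a.end < b.end


def finishes(a: Interval, b: Interval) -> bool:
    # The domain starts later than the range and both end together.
    return b.start < a.start and a.end == b.end


def is_equal_to(a: Interval, b: Interval) -> bool:
    return a == b


def inverse(relation: Relation) -> Relation:
    # Swap the roles of domain and range.
    return lambda a, b: relation(b, a)


def either(first: Relation, second: Relation) -> Relation:
    # Assumed reading of "A/B" attributes: A holds or B holds.
    return lambda a, b: first(a, b) or second(a, b)


OCCURRENT_TIME_ATTRIBUTES: Dict[str, Relation] = {
    "Time:before": before,
    "Time:beforeInverse": inverse(before),
    "Time:before/during": either(before, during),
    "Time:during": during,
    "Time:during/overlaps": either(during, overlaps),
    "Time:finishes": finishes,
    "Time:finishesInverse": inverse(finishes),
    "Time:isEqualTo": is_equal_to,
    "Time:meets": meets,
    "Time:meetsInverse": inverse(meets),
}


def holds(attribute: str, domain: Interval, range_: Interval) -> bool:
    """Check whether a time attribute holds between a domain and a range interval."""
    return OCCURRENT_TIME_ATTRIBUTES[attribute](domain, range_)
```

For instance, `holds("Time:before", Interval(0, 2), Interval(3, 5))` is true, whereas two intervals that share an end point satisfy Time:meets rather than Time:before.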
States contains attributes describing possible state changes of the domain or range of a relationship. Six attributes are contained within this category. Domain related attributes include Domain:Birth, Domain:Changed and Domain:Death, and range related attributes include Range:Birth, Range:Changed and Range:Death. Domain:Birth indicates that the relationship specifies the start of the domain element’s existence. Domain:Changed indicates that the domain element goes through some type of change (such as a change in class or other properties) compared to what it was previously. Domain:Death indicates that the relationship specifies the end of the domain element’s existence. The same holds for the Range:X attributes in relation to the range elements.
Identity consists of only a single attribute Identity:same which indicates that both the domain and range element of the relationship share the same identity, i.e., they represent the same temporal entity.
Rigid consists of only a single attribute Rigid, which indicates that the relationship follows a rigid pattern, where both the domain and range elements of the relationship are required to be consecutively related through time for some required duration.
AHFAT consists of only a single attribute AHFAT which indicates that the relationship’s domain element is required to have a relation to a compatible range element at all times (during its existence).
Each attribute may also be paired with a tag Necessary:No which indicates that it is not necessary for the corresponding relationship to hold that particular attribute, although in some scenarios it can. For example, the attribute Rigid-Necessary:No is interpreted as “it is not necessary in all cases for the relation R to be interpreted rigidly, but in some cases, a rigid interpretation holds for R”. An example of when this may be the case is where an ontology specifically describes atemporal information.

Fig. 2. Hierarchies of temporal attributes grouped by their category and ordered based upon a subsumption relation. C = continuant, IC = independent continuant, SDC = specifically dependent continuant, GDC = generically dependent continuant, O = occurrent and P = process.
Hierarchical relationships exist between many of the temporal attributes, since some of the attributes imply others in a way that is similar to OWL’s subClassOf relation. For example, Time:past implies Time:diff since a past relation is a relation between two different time points. Figure 2 shows how each attribute type is positioned in its corresponding hierarchy. The Domain & Range attributes are ordered depending on their ontological constraints according to the RO class hierarchy. The remaining attributes are ordered based on their inherent implications.
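The implication ordering can be read as a small directed graph in which each attribute points to the attributes it directly implies. The Python sketch below collects everything a set of attributes implies by following those edges. Only the edge from Time:past to Time:diff is stated in the text; the other edges, the `IMPLIES` table and the function name are assumptions added for illustration and do not reproduce the full hierarchy of Fig. 2.

```python
from typing import Dict, Iterable, Set

# Hypothetical fragment of the attribute hierarchy: attribute -> directly implied attributes.
IMPLIES: Dict[str, Set[str]] = {
    "Time:past": {"Time:diff"},           # stated in the text
    "Time:pastImmediate": {"Time:past"},  # assumption for illustration
    "Time:future": {"Time:diff"},         # assumption for illustration
}


def upward_closure(attributes: Iterable[str]) -> Set[str]:
    """Return the given attributes together with every attribute they imply."""
    closed = set(attributes)
    frontier = list(closed)
    while frontier:
        attribute = frontier.pop()
        for implied in IMPLIES.get(attribute, ()):
            if implied not in closed:
                closed.add(implied)
                frontier.append(implied)
    return closed
```

Under these assumed edges, the closure of Time:pastImmediate also contains Time:past and Time:diff, while an attribute with no outgoing edges, such as Rigid, is returned unchanged.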
Temporal attribute examples
To further demonstrate the meaning of several temporal attributes, we present examples illustrating their usage. Since we cannot provide examples for all attributes due to space considerations, the attributes chosen for demonstration are a representative set of all attributes and will provide a sufficient level of knowledge to determine the remaining attributes. As with the developmental sequence example from Fig. 1, the examples imagine OWL to have an embedded time-line, where we view a distinct OWL world at every point on the time-line. Also, when necessary, OWL axioms are used to describe examples and are displayed in DL syntax. We begin by describing the different types of entities (continuants and occurrents) before moving onto the temporal attributes of relations.

Fig. 3. An independent continuant, persisting through time.
Figure 3 displays an independent continuant persisting through time. It exists alone, without being dependent on another entity, as shown by the fact that no other elements exist in each world. It also maintains its identity throughout time, as shown by the same element appearing in each world.
Figure 4 shows an example of a specifically dependent continuant SDC, and an independent continuant IC, existing at times t and

Fig. 4. A specifically dependent continuant, persisting through time and depending on another continuant for its existence.

Fig. 5. A process, having different temporal parts over time whilst occurring in a material entity.
Figure 5 demonstrates a relation between a process and its temporal parts, and their dependency on a continuant for their existence. The main process
where

Fig. 6. An independent continuant being derived from another independent continuant at the previous time point.
Figure 6 illustrates the derivesFrom relation. It is defined in RO as: “a relation between two distinct material entities, the new entity and the old entity, in which the new entity begins to exist when the old entity ceases to exist, and the new entity inherits the significant portion of the matter of the old entity”. The relation is tagged with the temporal attributes Domain:IC-Range:IC, Time:past, Domain:Birth and Range:Death. The first attribute is used since the definition describes both the domain and range of the relation as a material entity, which is a subclass of the Independent Continuant class in RO. The second attribute is used since the relation relates two entities at two separate time points, specifically a present and a past time point, directed from the former to the latter, hence the usage of the Time:past attribute. This is displayed in the direction of the derivesFrom arrow in Fig. 6. This information was extracted from the relation’s definition, which implies that the domain element exists after the range element ceases to exist, and that the two entities do not exist at the same time and therefore cannot be related at the same point in time. The third and fourth attributes were again extracted from the relation’s definition and are used to show that the domain element,

Fig. 7. An occurrent that starts with another occurrent.
Figure 7 demonstrates temporal relations between occurrents. The relation startsWith is used where O1 startsWith O2, which would be expressed using the OWL axiom

Fig. 8. An independent continuant which is an immediate transformation of another independent continuant.
Figure 8 demonstrates the immediateTransformationOf temporal relation between two independent continuants. The relation is defined as “x immediate transformation of y iff x immediately succeeds y temporally at a time boundary t, and all of the matter present in x at t is present in y at t, and all the matter in y at t is present in x at t” and can be used in OWL as follows:
This relation is annotated with the temporal attributes Domain:IC-Range:IC, Time:pastImmediate, Identity:same and Domain:Changed. The first attribute is based on domain and range constraints extracted from RO. The second attribute was extracted from the relation’s definition and indicates that the domain element of the relationship is related to the range element present at the previous moment in time. The third attribute was also extracted from the definition and indicates that the domain and range elements are in fact the same entity, derived from the statement that they share exactly the same matter, i.e., the same entity instantiates different classes over time. This is illustrated in Fig. 8 by having the same single element
With the resulting temporal attributes, we developed a coding scheme to then annotate each RO relationship with what we call a temporal annotation which consists of its temporal attributes, defined as follows:
(Temporal Annotation).
Let R be a relation from RO. A temporal annotation of R is a set of temporal attributes consisting of:
(1) a single domain and range attribute;
(2) 0 or 1 identity attributes;
(3) a single time attribute;
(4) 0 or 1 rigidity attributes;
(5) 1 or more state attributes;
(6) 0 or 1 AHFAT attributes.
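The cardinality constraints in this definition can be checked mechanically. The Python sketch below classifies attribute names by category and tests the bounds listed above. The name-based classification, the `LIMITS` table and the function names are assumptions made for illustration; the bounds themselves, including "1 or more" state attributes, are taken as stated in the definition.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

STATE_ATTRIBUTES = {
    "Domain:Birth", "Domain:Changed", "Domain:Death",
    "Range:Birth", "Range:Changed", "Range:Death",
}
NECESSARY_NO = "-Necessary:No"

# Category -> (minimum, maximum) number of attributes; None means unbounded.
LIMITS: Dict[str, Tuple[int, Optional[int]]] = {
    "domain_range": (1, 1),
    "identity": (0, 1),
    "time": (1, 1),
    "rigid": (0, 1),
    "state": (1, None),
    "ahfat": (0, 1),
}


def category(attribute: str) -> str:
    """Map an attribute name to its category (assumed naming scheme)."""
    if attribute.endswith(NECESSARY_NO):
        # The Necessary:No tag does not change the category.
        attribute = attribute[: -len(NECESSARY_NO)]
    if attribute in STATE_ATTRIBUTES:
        return "state"
    if attribute.startswith("Domain:") and "-Range:" in attribute:
        return "domain_range"
    if attribute.startswith("Time:"):
        return "time"
    if attribute.startswith("Identity:"):
        return "identity"
    if attribute == "Rigid":
        return "rigid"
    if attribute == "AHFAT":
        return "ahfat"
    raise ValueError(f"unknown temporal attribute: {attribute}")


def is_valid_annotation(attributes: Iterable[str]) -> bool:
    """Check an annotation against the cardinality bounds of the definition."""
    counts = Counter(category(a) for a in attributes)
    for name, (low, high) in LIMITS.items():
        if counts[name] < low:
            return False
        if high is not None and counts[name] > high:
            return False
    return True
```

As a usage example, the attribute set given for derivesFrom (Domain:IC-Range:IC, Time:past, Domain:Birth, Range:Death) passes this check, whereas a set carrying two time attributes does not.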
To allow for full comparisons of temporal attributes and annotations, we also include the upward closure of attributes for a given annotation according to the temporal attribute hierarchies in Fig. 2, in what we call a temporal inferred annotation, defined as follows:
Let R be a relation from RO with an existing temporal annotation A. Let
The Necessary:No (Nec:No) tags do not necessarily have to appear on the inferred attributes. As an example, the temporal annotation
Although the rules of the OBO Foundry require that terms, such as relationships, be used consistently throughout (at least) OBO Foundry ontologies, there are instances where this is not the case. Ideally, to check for a relationship’s usage in an ontology, one should be able to simply search the ontology’s signature for an occurrence of the relationship’s IRI. However, this relies heavily on ontology developers correctly using terms from other vocabularies, i.e., importing vocabularies. This is often not done, since importing ontologies can have negative side effects such as an increase in size or a jump in complexity. In the RO case, this problem is immediately apparent: its expressivity is very high due to its complex modelling of relations (role hierarchies, role chains, size, etc.), and importing the RO will most likely have a direct negative effect on performance and reasoning time. If the ontology is not imported, then at the least the IRI of any relation used should be adopted, to indicate the intention that the relationship is the same relationship from RO. Unfortunately, this is not always the case. Instead, developers may (and do) create their own entity with a similar name. For this reason, we cannot simply rely on checking for exactly matching IRIs in an ontology’s signature. Therefore, to avoid potential misses, we adopt a smart matching approach: a relationship outside RO smartly matches an RO relation if they share the same IRI, name (rdfs:label), alternative term (IAO_0000118), OBO Foundry unique label (IAO_0000589) or exact synonym (hasExactSynonym). These annotation properties were chosen because of the information encoded in each: they are clear and unambiguous in their meaning, and ontologies that define their own relationship are likely to use values from these annotations. Manual inspection of the annotation properties’ values and self-defined relations in the RO confirms this.
Exact matches occur when a relationship inside an OBO ontology has the same IRI of a relation from RO (i.e., exact matches refer to the correct usage of RO relations in external ontologies, as specified by the OBO Foundry’s rules).
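The distinction between exact and smart matches can be illustrated with a minimal sketch. It is not the implementation used in the survey: the `Relation` class, the `normalise` helper and the decision to compare each annotation property only against the same property (after lower-casing and collapsing whitespace) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Annotation properties compared during smart matching, as listed in the text.
MATCH_PROPERTIES = ("rdfs:label", "IAO_0000118", "IAO_0000589", "hasExactSynonym")


@dataclass
class Relation:
    """Hypothetical, simplified view of an object property."""
    iri: str
    # Maps an annotation property name to the set of its string values.
    annotations: dict = field(default_factory=dict)


def normalise(value: str) -> str:
    # Assumption: values are compared case-insensitively with collapsed whitespace.
    return " ".join(value.lower().split())


def match_kind(candidate: Relation, ro_relation: Relation) -> Optional[str]:
    """Return 'exact', 'smart' or None for a candidate against one RO relation.

    An exact match (same IRI) also satisfies the smart matching criteria;
    'smart' is returned here only for matches that are not exact.
    """
    if candidate.iri == ro_relation.iri:
        return "exact"
    for prop in MATCH_PROPERTIES:
        ours = {normalise(v) for v in candidate.annotations.get(prop, set())}
        theirs = {normalise(v) for v in ro_relation.annotations.get(prop, set())}
        if ours & theirs:
            return "smart"
    return None


# Illustrative data: an RO relation and a locally defined look-alike.
ro_part_of = Relation("http://purl.obolibrary.org/obo/BFO_0000050",
                      {"rdfs:label": {"part of"}})
local_part_of = Relation("http://example.org/onto#part_of",
                         {"rdfs:label": {"Part Of"}})
print(match_kind(local_part_of, ro_part_of))
```

Comparing values across different properties (for example, a local label against an RO synonym) would be a looser variant of the same idea.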
Usage of temporal features
We present a notion of usage that defines if and how an ontology in OBO uses a temporal attribute, annotation or relationship from the Relation Ontology.
When considering usage throughout the corpus, we shift our attention towards the terminological aspects of the ontologies in the corpus. That is, we choose to investigate the explicitly asserted terminological knowledge, specifically TBox axioms. Our notion of usage is defined as follows: Let f be a temporal attribute, F a temporal annotation, P an RO relationship, F P α
where
Our goal is to determine the importance of temporal features, i.e., attributes, relations and annotations.
Although temporal relations are annotated with temporal annotations, which are in turn made up of temporal attributes, we choose to initially focus on all three features individually, since they produce different analyses for different audiences. For example, analysing temporal relations could benefit ontology authors, as they could determine at a high level which relations were considered most important, independent of the temporal attributes they are made up of. In contrast, analysing individual temporal attributes could be useful for logic developers in determining which types of modelling features are required for a logic and, more importantly, how attributes co-occur in annotations, which determines what combinations must be logically possible.
To date, no agreed-upon measure exists to quantify the importance of a particular entity
To quantify the importance of a particular temporal feature, we decided to rely on coverage and axiom usage, which we refer to as impact for brevity. We define both metrics for temporal features as follows:
Let e be either an attribute or annotation and
The coverage measures how many ontologies each feature is used in at least once. The impact describes the percentage of axioms a feature occurs in per ontology (note that we present both metrics as proportions over the whole corpus). Neither measure can perfectly quantify importance alone, therefore, we use both in our analysis where appropriate. In our survey, we will determine the impact and coverage of all temporal relations identified through smart matching, as well as the impact and coverage of their temporal features across the OBO Foundry ontologies. We also define a score to quantify the overall importance of a feature, which takes into account both the coverage and the impact, defined as follows:
Let e be a temporal feature and
The normalisation
Our goal is to produce an ordered list of temporal language requirements based on the results of our survey. We define a temporal requirement set, denoted
(1) Coverage indicates the number of ontologies for which a requirements set is sufficient; it corresponds to the number of ontologies that can be fully expressed if the temporal requirements in
(2) The necessity score corresponds to the number of ontologies that need a particular set of temporal requirements to be met, i.e.
(3) The third metric, mean annotation importance, is the mean importance score (see Definition 5) of all annotations in the requirement set:
To quantify the overall importance of a requirement set, we use the following formula:
The powerset of all possible annotations.
A full account of the analysis (scripts and all results) can be found on
Smart & exact matching
For each ontology, we iterated through each terminological axiom and recorded whether or not the axiom contained an exact match, or otherwise a smart match, of an RO relation. We repeated this for every axiom in every ontology, for every relation in RO.
Out of 140 downloadable ontologies (December 2016) of the OBO Foundry Repository, 11 were not parseable. Of the remaining 129, 31 contained no RO relations according to our matching approach, while 98 contained smart matches. It is noteworthy that, if we had relied on exact matches alone, only 68 ontologies would have matched RO relations. This means that we would have underestimated the need for temporal modelling significantly: 30 of the 98 ontologies with smart matches (roughly 30%) would have been ignored.
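The counts above can be cross-checked with a few lines of arithmetic. The sketch only restates the figures reported in the text; the variable names are illustrative.

```python
downloadable = 140     # ontologies available for download (December 2016)
unparseable = 11       # could not be parsed
no_ro_relations = 31   # parsed, but no RO relation found by smart matching
smart_matched = 98     # at least one smart match
exact_matched = 68     # at least one exact match

parsed = downloadable - unparseable
# The parsed ontologies split into those with and without RO relations.
assert parsed == no_ro_relations + smart_matched

missed = smart_matched - exact_matched                # ontologies found only via smart matching
share_of_matched = 100 * missed / smart_matched       # relative to ontologies with smart matches
share_of_parsed = 100 * missed / parsed               # relative to all parsed ontologies

print(missed, round(share_of_matched, 1), round(share_of_parsed, 1))
# prints: 30 30.6 23.3
```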
In terms of the axioms the relations are used in, if we were to ignore axioms that only had smart matches, we would be ignoring, again, 30% of all axioms in the OBO Foundry. Of course, it could be the case that all of the smart matches were incorrect matches (they were not meant to simulate RO relations), but we did investigate a reasonably sized random selection of the matches, and it seemed obvious that the relations were matched correctly. For example, some of the matched relations investigated were used in the same way (even temporally) as the way they are defined in the RO. Table 1 shows, for the top 10 elements, by how much the coverage would be underestimated when considering only exact matches.
The top 10 RO relations showing their smart matching and corresponding exact matching metrics in terms of the percentage of ontologies they were matched in. % Diff is the percentage difference between the exact and smart matches
The temporal features are categorised based on their domain and range type, and analyses are performed within these categories. This decision was made because each feature contains different combinations of temporal attributes, which cannot be meaningfully evaluated against attributes contained in features with different domain and range types. This way, the analyses are rendered more comprehensible, and comparisons may be drawn against similar temporal phenomena. The domain-range categories used are Continuant-Continuant (CC), Occurrent-Occurrent (OO), Occurrent-Continuant or Continuant-Occurrent (OC-CO) and Other (OT) that includes features that contain the attribute (Domain:X-Range:X). Where appropriate, we use CAT as an abbreviation for domain-range categories.
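As a rough illustration of this grouping, the sketch below maps a domain-and-range attribute such as Dom:IC-Ran:IC to one of the four categories. The set of type labels is an assumption: C, IC, SDC and GDC are treated as continuant types, `O` is a hypothetical label standing in for occurrent types, and anything else (such as X) falls into OT.

```python
CONTINUANT_TYPES = {"C", "IC", "SDC", "GDC"}
OCCURRENT_TYPES = {"O"}  # hypothetical label for occurrent types


def kind(type_label: str) -> str:
    """Map a domain or range type label to a top-level kind."""
    if type_label in CONTINUANT_TYPES:
        return "continuant"
    if type_label in OCCURRENT_TYPES:
        return "occurrent"
    return "other"


def domain_range_category(attribute: str) -> str:
    """Return CC, OO, OC-CO or OT for an attribute of the form 'Dom:<T>-Ran:<T>'."""
    dom_part, ran_part = attribute.split("-")
    kinds = {kind(dom_part.split(":")[1]), kind(ran_part.split(":")[1])}
    if kinds == {"continuant"}:
        return "CC"
    if kinds == {"occurrent"}:
        return "OO"
    if kinds == {"continuant", "occurrent"}:
        return "OC-CO"
    return "OT"


print(domain_range_category("Dom:IC-Ran:IC"))  # CC
print(domain_range_category("Dom:X-Ran:X"))    # OT
```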
Temporal relations
We begin by providing a short analysis of temporal relations used across OBO Foundry ontologies. The full tables that display the impact and coverage for every matched relation can be seen in Appendix A. A total of 145 relations were used across the OBO Foundry, of which 98 were CC (68%), 24 were OC-CO (17%), 18 were OO (12%) and 5 were OT (3%).

Distribution of the proportion of axioms with smart matches across ontologies.

Distribution of RO relation usage across ontologies.
Metrics of relations (
Figures 9 and 10 show two histograms illustrating the prevalence and diversity of relations used. Figure 9 shows the distribution of ontologies by smart match prevalence, i.e., the proportion of axioms that use at least one RO or RO-like relation compared to the total number of axioms in the ontology. For example, the microRNA ontology (MIRNAO) has 764 axioms, with 79 axioms using at least one RO(-like) relation, resulting in a proportion of
Figure 10 illustrates the diversity of RO relations as the total number of different RO relations that were used in an ontology. For example, MIRNAO makes use of 8 different RO relations (which is close to the empirical mean of 8.3 different relations per ontology). Only 8 ontologies contain more than 20 different RO relations, and, perhaps apart from UBERON (78) and OVAE (51), even these contain only a fraction of all existing RO relations. This indicates an overall low diversity of RO relations within single ontologies. We believe this is to be expected: for an ontology to have a high diversity of relations, the domain that the ontology covers would have to be considerably large. The majority of ontologies in the OBO Foundry cover specific areas of interest, with the exception of the few upper-level ontologies that intend to classify general knowledge. This can explain both the high coverage across the corpus and the comparatively low within-ontology relation diversity.
Top 10 temporal relations ordered by coverage
Top 10 temporal relations ordered by impact
Summary metrics of impact and coverage can be seen in Table 2. Tables 3 and 4 show the top ten relations amongst all categories, ordered by their coverage and impact respectively. As can be seen in Tables 3 and 4, two OT relations have the highest impact and coverage. The remaining top ten relations for coverage and impact are mostly CC relations, with only 3 relations being OC-CO or OO.
As can be seen in Table 2, the average coverage and impact for CC, OO and OC-CO relations are roughly the same, whereas they are considerably higher for OT. The OT category dominates the relation results. This is due to the relation partOf, which has the highest impact and coverage of all relations by a considerable margin. Its inverse, hasPart, also contributes to the high scores of the OT category, outscoring every relation from any other category. The remaining relations in OT have low scores. Although the CC category has the highest number of used relations (98), only 12 have a coverage above 10, with the remaining relations’ coverage gradually declining towards 1.02 (1 ontology). Only 3 CC relations have an impact above 1. OO and OC-CO show similar trends: a few relations have relatively high coverage scores, with the remainder declining steadily towards 0, and even fewer have notable impact scores. There is an overall strong correlation between coverage and impact for the CC, OC-CO and OT categories (above 0.7 in each), whereas the OO correlation was only 0.55.
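To make the coverage and impact figures concrete, the following sketch computes both for a toy corpus. It is a reading of the informal descriptions in the text and not the survey's scripts: the `corpus` structure is hypothetical, and impact is assumed here to be the mean of the per-ontology percentages (a figure pooled over all axioms of the corpus would be an alternative reading).

```python
def coverage(feature, corpus):
    """Percentage of ontologies in which the feature is used in at least one axiom."""
    used = sum(1 for onto in corpus.values() if onto["uses"].get(feature, 0) > 0)
    return 100 * used / len(corpus)


def impact(feature, corpus):
    """Mean, over all ontologies, of the percentage of axioms using the feature.

    Assumption: averaging per-ontology percentages; the text leaves this open.
    """
    shares = [100 * onto["uses"].get(feature, 0) / onto["axioms"]
              for onto in corpus.values()]
    return sum(shares) / len(shares)


# Toy corpus: 98 ontologies of 100 axioms each; one of them uses a
# hypothetical feature "f" in 10 of its axioms.
corpus = {f"ONT{i}": {"axioms": 100, "uses": {}} for i in range(98)}
corpus["ONT0"]["uses"]["f"] = 10

print(round(coverage("f", corpus), 2))  # 1.02, i.e. 1 ontology out of 98
print(round(impact("f", corpus), 2))    # 0.1
```

The first value shows why 1.02 is the minimum coverage reported for a feature that is used at all: it corresponds to a single ontology out of the 98 with matched relations.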
Top 10 temporal attributes by coverage
Top 10 temporal attributes by impact
Metrics of attributes (
Coverage & impact
The coverage and impact of all temporal attributes can be found in Appendix B. Summary metrics of their impact and coverage can be seen in Table 7. The top ten attributes for both coverage and impact can be seen in Tables 5 and 6 respectively. OT attributes followed by CC attributes dominate the top ten scores, with only one other attribute from the OO category appearing in the top ten for either metric. The average coverages and impacts for each category have more variation than in the relation case.
73 attributes were used across all domain and range categories with 31 (42%) belonging to CC, 16 (22%) to OO, 21 (29%) to OC-CO and 5 (7%) to OT. The correlation between coverage and impact for each category is high (
When considering CC attributes, it is clear that the most popular domain and range combinations were those between ICs (domain) and Cs (range). Other combinations involving SDCs are also prominent, whereas relations involving GDCs are less frequent. The Time:Same attribute, which indicates that elements involved in the relation are related at the same time point, has both higher coverage and impact than the Time:Diff attribute, which indicates that the elements are related at different time points (e.g., developsFrom). There is a considerable difference between the two (and for each of Time:Diff’s subtypes), although the coverage of Time:Diff is not low enough to ignore. Attributes from the
OO relations only differ by their
Only 5 OC-CO attributes have impact over 1, with 3 coming from the
The majority of OT attributes, namely those contained within the annotations for the hasPart and partOf relations, have the highest scores amongst all attributes. Interestingly, the attribute Rig:Yes is one of the most used attributes in terms of both coverage and impact.
Metrics of annotations (
) in each domain and range category
The coverage and impact scores of all annotations can be seen in Appendix C, with summary metrics in Table 8. A list of all annotations can be seen in Table 16 (Appendix C). Tables 9 and 10 show the top ten annotations amongst all categories, ordered by their coverage and impact respectively.
The coverage of annotations in each category follows a similar trend: a fraction of the annotations have coverage above 10, with the remainder gradually declining towards the minimum (1.02). Very few annotations have notable impact scores in any category: only 6 annotations have impact over 1 in the CC, OO and OT categories, and none have impact over 1 in OC-CO.
Top 10 temporal annotations by coverage
Top 10 temporal annotations by impact
Requirement sets are complete sets of temporal annotations that occur in at least one ontology. To quantify the importance of requirement sets, we take a two-step approach. First, we compute an overall importance score, introduced in Section 3.7. Second, we compute the Pareto frontier.
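The derivation of requirement sets and of their coverage can be sketched as follows. This is an illustration under assumptions: the input mapping and the annotation identifiers other than A68 are hypothetical, and only the coverage metric is shown.

```python
def requirement_sets(annotations_by_ontology):
    """Distinct complete annotation sets, one candidate per ontology."""
    return {frozenset(annots) for annots in annotations_by_ontology.values()}


def set_coverage(requirement, annotations_by_ontology):
    """Number of ontologies whose annotations are all contained in the requirement set."""
    return sum(1 for annots in annotations_by_ontology.values()
               if set(annots) <= requirement)


# Hypothetical corpus of four ontologies and the annotations each one uses.
annotations_by_ontology = {
    "ONT1": {"A68"},
    "ONT2": {"A68"},
    "ONT3": {"A68", "A12"},
    "ONT4": {"A12", "A30"},
}

for req in sorted(requirement_sets(annotations_by_ontology), key=sorted):
    print(sorted(req), set_coverage(req, annotations_by_ontology))
```

In this toy corpus, the set containing only A68 fully covers two ontologies, while the larger set containing A12 and A68 covers three.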
Ideally, we would like to order the set of requirements in a way that allows users to understand which are the most relevant. However, if we consider importance, coverage and necessity equally important, there cannot be such an order: there is always a trade-off (if we increase coverage, we often need to decrease necessity). The Pareto frontier is the set of requirements that are Pareto-optimal. A Pareto-optimal requirement is a requirement for which there is no other requirement that has a higher value for one of the three metrics without at the same time having a lower value for another. This way, the Pareto frontier gives us a natural set of requirements: every requirement not on the frontier is outperformed by at least one requirement on it. Note that this selection of requirements satisfies a user only if they consider all three metrics equally important.
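The Pareto-optimality criterion described above can be written down in a few lines. The sketch below is a generic Pareto filter and not the script used in the study: the requirement names and score triples are invented, and each triple is assumed to hold the three metrics in a fixed order, with higher values being better.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_frontier(scores):
    """Names of all requirements that are not dominated by any other requirement."""
    return {
        name
        for name, score in scores.items()
        if not any(dominates(other_score, score)
                   for other, other_score in scores.items()
                   if other != name)
    }


# Hypothetical requirements with (necessity, coverage, mean annotation importance).
scores = {
    "Ra": (0.9, 0.2, 0.8),
    "Rb": (0.5, 0.6, 0.4),
    "Rc": (0.4, 0.5, 0.4),  # dominated by Rb
    "Rd": (0.9, 0.2, 0.7),  # dominated by Ra
}
print(sorted(pareto_frontier(scores)))  # ['Ra', 'Rb']
```

Ra and Rb both stay on the frontier because each is better than the other on at least one metric, which is the trade-off the text refers to.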
All requirements sets and their importance scores can be seen in Appendix D, in Tables 19 and 20.
The top 15 requirement sets ordered by their importance (IMP). ON: number of ontologies for which the requirement set is necessary. PON: ON as a proportion. OC: number of ontologies which are completely covered by the requirement set. POC: OC as a proportion. MAI: mean importance of annotations in the requirement set. IMP: overall importance of the requirement set. Shaded in grey are those requirements which are on the Pareto frontier w.r.t. PON, POC and MAI
75 temporal requirements were identified, of which the top 15 (according to their importance score) can be seen in Table 11. Requirements on the Pareto frontier (12 in total) are shaded in grey (no other requirement set is strictly better than them). For example, R49 is not on the Pareto frontier, but ranks eighth according to our importance score. This is because, taking into account all three metrics, it scores strictly worse than R46, while their overall importance scores are roughly similar.
The average number of annotations per requirement is 7.733 (
When considering the diversity of annotations within each requirement set, on average, 44.3% of annotations are from the CC category (relations between continuants, e.g., contains), 15.3% from the OO category (relations between occurrents, e.g., precedes), 23.4% from the OC-CO category (relations between occurrents and continuants, e.g., existenceStartsDuring) and 16.3% are from the OT category (e.g., partOf). The annotation that occurs most often is A68, which occurs in 61 out of 75 (81%) requirements and annotates relations such as partOf and hasPart. A68 is the only annotation to occur in R75 – the requirement with the largest necessity, mean annotation importance and overall importance scores. A68 also appears in every requirement on the Pareto frontier.
The diversity of the 12 Pareto optimal requirements is as follows: on average, 41.8% of the requirement sets’ annotations are from the CC category, 14.6% from the OO category, 6.5% from the OC-CO category and 32.9% are from the OT category.
Considering only the top 5 requirement sets, the diversity of annotations along with their attributes is relatively low. Only 5 annotations are used within the top 5 requirement sets made up of only 19 attributes. 4 of the annotations belong to the CC category, 0 to OO, 0 to OC-CO and 1 to OT. 15 of their attributes belong to the CC category and 5 to the OT category. The diversity within each domain category is relatively low. For example, regarding the CC category which contains 15 attributes, 2 of these attributes come from the
This demonstrates the level of coverage needed by a suitable temporal language extension to OWL. Based on all requirement sets, it would not be enough for a language extension to only focus on one type of temporal phenomenon (for example, the modelling of continuants) as the majority of requirements contain more than just one type of domain entity.
However, based on the overall importance scores, it could be argued that the most important requirements, for example, the top 5 requirements, could almost be fully modelled by a language extension that focuses on only one type of temporal entity (continuants), since 90% of the annotations for these requirements only require the modelling of continuants.
To demonstrate the necessary modelling capabilities of a suitable temporal extension
When excluding A68 from
To the best of our knowledge, this is the first study to systematically assess and report on a set of requirements for ontologies in a particular domain. By using a temporally annotated data set that is used widely across the ontology corpus, we were able to determine which individual temporal features in the data set are most important, as well as their co-occurrence with other temporal features, both in terms of their usage in each ontology and their coverage.
When considering the individual temporal features, due to the extent of diversity between the features, they were analysed in groups, categorised by their occurrence with the different domain and range features. We found that certain attributes were more prominent in the corpus than others. For example, when considering temporal features belonging to the CC category (those features used in relations whose domain and range type were both continuants), same-time relations were more common than both past-time and future-time relations. Due to the nature of the encoding scheme, we were also able to compare relation categories against each other. OT relations were overall the most prominent amongst the corpus (in terms of coverage and impact), followed by CC relations. OO and OC-CO relations had roughly the same usage.
The analysis of the defined requirements showed that there is high diversity amongst ontologies w.r.t. the different categories of temporal phenomena. On average, we found that requirements are made up of just under half CC attributes, followed by a quarter OC-CO attributes, with the rest made up of OT and OO attributes. However, when focusing on the Pareto optimal requirements, OT attributes become more prevalent. This is an important result, since it shows that in order to meet the requirements, a language would have to be able to model a diverse set of temporal attributes. This may be difficult due to how different the attributes are in nature. For example, being able to model both continuants and occurrents may be difficult, due to how temporally different these entities are.
Amongst all stages of analysis, the relations part of and has part, along with their annotation, attributes and presence in requirements, were considered the most important. These relations were the most used relations, both in terms of coverage and impact. Their attributes and annotation had the highest scores for coverage and impact, and their annotation was used in 81% of all requirements, 100% of the top 15 requirements, and 100% of the requirements on the Pareto frontier. Arguably, the most interesting feature of these relations was the rigid attribute. It is well known that having the ability to model rigidity in temporal logics is a computationally hard problem [15,16], which often leads to undecidability. If this is considered to be one of the most important features, many potential temporal language candidates may be deemed unsuitable.
Although not studied in detail in this paper, the analyses of the data and the definition of the requirements are intended to aid in the identification of a suitable temporal extension of OWL (or its underlying logic) to better aid in the modelling of the temporal features found. We showed that the level of coverage needed for even single requirements was very high. Language designers can use the requirement sets to determine how effective their languages are and to determine how best to extend their language if it is not suitable. They could also be used to drive the development of new language extensions based solely on the requirements found in this study. Languages could also be compared based on how many temporal requirements are met.
Limitations
Although we identified a large number of temporal features present in the corpus of ontologies, they do not represent an exhaustive set of features. All features used were derived only from the relations used in RO. Ontologies may exhibit other types of temporal phenomena outside of the relation space, which were not covered by this survey. For example, the temporal features extracted from the relations did not inform on the type of timeline that was needed to express the feature, such as a linear timeline compared to a branching timeline. Therefore, we can only claim to have defined a subset of the temporal requirements of the ontologies. At the present time, it is not clear how additional data could be extracted in a systematic or automated way, not only due to the size of ontologies and the additional time needed for manual inspection, but also because there is no other known shared resource, such as the Relation Ontology or the Basic Formal Ontology, that would allow data to be easily analysed.
When running our survey, we relied heavily on the notion of smart matching: a way to match relations across terminologies that look similar but use different IRIs. Although our matching technique was sensible, it is possible that some of the matches were incorrect, or that other matches were missed. Manual inspection of a sample of the matched relations did not reveal incorrect matches, but it cannot rule out that some matches were missed.
Temporal relations, grouped by temporal category and ordered by coverage (COV)
Before beginning to evaluate temporal language extensions, our next steps include further verification of our requirement results. We hope to achieve this by contacting ontology authors and confirming (1) whether our interpretation of their ontology’s requirements was correct, (2) whether our smart matching results were valid, and (3) whether our temporal interpretations of relations coincide with their own interpretations. This would reinforce the validity of our results and possibly make them more fine-grained: determining how relations are intended to be interpreted on an individual ontology level would allow us to eliminate the Necessary attributes (e.g., Rigid:Yes-Necessary:No), which would eliminate uncertainty in the requirements.
The system we created for defining the importance of certain features used throughout ontologies could be used in other application domains to determine the importance of entities, not necessarily temporal ones. We intend to further generalise this procedure and apply it to other application domains to test its efficacy as an entity importance measuring system for ontologies.
Conclusion
Our study produced an empirically validated set of requirements that describe the temporal content of ontologies in the bio-health domain. The results showed that the temporal requirements are diverse and cover a wide range of different phenomena. These results aim to provide a mechanism to show which temporal language extensions are most suitable for the temporal modelling of bio-health ontologies and can also drive the creation of new language extensions, specifically tailored to the requirements and the temporal nature of bio-health ontologies.
