Abstract
Ontology matching is the task of generating a set of correspondences (i.e., an alignment) between the entities of different ontologies. While most efforts on alignment evaluation have been dedicated to the evaluation of simple alignments (i.e., those linking one single entity of a source ontology to one single entity of a target ontology), the emergence of matchers providing complex alignments (i.e., those composed of correspondences involving logical constructors or transformation functions) requires new strategies for addressing the problem of automatically evaluating complex alignments. This paper proposes (i) a benchmark for complex alignment evaluation composed of an automatic evaluation system that relies on queries and instances, and (ii) a dataset about conference organisation. This dataset is composed of populated ontologies and a set of competency questions for alignment as SPARQL queries. State-of-the-art alignments are evaluated and a discussion on the difficulties of the evaluation task is provided.
Introduction
Ontology matching is the task of generating a set of correspondences (i.e., an alignment) between the entities of different ontologies. This is the basis for other tasks, such as data integration, ontology evolution, and query rewriting. While the field has matured over the last decades, most works are still dedicated to the generation of simple correspondences (i.e., those linking one single entity of a source ontology to one single entity of a target ontology). However, simple correspondences are insufficient for covering the different kinds of heterogeneities (lexical, semantic, conceptual) in the ontologies to be matched. More expressiveness is achieved by complex correspondences, which can better express the relationships between entities of different ontologies. For example, the piece of knowledge that a conference paper has been accepted can be represented as a single class in one ontology, while another ontology may only express it through a restriction over a property (e.g., a paper related to an acceptance decision); only a complex correspondence can relate these two representations.
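As an illustration of the gap such a correspondence bridges, the sketch below contrasts the two modellings as SPARQL graph patterns; the ontology prefixes and entity names are hypothetical stand-ins, not taken verbatim from any matched ontologies.

```python
# Illustrative only: o1/o2 and the entity names are hypothetical stand-ins
# for two ontologies with different modelling choices.
ACCEPTED_PAPER_O1 = """
PREFIX o1: <http://example.org/onto1#>
SELECT ?p WHERE { ?p a o1:AcceptedPaper . }        # acceptance as a single class
"""

ACCEPTED_PAPER_O2 = """
PREFIX o2: <http://example.org/onto2#>
SELECT ?p WHERE {
    ?p a o2:Paper ;                                 # acceptance as a restriction
       o2:hasDecision ?d .
    ?d a o2:Acceptance .
}
"""
# A simple correspondence cannot link o1:AcceptedPaper to any single o2 entity;
# a complex correspondence relating o1:AcceptedPaper to the pattern
# "o2:Paper and o2:hasDecision some o2:Acceptance" is needed.
```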
Earlier works in the field have introduced the need for complex ontology alignments [18,46], and different approaches for generating them have since been proposed in the literature. These approaches rely on diverse methods, such as correspondence patterns [10,28,29], knowledge rules [17], statistical methods [24,25,47], competency questions for alignment [43,45], genetic programming [22], or path-finding algorithms [26]. In other fields, such as relational databases, different approaches have also been proposed [5,12], however covering less expressive knowledge representation languages and models. The reader can refer to [42] for a survey on complex matching. While works on complex ontology matching have been mostly dedicated to the development of approaches able to generate complex alignments, benchmarks (following the definition of “benchmark” as a standard by which something can be measured or judged, from the American Heritage® Dictionary of the English Language, Fifth Edition, s.v. “benchmark”, retrieved January 7 2019) for automatically evaluating such alignments are still lacking.
In this paper, a benchmark for evaluating complex alignments is proposed. This benchmark is composed of a dataset involving ontologies populated with controlled and shared instances, reference competency question queries, and an automatic evaluation system. “Controlled” (or “regular”) population means that every entity (class or property) concerned by the alignment has at least one instance in both ontologies. While classical benchmarks in the field [8,48] rely on reference alignments and measurements of compliance between the generated and reference alignments (usually using classical precision and recall as evaluation metrics), here we propose a set of competency questions for alignment (CQAs) as reference. A competency question for alignment expresses, through a SPARQL query, the knowledge an alignment should cover between the source and target ontologies [37]. In particular, we propose two evaluation measures. While the CQA coverage acts as a recall-oriented measure of how well the alignment covers the user needs, the intrinsic precision estimates the correctness of the correspondences without relying on a reference alignment.
The contribution of this paper is manifold:
we discuss the challenges of automatic evaluation of complex alignments with respect to classical evaluation in the literature;
we propose an automatic approach for evaluating complex alignments, which is based on competency questions for alignment in the form of SPARQL queries as references, and comparison of instances;
we propose a dataset with controlled instance population and competency questions for alignment on which the alignments are evaluated;
we evaluate state-of-the-art complex alignments on the proposed dataset and discuss their main strengths and weaknesses.
The automatic evaluation system and the populated datasets (and the scripts to generate them) are published under the LGPL license.
The rest of this paper is organised as follows. The background on complex ontology matching and competency questions for alignment is introduced in Section 2. Related works are discussed in Section 3. Then, the proposed evaluation system is presented in Section 4. Next, the methodology followed to create the dataset and the dataset itself are detailed in Section 5. The evaluation of existing complex alignments over the benchmark is discussed in Section 6. Finally, conclusions and future work are presented in Section 7.

Before introducing the notions of complex alignment and competency questions for alignment, the ontologies and the instances that will be used in the rest of this paper are introduced.
Complex ontology alignment
Ontology matching (as in [9]) is defined as the process of generating an alignment between a source ontology and a target ontology, i.e., a set of correspondences between their entities, where:
a correspondence is simple if both of its members are single ontology entities;
a correspondence is complex if at least one of its members is a construction involving logical constructors or transformation functions.
A simple correspondence is usually noted (s:s), and a complex correspondence can be (s:c) if its source member is a single entity, (c:s) if its target member is a single entity, or (c:c) if both members are complex entities. An approach which generates a complex alignment will be referred to as a “complex matching approach”, a “complex matching system” or a “complex matcher” in the rest of this paper.
Competency questions for alignment (CQAs)
In ontology authoring, in order to formalise the knowledge needs of an ontology, competency questions (CQs) have been introduced as the questions that an ontology should be able to answer.
Inspired from the predicate arity in [27], CQAs can be characterised by the arity of the knowledge they express; only unary and binary CQAs are considered in this work.
Related work
Evaluation of matching systems is carried out over an evaluation dataset and can consider different dimensions:
Evaluation of system performance. This dimension refers to the evaluation of the system in terms of run-time and memory usage. It is often performed over ontologies of different sizes and levels of expressiveness. Most OAEI tracks adopt this kind of evaluation.
Evaluation of the generated alignment given different (and controlled) inputs. Such an evaluation was proposed for the GeoLink and Hydrography datasets of the OAEI Complex track [40]. Given a list of entities, the system should be able to find the correct (complex) construction involving these entities.
Evaluation of the output alignment itself over a dataset. This evaluation can be intrinsic or extrinsic. With the former, the quality of an alignment is measured based on its intrinsic characteristics, as in [21], which evaluates the quality of an alignment through its logical coherence, or in [31], where a good alignment should not violate the conservativity principle. With the latter, the evaluation is usually based on the compliance of the generated alignment with respect to a reference one (i.e., applying precision and recall metrics).
The quality of an alignment can also be assessed regarding its suitability for a specific task or application [15,16]. While current evaluation settings have not been set up to evaluate matchers specifically designed for a given application or with a given task in mind, alignments generated by general-purpose matchers can nevertheless be evaluated with respect to their suitability for a given task.
In the following, the main related works considering these evaluation dimensions are discussed.
Complex alignment evaluation metrics
Most works on alignment evaluation address the evaluation of simple alignments using a reference alignment or a sample of it. This is what has been done in the context of the OAEI campaigns. Complex alignments, in contrast, have mostly been evaluated manually, usually in terms of precision [25,28,29,47], or on specific datasets in order to compute recall. In particular, the approach adopted in [25,47] estimated recall based on a recurring correspondence pattern.
As discussed in [43] (inspired from [2]), alternative metrics of precision and recall can be envisaged for comparing complex correspondences to a reference.
Complex alignment benchmarks
As discussed above, complex matchers are usually evaluated on custom evaluation alignment sets, usually covering the specificities of the approach to be evaluated. Recently, the first complex benchmark has been introduced in the OAEI campaigns [40]. The track consists of four datasets from different domains and considering different evaluation strategies:
a consensual complex alignment was created using the query rewriting methodology from [41]. Each generated correspondence is manually classified as true positive or false positive, with respect to a reference alignment. The evaluated and reference correspondences are (s:c). In 2019, the benchmark presented in this paper has been used to automatically evaluate complex alignments.
a set of ontologies on the hydrography domain and a pair of ontologies from GeoScience (more details about the GeoLink dataset are provided in [49]). The matchers are evaluated following three subtasks: (i) finding all entities which appear in a given correspondence, (ii) finding the right construction involving those entities, and (iii) finding the complex correspondences from scratch. Only the first subtask was implemented in the OAEI 2018 campaign [1], and the evaluation was automatically carried out using classical precision and recall (all alignments were simple equivalences). In 2019, a metric close to relaxed precision and recall [6] has been applied to the entity identification and relationship identification tasks.
a set of CQAs over agronomic knowledge bases is rewritten with the evaluated alignments. Each rewritten query is manually classified as semantically equivalent to the source query or not. A “Query Well Rewritten” metric measures the percentage of CQAs which have a semantically equivalent query after the rewriting process. Each correspondence of the evaluated alignment is also manually classified as true positive or false positive without a reference.
In 2018, only two systems, AMLC [10] and CANARD [44], were able to generate complex correspondences for those datasets. In 2019, a new system, AROA, has been proposed.
Regarding task-oriented evaluation, [9] argued that different task profiles can be established to explicitly compare matching systems for certain tasks, such as ontology evolution or query answering, that have different constraints in terms of coverage and runtime. One such task-oriented evaluation approach was introduced in the OAEI in 2015 with the Ontology Alignment for Query Answering (OA4QA) track.
In [13], an “end-to-end” evaluation in which a set of queries are rewritten using an evaluated alignment is proposed. The results of the queries are manually classified by relevance for a user on a 6-point scale. This evaluation was performed with two rewriting systems. If a source member
While task-based evaluation is relevant for both simple and complex alignments, some tasks tend to have higher expressiveness requirements, such as query rewriting and ontology merging, as discussed in [41]. Complex alignments for query rewriting have been the focus of the work of [19].
In fact, the query rewriting task can be seen as one of the main applications for complex alignments, and evaluation approaches based on this task are highly relevant. In the case of simple alignments, a naive approach for rewriting SPARQL queries is to simply replace the IRI of an entity of the initial query by the IRI of the corresponding entity in the alignment, as described in [4]. For complex alignments, such a naive approach is not enough, as the semantics of the alignment itself has to be taken into consideration. [7] proposed an approach for writing specific SPARQL CONSTRUCT queries, but most query rewriting systems still rely on simple or (s:c) complex correspondences and fail to cover highly expressive (c:c) correspondences.
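A minimal sketch of the naive substitution strategy mentioned above (the query, IRIs and helper name are illustrative assumptions, not the implementation of [4]) makes its limitation apparent: a complex correspondence offers no single target IRI to substitute.

```python
# Naive rewriting for simple (s:s) equivalences only: every occurrence of a
# source IRI in the query text is replaced by the corresponding target IRI.
def naive_rewrite(sparql_query: str, alignment: dict[str, str]) -> str:
    """alignment maps source entity IRIs to target entity IRIs."""
    for source_iri, target_iri in alignment.items():
        sparql_query = sparql_query.replace(f"<{source_iri}>", f"<{target_iri}>")
    return sparql_query

query = "SELECT ?p WHERE { ?p a <http://example.org/onto1#AcceptedPaper> . }"
alignment = {"http://example.org/onto1#AcceptedPaper":
             "http://example.org/onto2#Accepted_Paper"}
print(naive_rewrite(query, alignment))
# For a complex correspondence the target member is a graph pattern rather
# than an entity, so plain IRI substitution cannot be applied.
```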
With respect to the evaluation of complex alignments, several works focus on manually evaluating alignments, in terms of precision as in [28,29], calculating recall on recurring patterns as in [25,47], or relying on a sample of reference correspondences [26]. While most of these approaches focus on the comparison of correspondences, we shift the problem to the comparison of instances. We propose an evaluation benchmark that considers queries as references and relies on metrics based on query coverage (akin to recall) and intrinsic precision (akin to precision without a reference alignment). Our approach requires, however, datasets populated in a controlled manner, unlike the datasets in [49].
As in [32], we use queries as references instead of reference alignments. Close to ours, the evaluation in [32] relies on a synthetically populated version of the ontologies it considers.
Table 1 summarizes the existing alignment evaluation benchmarks that are close to our proposal.
Comparison of ontology alignment evaluation benchmarks. The Type of corresp. column represents the form of the most expressive correspondences dealt with by the benchmarks – (c:c) is more complex than (s:c), which is more complex than (s:s)
As discussed above, the evaluation of simple alignments has been largely addressed in the literature, and in particular in OAEI campaigns, while automatic evaluation of complex alignments has been addressed to a lesser extent [40]. In terms of evaluation metrics, most of the solutions so far are based on the comparison of alignments using syntactic or semantic approaches, leaving the comparison at instance level under-exploited. The latter is the proposal of this paper.
With respect to a
A
An
With respect to (i), we propose two evaluation measures. While the CQA coverage acts as a recall-oriented measure of how well the alignment covers the user needs expressed by the CQAs (Section 4.2), the intrinsic precision estimates the correctness of the correspondences without relying on a reference alignment (Section 4.3).
In the following, the overall evaluation workflow adopted in the approach is first presented (Section 4.1), before the CQA coverage metric is detailed (Section 4.2) and the intrinsic precision metric is described (Section 4.3).

Evaluation process of the alignment
Figure 3 presents the overall workflow adopted in the proposed approach. The steps followed in the evaluation process are: anchoring and rewriting of each source CQA using the evaluated alignment, comparison of the retrieved instances with those of the equivalent target CQA, scoring of each rewritten query, and aggregation of the scores over all CQAs.
CQA coverage metric
With this evaluation strategy, the reference is a set of equivalent CQAs in the form of SPARQL queries. An evaluated alignment is used to rewrite each source CQA into queries over the target ontology, whose answers are then compared with those of the equivalent target CQA.
Source CQA anchoring
As stated above, the reference in this kind of evaluation is a set of equivalent CQAs as SPARQL queries. Each source CQA is first anchored to the correspondences of the evaluated alignment and then rewritten into queries over the target ontology.
Two rewriting systems have been considered. None of these systems consider the correspondence relation or correspondence value. The first system is the one from [38]. Each triple of
which contains
The rewritten query using the
This rewriting system cannot however work the other way around. For example, the CQA
cannot be rewritten with
The second system is based on instances and has been developed in the context of this paper. The instances
retrieves a set of accepted paper instances in the
This rewriting system allows queries such as
to be rewritten too using the inverse of
Out of the existing rewriting systems dealing with complex correspondences, the one described in [38] deals with the largest number of construction types. So far, the proposed instance-based rewriting system is one of the few able to deal with (c:c) correspondences. A limitation, however, is that several (c:c) correspondences cannot be combined together.
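A minimal sketch of the instance-based idea follows; it uses rdflib, and the file name, prefixes, one-variable queries and the overlap criterion are simplifying assumptions rather than the exact procedure of the system. A source CQA is anchored to a correspondence when the instances it retrieves overlap with those retrieved by the correspondence's source member, and the correspondence's target member then provides a candidate rewritten query.

```python
from rdflib import Graph

source_kb = Graph().parse("onto1_populated.ttl")   # hypothetical populated ontology

def answers(kb: Graph, query: str) -> set:
    """Evaluate a one-variable SELECT query and return its answers as a set."""
    return {row[0] for row in kb.query(query)}

def anchor(cqa: str, correspondences: list[dict]) -> list[str]:
    """Return the target members of the correspondences whose source member
    retrieves instances overlapping with the CQA answers in the source KB."""
    cqa_instances = answers(source_kb, cqa)
    rewritten = []
    for corr in correspondences:               # corr: {"source": SPARQL, "target": SPARQL}
        if cqa_instances & answers(source_kb, corr["source"]):
            rewritten.append(corr["target"])   # candidate rewritten query
    return rewritten
```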
Comparison
The instances retrieved by each rewritten query over the target knowledge base are compared with the instances retrieved by the equivalent target CQA.
Scoring
The relation between the instance set retrieved by a rewritten query and the instance set of the equivalent target CQA is associated with query precision and query recall values: query precision is the proportion of instances retrieved by the rewritten query that also belong to the target CQA answers, and query recall is the proportion of target CQA answers that are retrieved by the rewritten query.
The query F-measure (equally balancing query precision and query recall) was preferred over other metrics as the scoring function, since the F-measure is commonly used in alignment evaluation to aggregate precision and recall. However, users may prefer one score over another, depending on how the alignment is used or manipulated. This was an implementation choice made to facilitate the comparison of the evaluated alignments.
Aggregation
As the rewriting phase outputs all the possible queries regardless of the correspondence relation, a lot of noise can be introduced. Moreover, the same query can be output by both rewriting systems. Therefore, for each CQA, only the rewritten query with the best score is kept for aggregation.
The global aggregation method is the average function. The final output of the evaluation system is an average query precision, query recall and query F-measure score for the evaluated alignment.
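The scoring and aggregation steps can be summarised by the sketch below (a simplification under the assumptions above, with each query represented by the set of instances it retrieves; the data structure is hypothetical).

```python
def query_scores(rewritten: set, reference: set) -> tuple[float, float, float]:
    """Query precision, recall and F-measure between two instance sets."""
    if not rewritten or not reference:
        return 0.0, 0.0, 0.0
    common = len(rewritten & reference)
    p, r = common / len(rewritten), common / len(reference)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def cqa_coverage(cqas: list[dict]) -> float:
    """cqas: one entry per CQA, holding the reference answer set and the answer
    sets of every candidate query produced by the rewriting systems."""
    best_f = []
    for cqa in cqas:
        candidates = [query_scores(ans, cqa["reference"])[2]
                      for ans in cqa["rewritten_answer_sets"]] or [0.0]
        best_f.append(max(candidates))        # keep the best rewriting per CQA
    return sum(best_f) / len(best_f)          # average over all CQAs
```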
Intrinsic precision
The CQA coverage evaluation locally aggregates the results over the CQAs and not the rewritten queries because of the noise added by the rewriting systems. In return, an alignment containing all the possible correspondences (correct and erroneous) between the source and target ontologies would obtain a good CQA coverage score. To counterbalance the CQA coverage score, we propose to measure the intrinsic precision of the alignment, i.e., the correctness of its correspondences computed from the populated ontologies without a reference alignment.
For each correspondence of the evaluated alignment, the instance sets described by its source member and by its target member are retrieved from the populated ontologies and compared.
Different precision scores are given depending on which relation between the instance sets of the correspondence members is considered correct: equivalence, subsumption, overlap or non-disjointness.
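The comparison underlying these scores can be sketched as follows (a simplified reading based on set relations; the relation labels mirror the ones used in the evaluation tables, and the handling of empty sets in the actual system may differ).

```python
def member_relation(source_instances: set, target_instances: set) -> str:
    """Classify the relation between the instance sets of the two members."""
    if not source_instances & target_instances:
        return "disjoint"
    if source_instances == target_instances:
        return "equivalent"
    if source_instances <= target_instances or target_instances <= source_instances:
        return "subsumed"
    return "overlapping"

def intrinsic_precision(relations: list[str], correct: set[str]) -> float:
    """Proportion of correspondences whose member relation is deemed correct,
    e.g. correct={"equivalent"} or {"equivalent", "subsumed", "overlapping"}."""
    return sum(r in correct for r in relations) / len(relations)
```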
CQA-based dataset
In this section, first the methodology followed to create the evaluation dataset (populated ontologies and associated CQAs) is presented (Section 5.1). Then, the OAEI Conference dataset (Section 5.2) is described, followed by the population of its ontologies from real-life data (Section 5.3). Finally, the set of evaluation CQAs extracted from the CQAs used for the dataset population is discussed (Section 5.4).
Dataset creation methodology
The purpose here is to create a dataset on which ontology matchers can be run and on which the evaluation described in the previous section can be performed. Therefore, the dataset must contain populated ontologies and a set of CQAs expressed as SPARQL queries over these ontologies. The population step is very important as the chosen instances may influence the result of the evaluation.
The proposed methodology has the following main steps:
Create a set of CQAs based on an application scenario. Only unary and binary CQAs were considered in this work.
Create a pivot format (i.e., the bridge format used for representing in a uniform way the data extracted from the data sources) which covers all the CQAs from step 1.
For each ontology of the dataset, create SPARQL INSERT queries corresponding to the pivot format.
Instantiate the pivot format with real-life or synthetic data.
Populate the ontologies with the instantiated pivot format using the SPARQL INSERT queries.
Run a reasoner to verify the consistency of the populated ontologies. If an inconsistency is detected, try to change the interpretation (i.e., add, suppress or modify axioms) of the ontology and iterate over steps 3 to 5.
Based on SPARQL INSERT queries, translate the CQAs covered by two or more ontologies as SPARQL queries.
In this methodology, the interpretation of the ontologies is the same for ontology population and CQA creation. The creation of CQAs can be done by interviewing users and domain experts, as recommended in the NeOn methodology [35] for competency question authoring. The CQAs can also derive from the competency questions which were used to design the ontologies of the dataset. In this implementation, however, one expert created the CQAs. The set was then discussed with a second expert, who judged it exhaustive enough to cover the conference organisation scenario.
In [41], (c:c) correspondences were not included in the dataset, hence no exhaustive coverage could be guaranteed. However, as CQAs represent basic pieces of knowledge, they can be exhaustively covered by an alignment regardless of the shape of the correspondences. Using the same list of CQAs for ontology population and evaluation also ensures the consistency of the answers of the evaluation CQAs.
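As an illustration of steps 3 and 5 of the methodology, the sketch below executes a SPARQL INSERT template for each record of the instantiated pivot format; the file name, prefix and entity names are hypothetical and only indicative of what the published population scripts do.

```python
from rdflib import Graph

onto = Graph().parse("onto1.owl")   # ontology to be populated (hypothetical file)

# INSERT template associated with one ontology for the "accepted paper" knowledge.
INSERT_ACCEPTED_PAPER = """
PREFIX o1: <http://example.org/onto1#>
INSERT DATA {{
    <{paper_iri}> a o1:Paper ;
                  o1:hasDecision <{paper_iri}_decision> .
    <{paper_iri}_decision> a o1:Acceptance .
}}
"""

# One record of the instantiated pivot format (normally read from the pivot data).
pivot_records = [{"paper_iri": "http://example.org/data/paper42"}]
for record in pivot_records:
    onto.update(INSERT_ACCEPTED_PAPER.format(**record))
```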
Conference dataset
The dataset used here is the OAEI Conference dataset.
In the first OAEI complex track, an evaluation was proposed over a consensual complex alignment between three of its ontologies.
Number of entities by type of each ontology
Even though this dataset has been largely used, it has only been partially populated. In the OA4QA track, only the classes covered by the 18 queries were populated and the creation of the synthetic
In order to create the CQAs and re-interpret the Conference ontologies, the conference organisation scenario has been considered. First, the list of CQAs has been established by examining a real-life use case: the Extended Semantic Web Conference (ESWC) 2018 edition. Second, the list of CQAs created from this use case has been extended by exploring the scope of the conference ontologies.
As mentioned before, the first step of the process was to create a list of CQAs and re-interpret the ontologies under the perspective of a conference organisation application. By analysing the ESWC 2018 website, a first list of CQAs was created. The methodology was followed based on this first list of CQAs. The pivot format was instantiated with the website data.
While running the HermiT [30] reasoner in step 6 of the methodology, several exceptions were encountered. For most of them, the problem was with the interpretation of the ontology. For example, in the
Two exceptions have been detected, which could not be resolved by a change of interpretation. In that case, the original ontologies have been slightly modified:
If a CQA was not exactly covered by an ontology, the ontology would not be populated with its associated instances. This results in an uneven population of equivalent concepts in the ontologies. For example, considering the
Conference data analysis
In order to populate the conference ontologies and make the population close to real scenarios, some figures from past conferences have been analysed. The information about ISWC 2018 and ESWC 2017 available on Scholarly Data led to the following figures:
percentage of accepted papers having at least one program committee member as author: 44% for ESWC 2017 and 59% for ISWC 2018;
distribution of the number of authors per submitted paper (ESWC 2018): 1 (6%), 2 (17%), 3 (29%), 4 (26%), 5 (9%), 6 (8%) or 7–10 (2%);
distribution of the number of collaborating institutions per accepted paper over Scholarly Data, where “global” represents the statistics over all data from the Scholarly Data endpoint (see Table 3).
Collaborating institutions per accepted papers over scholarly data
distribution of the number of authors per accepted papers over scholarly data (see Table 4)
Authors per accepted papers over scholarly data
The first population of the ontologies with the ESWC 2018 data left some important knowledge unrepresented. For example, the concepts of external reviewer, presenter of a paper, and person affiliation, which appeared important for conference organisation, were not available on the website. Still from the perspective of conference organisation, the conference ontologies were browsed to complete the list of CQAs with useful concepts. The pivot format and associated SPARQL INSERT queries were also extended to cover the new list of CQAs. The next step was then to artificially generate the pivot format instantiation. For that, a score between 1 and 10 is given to each conference. This score determines the number of submitted papers, program committee members, etc., as shown in Table 5.
Number of submitted papers, pc members, etc. for a conference of size 1 and 10 (min–max values)
The statistics from the ESWC 2018, ISWC 2018 and ESWC 2017 datasets were globally reproduced: 50% of papers have at least one program committee member as author; the number of authors per paper is 1 (6%), 2 (17%), 3 (29%), 4 (26%), 5 (9%), 6 (8%) or 7–10 (2%); the number of collaborating institutions is around 1 (40%), 2 (30%), 3 (17%), 4 (7%), 5 (5%), 6 (2%). These statistics are indicative: as the generation process is pseudo-random, the figures may vary in practice. Some proportions were arbitrarily chosen: 20% of the submitted papers are poster papers and 20% are demo papers; the regular paper acceptance rate is in [0.1–0.7] and the poster/demo paper acceptance rate is in [0.4–1.0]; 20% of the reviews are done by an external reviewer.
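The pseudo-random generation can be illustrated as below; the helper is hypothetical (the actual generation scripts are published with the dataset) and simply draws the number of authors of a generated paper from the distribution reported above.

```python
import random

AUTHOR_COUNTS = list(range(1, 11))
AUTHOR_WEIGHTS = [6, 17, 29, 26, 9, 8, 0.5, 0.5, 0.5, 0.5]   # counts 7-10 share the 2%

def draw_author_count(rng: random.Random) -> int:
    """Draw the number of authors of one generated paper."""
    return rng.choices(AUTHOR_COUNTS, weights=AUTHOR_WEIGHTS, k=1)[0]

rng = random.Random(42)                     # fixed seed for reproducibility
sample = [draw_author_count(rng) for _ in range(1000)]
print(sum(c == 3 for c in sample) / len(sample))   # should be close to 0.29
```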
In order to evaluate statistics-based matchers on the benchmark, different population sets were considered for the ontologies. The idea is to provide the same conference ontologies but with partially overlapping sets of instances:
0%: 5 different conferences per ontology
20%: 1 common conference for all ontologies and 4 different conferences per ontology
40%: 2 common and 3 different conferences
60%: 3 common and 2 different conferences
80%: 4 common and 1 different conference
100%: 5 common conferences for all ontologies
Note that the percentage given in the name of the datasets is the percentage of common conference event instances per ontology. As the size of each conference is different, the percentage of common instances (papers, authors, etc.) will not be the same. In Table 6, the minimum and maximum percentages of common paper instances are given for each dataset.
Percentage (min, max) of common submitted papers in the different datasets. The second line reads
Not all the ontology concepts are covered by the pivot CQAs. Table 7 shows, for each ontology, the number of entities covered by the CQAs compared to the number of entities in the original ontology.
Number of populated entities by ontology. Number of populated entities/number of entities in the original ontology
For the evaluation, the focus is on CQAs which can actually be covered by two or more ontologies. To write the CQAs which will be used in the dataset, the list of CQAs used for the population was trimmed:
the CQAs which were only covered by one ontology were removed;
some CQAs which were not considered relevant, such as “What is the name of a reception?” (the answer being an arbitrary literal), were also removed.
The remaining CQAs were then written as SPARQL SELECT queries by adapting the SPARQL INSERT queries. Table 8 shows the number of CQAs which were covered by the pivot format, by each ontology (in the SPARQL INSERT queries) and which were transformed into SPARQL SELECT queries for the evaluation dataset. 278 SPARQL SELECT queries result from this process.
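For illustration (hypothetical names, not queries from the dataset), the graph pattern of a population INSERT query is reused, with minor adaptation, as the body of the corresponding evaluation SELECT query:

```python
# Population query (step 3 of the methodology) ...
INSERT_QUERY = """
PREFIX o1: <http://example.org/onto1#>
INSERT DATA { <http://example.org/data/paper42> a o1:AcceptedPaper . }
"""

# ... and the evaluation CQA derived from it (step 7).
EVAL_CQA = """
PREFIX o1: <http://example.org/onto1#>
SELECT DISTINCT ?paper WHERE { ?paper a o1:AcceptedPaper . }
"""
```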
Number of initial (pivot) CQAs covered by each ontology and number of evaluation (eval) CQAs covered by each ontology
Existing alignments over the conference dataset were evaluated with the proposed evaluation system. The dataset used for the evaluation is the 100% dataset so that instance-based precision can be measured.
Evaluated alignments
Existing alignments between the Conference ontologies, expressed in the EDOAL format, were gathered:
the query rewriting oriented alignment set from [41] (manually generated);
the ontology merging oriented alignment set from [41]. It has been manually generated and is composed of 313 correspondences, including 54 complex correspondences following 9 different patterns (some patterns are composite), over 10 pairs of ontologies.
the reference simple alignment (ra1) of the Conference track;
the output alignment from [29] (automatically generated) – complex correspondences found on 4 pairs of ontologies. This alignment is the smallest one, as only one correspondence has been found for each pair.
the output alignment from [10] (automatically generated) – alignments between 3 pairs of ontologies publicly available. It is composed of two types of complex equivalence correspondences: those with attribute occurrence restriction and those with attribute domain restriction. These are the alignments available in the context of the OAEI 2018 campaign.
The ra1 alignment had been used as input by the systems behind Ritze_2010 and Faria_2018; ra1 has therefore been added to these two alignments for the CQA coverage evaluation. The precision evaluation was made only on the complex correspondences (the output of the original approaches).
The CQA coverage evaluation was run over all datasets in order to measure the standard deviation of the query precision, recall and F-measure between the datasets, as shown in Table 9. The standard deviation is maximal for Faria_2018 and Ritze_2010, but is still rather low.
Standard deviation and average of the query precision, query f-measure and query recall scores over the 6 datasets
Average of CQA f-measure for each pair of ontologies for each alignment on the 100% dataset
Globally, as shown in Table 10, the Query_rewriting alignments have a better coverage than the others. An exception can be noted for the edas-confOf pair, where the Ontology_merging alignment outperforms the Query_rewriting one. This is explained by the choices made in the methodology for the creation of both alignments, combined with the behaviour of the rewriting systems. In the Ontology_merging alignments, unions of properties were separated into individual subsumption correspondences, which were usable by the rewriting system, whereas in the Query_rewriting one, the subsumption correspondences kept the unions. For example:
Therefore, when a query contained the
When looking closely at the results, many CQAs retrieving literals (titles, names, etc.) were not rewritten by the alignments. This is mainly explained by the fact that the
Different precision metrics over the alignments. The name of each precision metric is the relation between correspondence members that is considered correct. For example, in the
Table 11 shows the precision of the alignments considering different sets of correspondences as correct. The
The real precision of the alignments is considered to lie between the strictest (equivalence) and the most lenient (not disjoint) precision scores.
For example, for the pair
The low
Ritze_2010 has only equivalent or disjoint correspondences, therefore its precision scores are the same for all metrics. Faria_2018 achieves a good precision score overall (between 0.65 and 0.71).
Given the different population issues, the overlapping and not disjoint scores give a good representation of the alignment precision.
CQA coverage and equivalence, overlapping and not disjoint precision of the alignments, harmonic mean (HM) of the two scores
Table 12 shows the results of the evaluation over the alignments. The CQA coverage and precision scores have been aggregated with a harmonic mean (called HM in Table 12). Overall, the Query_rewriting and Ontology_merging alignments obtain the best results. This is satisfactory given that these two alignments are complex reference alignments on this dataset. Even if ra1 has the best precision, its low CQA coverage (0.42) shows that a lot of CQAs from the benchmark need complex correspondences to be covered. Faria_2018 and Ritze_2010 do not cover the same number of ontology pairs as the other alignments; therefore, their numbers cannot be exactly compared to the others.
In the results of the OAEI 2018 [1], the precision measured for the Faria_2018 alignment was 0.54 (cf. Table 13). The instance-based precision gives the same result as the manual evaluation for the
In the
Comparison of the OAEI 2019 and instance-based precision metrics over the Faria_2018 alignment. The not disjoint, subsumed and overlap precision scores are the same for this alignment
This paper has presented an evaluation benchmark on which complex correspondences can be evaluated. In general, alignment evaluation is often performed by comparing a generated alignment to a reference one. It involves comparing the members of the correspondences generated by the systems to the members of the correspondences in the reference alignment. While this comparison is straightforward for simple alignments, it becomes harder when dealing with complex correspondences. For example, these three correspondences can all be considered as true positives:
While syntactic-oriented evaluation metrics (measuring the effort needed to transform one correspondence into another) would fail to cover the large space of possible combinations of constructors, semantic-oriented approaches would restrict the expressiveness of correspondences to that supported by current reasoners, leaving aside, for instance, transformation functions. Hence, the comparison of instance sets appears to be a reasonable alternative. Our proposal shifts the problem to the comparison of instances in a query rewriting task targeting user needs. We proposed two evaluation measures. While the CQA coverage acts as a recall-oriented measure of how well the user needs are covered, the intrinsic precision estimates the correctness of the correspondences without relying on a reference alignment.
CQA coverage, in particular, requires a way of rewriting the source query into a target query in terms of the evaluated alignment. Such an evaluation therefore requires both that the ontologies of the evaluation dataset are consistently populated and that a system for rewriting the queries is available. With respect to the former, the problem has been addressed here by proposing an artificially and regularly populated dataset, as datasets with cross-ontology consistency may not be easy to find. The population process was guided by CQAs. We argue that the synthetic population ensures that each CQA is consistently populated across the ontologies. However, one can argue that, if the CQAs have a different coverage for correspondences achieved through different patterns, this may have an impact on the evaluation. As our evaluation is instance-based, two correspondences that do not follow exactly the same pattern but represent the same piece of knowledge are considered comparable.
With respect to the query rewriting systems, most existing SPARQL rewriting systems are limited to (s:c) correspondences, and dealing with (c:c) correspondences is still a challenge. A rewriting system which deals with such correspondences has been proposed here; however, it cannot combine several (c:c) correspondences together. Instance-based rewriting could nevertheless be a new lead for this challenge. While the two systems have been manually evaluated on the task of rewriting queries, in the way discussed in [38], we did not evaluate the impact of each of the systems on the evaluation task itself. While this has to be done, we reduced their potential impact by selecting, for each CQA, the rewritten query with the best F-measure. Another point is that these systems do not take the correspondence relation and confidence into account within the rewriting process, which has to be addressed in future work.
The proposed approach has been applied to evaluating existing alignments. The system has also been applied to automate the evaluation of complex alignments in the OAEI 2019 campaign. The evaluation reported here shows that the reference alignments all have a good precision score and that complex alignments provide a better coverage of the CQAs than simple alignments. The evaluation of the alignments from two complex matchers shows that, even though both achieve a rather good precision, their CQA coverage is below 0.5. However, these results are far from the ones obtained with the original dataset and reported in OAEI campaigns, leaving large room for improvement in the field. As our approach requires the alignments to be known a priori, it is suitable for scenarios such as the ones in OAEI. In that sense, as with the widely used artificial datasets such as the OAEI Benchmark, our dataset fills the lack of complex datasets on which an automatic evaluation can be carried out in a controlled manner.
Evaluating complex ontology alignments, however, is too broad a challenge to be tackled with a single approach, as there are multiple aspects to take into account. A complementary approach to the instance-based one proposed in this paper could be an edit-distance approach that would reflect the effort involved in human validation. The approach should also be scalable and avoid the need to perform all pairwise correspondence comparisons. This could be achieved by considering the possibility of computing minimal complex correspondences (or key complex correspondences, which can be used for computing all the other ones), in line with the work of [20]. In order to cover ontologies of various sizes and domains, developing a query generation system able to automatically generate queries adequate in coverage and scope for the evaluation of complex alignments could also help in the evaluation task.
