We address the issue of Ontology-Based Data Access, with ontologies represented in the framework of existential rules, also known as Datalog±. A well-known approach involves rewriting the query using ontological knowledge. We focus here on the basic rewriting technique which consists of rewriting the initial query into a union of conjunctive queries. First, we study a generic breadth-first rewriting algorithm, which takes any rewriting operator as a parameter, and define properties of rewriting operators that ensure the correctness of the algorithm. Then, we focus on piece-unifiers, which provide a rewriting operator with the desired properties. Finally, we propose an implementation of this framework and report some experiments.
We address the issue of Ontology-Based Data Access, which aims at exploiting knowledge expressed in ontologies while querying data. In this paper, ontologies are represented in the framework of existential rules [2,22], also known as Datalog± [4,5]. Existential rules allow one to assert the existence of new unknown individuals, which is a key feature in an open-world perspective, as for instance in incomplete databases [8]. These rules are of the form ∀x∀y (B[x, y] → ∃z H[y, z]), where the body B and the head H are conjunctions of atoms (without functions) and the variables z, which occur only in the head, are existentially quantified. They generalize lightweight description logics (DLs), which form the core of the tractable profiles of OWL 2 [34].
Forward/backward chaining.
The general query answering problem can be expressed as follows: given a knowledge base (KB) K = (F, R) composed of a set of facts F—or data—and an ontology R (a set of existential rules here), and a query Q, compute the set of answers to Q in K. In this paper, we consider Boolean conjunctive queries (Boolean CQs or BCQs). Note however that all our results are easily extended to non-Boolean conjunctive queries as well as to unions of conjunctive queries. The fundamental problem, called BCQ entailment hereafter, can be recast as follows: given a KB K = (F, R), composed of facts and existential rules, and a Boolean conjunctive query Q, is Q entailed by K?
BCQ entailment is undecidable for general existential rules (e.g., [3,10], on the implication problem for tuple-generating dependencies, which have the same form as existential rules). There has been an intense research effort aimed at finding decidable subsets of rules that provide good tradeoffs between expressivity and complexity of query answering (see e.g., [25] for a survey on decidable classes of rules). These decidable rule fragments overcome some of the limitations of DLs. In particular, they have unrestricted predicate arity, while DLs consider unary and binary predicates only, which allows for a natural coupling with database schemas, in which relations may have any arity; moreover, adding information, such as data provenance, is made easier by the unrestricted predicate arity, since this information can be added as a new predicate argument.
There are two main approaches to solve BCQ entailment, which are linked to the classical paradigms for processing rules, namely forward and backward chaining, as illustrated by the next example.
Let us consider data on movies, with unary relations actor and movie, and a binary relation playsIn (intuitively, playsIn(x, y) means that “x plays a role in y”). Let Q be a query asking if a given person, whose identifier is B, plays a role somewhere, i.e., Q = ∃y playsIn(B, y). Let R be an existential rule expressing that “every actor plays a role in some movie”, i.e., R = ∀x (actor(x) → ∃y (movie(y) ∧ playsIn(x, y))). Assume that the data contain actor(B). If Q is asked on these data, the answer is no. However, the rule allows one to infer that actor B plays in a movie, thus the answer to Q should be yes. Rule R can be used in a forward manner, i.e., it can be applied to the data: then, the knowledge movie(z0) ∧ playsIn(B, z0) is added, where z0 is a new variable. Query Q can be mapped to the enriched data, which allows us to answer positively. Now, R can also be used in a backward manner, i.e., to rewrite Q, which yields the new query actor(B). This query can be mapped to the (initial) data, which provides the positive answer.
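The two uses of the rule can be sketched in a few lines of Python. The encoding is ours, not the paper's syntax: atoms are tuples, variables are '?'-prefixed strings, and for brevity the forward-chaining step only adds the atom relevant to the query.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def homomorphism(query, facts, subst=None):
    """Backtracking search for a homomorphism mapping the query atoms into facts."""
    if subst is None:
        subst = {}
    if not query:
        return subst
    atom, rest = query[0], query[1:]
    pred, args = atom[0], atom[1:]
    for fact in facts:
        if fact[0] != pred or len(fact) != len(atom):
            continue
        s = dict(subst)
        ok = True
        for a, b in zip(args, fact[1:]):
            a = s.get(a, a)          # apply the current substitution
            if is_var(a):
                s[a] = b             # bind the variable
            elif a != b:
                ok = False           # clash on a constant / bound term
                break
        if ok:
            result = homomorphism(rest, facts, s)
            if result is not None:
                return result
    return None

facts = {("actor", "B")}
query = [("playsIn", "B", "?y")]

# Forward chaining: apply the rule to the data, adding a fresh unknown z0.
saturated = facts | {("playsIn", "B", "z0")}
# Backward chaining: rewrite the query with the rule instead of touching the data.
rewritten = [("actor", "B")]

print(homomorphism(query, facts) is not None)      # False: no match on raw data
print(homomorphism(query, saturated) is not None)  # True: forward chaining
print(homomorphism(rewritten, facts) is not None)  # True: backward chaining
```

Both directions reduce entailment to a plain homomorphism (pattern-matching) test, which is the point of Fig. 1 below.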
Both approaches can be seen as ways of reducing the problem to a classical database query answering problem by eliminating the rules, see Fig. 1. The first approach consists in applying the rules to the data, thus materializing entailed facts into the data. Then, Q is entailed by the KB if and only if it can be mapped to this materialized database. This approach is applicable either when the forward chaining procedure stops “naturally” (see [14] for a survey on these cases), or when it stops by taking some parameters into consideration, typically the size of the query [4,23]. The second approach consists in using the rules to rewrite the query into a first-order query (typically a union of conjunctive queries [9,15,27,28,33]) or a non-recursive Datalog program [16,26,29]. Then, Q is entailed by the KB if and only if the rewritten query is entailed by the initial database.
Materialization has the advantage of enabling efficient query answering but may not be appropriate for data size, data access rights or data maintenance reasons. Query rewriting has the advantage of avoiding changes in the data; however, its drawback is that the rewritten query may be large, even exponential in the size of the initial query, hence less efficiently processed, at least with current database techniques. Finally, techniques combining both approaches have been developed, in particular the so-called combined approach [21,24] for lightweight description logics, as well as a similar algorithm for a large class of existential rules [32].
In this paper, we focus on rewriting techniques, and more specifically on rewriting the initial conjunctive query Q into a union of conjunctive queries, that we will see as a set of conjunctive queries. This set is called a rewriting set of Q and each element of a rewriting set is called a rewriting. While most previously cited work focuses on specific rule sublanguages (mostly DL-Lite, linear and sticky existential rules), we consider general existential rules. This means that our algorithm does not make any syntactic assumption on the input set of rules, but will terminate only in some cases (so-called finite unification sets of rules, see hereafter).
The goal is to compute a rewriting set that is both sound (if one of its elements maps to the initial database, then the KB entails Q) and complete (if the KB entails Q, then there is an element that maps to the initial database). Minimality may also be a desirable property. In particular, let us consider the generalization relation (a preorder) induced on Boolean conjunctive queries by homomorphism: we say that Q1 is more general than Q2 if there is a homomorphism from Q1 to Q2; it is well-known that the existence of such a homomorphism is equivalent to the following property: for any set of facts F, if the answer to Q2 in F is positive, then so is the answer to Q1. We point out that any sound and complete rewriting set of a query Q remains sound and complete when it is restricted to its most general elements. Since BCQ entailment is undecidable, there is no guarantee that such a finite set exists for a given query and general existential rules. A set of existential rules ensuring that a finite sound and complete set of most general rewritings exists for any query is called a finite unification set (fus) [2]. The fus property is not recognizable [2], but several easily recognizable fus classes have been exhibited in the literature: atomic-body rules [1], also known as linear TGDs [5], multi-linear TGDs [6], sticky(-join) rules [7,15], weakly-recursive rules [13] and sets of rules with an acyclic graph of rule dependencies [1]. By definition, the fus property is a specific case of first-order rewritability, which means that the set of rules allows any CQ to be rewritten into a (sound and complete) first-order query; it is suspected that both properties are actually equivalent; however, to the best of our knowledge, no proof of this result has been published.
Paper contributions We start from a generic algorithm which, given a BCQ and a set of existential rules, computes a rewriting set. This task can be recast in terms of exploring a potentially infinite space of queries, composed of the initial conjunctive query and its (sound) rewritings, structured by the generalization preorder. The algorithm explores this space in a breadth-first way, with the aim of computing a complete rewriting set. It maintains a rewriting set and iteratively performs the following tasks: (1) generate all the one-step rewritings of the unexplored queries in this set; (2) add these rewritings to the set and update it in order to keep only incomparable most general elements. A rewriting operator is a function that, given a query and a set of rules, returns the one-step rewritings of this query. Note that it may be the case that the set of sound rewritings of the query is infinite while the set of its most general sound rewritings is finite. It follows that a simple breadth-first exploration of the rewriting space is not sufficient to ensure finiteness of the process, even for fus rules; one also has to maintain a set of the most general rewritings. This algorithm is generic in the sense that it is restricted neither to a particular kind of existential rules nor to a specific rewriting operator (without guarantee of termination though).
With this algorithmic scheme established, we then asked ourselves the following questions:
Assuming that the algorithm outputs a finite sound and complete set of rewritings, composed of pairwise incomparable queries, is this set of minimal cardinality, in the sense that no sound and complete rewriting set produced by any other algorithm can be strictly smaller?
At each step of the algorithm, some queries are discarded, because they are more specific than other rewritings, even if they have not been explored yet. The question is whether this dynamic pruning of the search space keeps the completeness of the output. More generally, which properties have to be fulfilled by the operator to ensure the correctness of the algorithm and its termination for fus rules?
Finally, can we design a rewriting operator that fulfills the desired properties and leads to the effective computation of a sound and complete rewriting set?
With respect to the first question, we show that all sound and complete sets of rewritings, restricted to their most general elements, have the same cardinality, which is minimal with respect to the completeness property. Moreover, if we delete redundant atoms from the obtained CQs (which can be performed by a polynomial number of homomorphism tests for each query),1
See e.g. [11], Section 2.6, on basic conceptual graphs. The algorithm can even be made linear, by noticing that an atom needs to be considered only once.
then we obtain a unique minimal sound and complete set of CQs of minimal size; uniqueness is of course up to a bijective variable renaming.
To answer the second question, we define several properties that a rewriting operator has to satisfy and show that these properties actually ensure the correctness of the algorithm and its termination for fus rules. In particular, we point out that the fact that a query may be removed from the rewriting set before being explored may prevent the completeness of the output, even if the rewriting operator is theoretically able to generate a complete output. The prunability of the rewriting operator ensures that this dynamic pruning can be safely performed. Briefly, this property holds if, for all queries Q1 and Q2, when Q1 is more general than Q2, any one-step rewriting of Q2 is less general than Q1 itself or than one of the one-step rewritings of Q1; intuitively, this allows one to discard Q2 even when its one-step rewritings have not been generated yet. Note that this kind of property ties in with an issue raised in [18] about the gap between theoretical completeness of some methods and the effective completeness of their implementation, this gap being mainly due to algorithmic optimizations (here the dynamic pruning).
Concerning the third question, we proceed in several steps. First, we rely on a specific unifier, called a piece-unifier, that was designed for backward chaining with conceptual graph rules (whose logical translation is exactly existential rules [30]). As in classical backward chaining, the rewriting process is based on a unification operation between the current query and a rule head. However, existential variables in rule heads induce a structure that has to be considered to keep soundness. Thus, instead of unifying a single atom of the query at once, our unifier processes a subset of atoms from the query. A piece is a minimal subset of atoms from the query that have to be erased together, hence the name piece-unifier. We present below a very simple example of piece unification (in particular, the head of the existential rule is restricted to a single atom).
Let and the BCQ . Assume we want to unify the atom from Q with , for instance by a substitution .2
A substitution is given as a set of pairs, where a pair means that x is substituted by e.
Since v is unified with the existential variable y, all other atoms from Q containing v must also be considered: indeed, rewriting only the unified atom would be unsound: intuitively, the fact that two atoms in Q share the variable v would be lost in the rewriting, which could then be answered positively even though Q is not entailed by F and R. Thus, both atoms containing v have to be unified with the head of R, for instance by means of a substitution μ. The set of atoms unified together is called a piece. The corresponding rewriting of Q is obtained by replacing this piece with (a specialization of) the body of R.
Piece-unifiers lead to a logically sound and complete rewriting method. As far as we know, it is the only method accepting any kind of existential rules, while staying in this fragment, i.e., without Skolemization of rule heads to replace existential variables with Skolem functions.
We show that the piece-based rewriting operator fulfills the desired properties ensuring the correctness of the generic algorithm, and its termination in the case of fus rules.
The next question was how to optimize the rewriting step. Indeed, the problem of deciding whether there is a piece-unifier between a query and a rule head is NP-complete, and the number of piece-unifiers can be exponential in the size of the query. To cope with these sources of complexity, we consider so-called single-piece unifiers, which unify a single piece of the query at once (like μ in Example 2). When, additionally, the head of a rule R is restricted to a single atom, which is a frequent case, each atom in a query Q belongs to at most one piece with respect to R; then, the number of (most general) single-piece unifiers of Q with the head of R is bounded by the size of Q.
We show that the single-piece based rewriting operator is able to generate a sound and complete rewriting set. However, as pointed out in several examples, it is not prunable. Hence, single-piece unifiers have to be combined to recover prunability. We thus define the aggregation of single-piece unifiers and show that the corresponding rewriting operator fulfills all desired properties and generates fewer queries than the piece-based rewriting operator. Detailed algorithms are given and first experiments are reported.
Paper organization Section 2 recalls some basic notions about the existential rule framework. Section 3 defines sound, complete and minimal sets of rewritings. In Section 4, the generic breadth-first algorithm is introduced and general properties of rewriting operators are studied. Section 5 presents the piece-based rewriting operator. In Section 6, we focus on exploiting single-piece unifiers and introduce the rewriting operator based on their aggregation. Finally, Section 7 is devoted to detailed algorithms and experiments, as well as to further work.
This is an extended version of papers by the same authors published at RR 2012 and RR 2013 (International Conference on Web Reasoning and Rule Systems).
Preliminaries
An atom is of the form p(t1, …, tk) where p is a predicate with arity k, and the ti are terms, i.e., variables or constants. Given an atom or a set of atoms A, vars(A), consts(A) and terms(A) denote its sets of variables, constants and terms, respectively. In all the examples in this paper, the terms are variables (denoted by x, y, z, etc.). ⊨ denotes the classical logical consequence. Two formulas φ1 and φ2 are said to be equivalent if φ1 ⊨ φ2 and φ2 ⊨ φ1.
A fact is an existentially closed conjunction of atoms.3
We generalize the classical notion of a fact in order to take existential variables into account.
A conjunctive query (CQ) is an existentially quantified conjunction of atoms. When it is a closed formula, it is called a Boolean CQ (BCQ). Hence, facts and BCQs have the same logical form. In the following, we will see them as sets of atoms. A union of conjunctive queries (UCQ) is a disjunction of CQs, which we will see as a set of CQs.
Given sets of atoms A and B, a homomorphism h from A to B is a substitution of vars(A) by terms(B) such that h(A) ⊆ B. We say that A is mapped to B by h. If there is a homomorphism from A to B, we say that A is more general than B, which is denoted A ≥ B.
Given a fact F and a BCQ Q, the answer to Q in F is positive if F ⊨ Q. It is well-known that F ⊨ Q if and only if there is a homomorphism from Q to F. If Q is a non-Boolean CQ, let (x1, …, xk) be the free variables in Q. Then, a tuple of constants (a1, …, ak) is an answer to Q in F if there is a homomorphism from Q to F that maps xi to ai for each i. In other words, (a1, …, ak) is an answer to Q in F if and only if the answer to the BCQ obtained from Q by substituting each xi with ai is positive.
In this paper, we consider only Boolean queries for simplicity reasons. This is not a restriction, since our mechanisms can actually process a CQ with free variables (x1, …, xk) by translating it into a BCQ with an added atom ans(x1, …, xk), where ans is a special predicate not occurring in the knowledge base. Since ans(x1, …, xk) can never be erased by a rewriting step, the xi can only be substituted and will not “disappear”. We can thus compute the rewriting set of a CQ as a Boolean CQ with a special atom, then transform the rewritings into non-Boolean CQs by removing the ans atom and considering its arguments as free variables. Note that the generic algorithm can accept a union of conjunctive queries as input as well, since it works exactly in the same way if it takes as input a set of CQs instead of a single CQ.
(Existential rule).
An existential rule (or simply a rule) is a formula R = ∀x∀y (B[x, y] → ∃z H[y, z]), where x, y and z are tuples of variables, and B and H are conjunctions of atoms, resp. called the body and the head of R. The frontier of R, denoted by fr(R), is the set vars(B) ∩ vars(H). The set of existential variables in R is the set vars(H) \ fr(R).
In the following, we omit quantifiers in rules and queries, as there is no ambiguity. For instance, the rule from Example 2 will be written .
A knowledge base (KB) K = (F, R) is composed of a fact F and a finite set of existential rules R. The BCQ entailment problem takes as input a KB K = (F, R) and a BCQ Q, and asks if F, R ⊨ Q holds.
Desirable properties of rewriting sets
Given a query Q and a set of existential rules , rewriting techniques compute a set of queries , which we call a rewriting set hereafter. It is generally desired that such a set satisfies at least three properties: soundness, completeness and minimality.
(Sound and complete set).
Let R be a set of existential rules and Q be a BCQ. Let 𝒬 be a set of BCQs. 𝒬 is said to be sound w.r.t. Q and R if for all facts F, for all Q′ ∈ 𝒬, if Q′ can be mapped to F then F, R ⊨ Q. Reciprocally, 𝒬 is said to be complete w.r.t. Q and R if for all facts F, if F, R ⊨ Q, then there is Q′ ∈ 𝒬 such that Q′ can be mapped to F.
We mentioned in the introduction that only the most general elements of a rewriting set need to be considered. Indeed, let Q1 and Q2 be two elements of a rewriting set such that Q1 ≥ Q2. Then, for any fact F, the set of answers to Q2 in F is included in the set of answers to Q1 in F. Hence, removing Q2 will not undermine completeness (and it will not undermine soundness either). The output of a rewriting algorithm should thus be a minimal set of incomparable queries that “covers” the set of all the sound rewritings of the initial query.
(Covering relation).
Let 𝒬1 and 𝒬2 be two sets of BCQs. 𝒬1 covers 𝒬2, which is denoted 𝒬1 ≥ 𝒬2, if for all Q2 ∈ 𝒬2 there is Q1 ∈ 𝒬1 with Q1 ≥ Q2.
Note that covering can also be defined in terms of classical database query containment, i.e., 𝒬1 covers 𝒬2 if and only if the UCQ associated with 𝒬2 is included in the UCQ associated with 𝒬1.
(Minimal set of BCQs, cover).
Let 𝒬 be a set of BCQs. 𝒬 is said to be minimal if there is no 𝒬′ ⊊ 𝒬 such that 𝒬′ ≥ 𝒬. A cover of 𝒬 is a minimal set 𝒬′ ⊆ 𝒬 such that 𝒬′ ≥ 𝒬.
Since a cover is a minimal set, its elements are pairwise incomparable.
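A cover can be computed greedily once a generalization test is available. The sketch below is ours: it works for any preorder supplied as a function `ge`, and it already prefers "explored" queries over equivalent ones, anticipating a requirement of the generic algorithm (the `explored` argument and the toy divisibility preorder in the usage note are illustrative, not the paper's notation).

```python
def cover(queries, ge, explored=frozenset()):
    """Return a cover of `queries`: a minimal subset covering all of them.

    `ge(q1, q2)` must return True iff q1 is more general than or equal to q2
    (a preorder, so `ge(q, q)` is True). Queries in `explored` are preferred
    over equivalent unexplored ones.
    """
    # Put explored queries first, so equivalent unexplored duplicates get dropped.
    ordered = sorted(queries, key=lambda q: q not in explored)
    result = []
    for q in ordered:
        if any(ge(kept, q) for kept in result):
            continue  # q is covered by an already kept, more general query
        # Remove previously kept queries that q covers.
        result = [kept for kept in result if not ge(q, kept)]
        result.append(q)
    return result
```

For instance, with integers preordered by divisibility (a more general than b when a divides b), `cover([2, 4, 6, 3, 9], lambda a, b: b % a == 0)` returns `[2, 3]`: the two most general elements, pairwise incomparable.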
Let and consider the following preorder over : ; ; ; (note that and are equivalent). There are two covers of , namely and . See Fig. 2.
A set of (sound) rewritings may have a finite cover even when it is infinite, as illustrated by Example 4.
Let , , . and have a head restricted to a single atom and no existential variable, hence the classical most general unifier can be used, which unifies the first atom in the query with the atom of a rule head. The rewriting set of Q with is infinite. The first generated queries are the following (note that rule variables are renamed when needed):
However, the set of the most general rewritings is since any other query that can be obtained is more specific than or .
It can be easily checked that all covers of a given set have the same cardinality. We now prove that this property can be extended to the covers of all sound and complete finite rewriting sets of Q, irrespective of the rewriting technique used to compute these sets.
Let R be a set of rules and Q be a BCQ. Any finite cover of a sound and complete rewriting set of Q with R is of minimal cardinality (among all sound and complete rewriting sets of Q).
Let 𝒬1 and 𝒬2 be two arbitrary sound and complete rewriting sets of Q with R. Let 𝒬1c (resp. 𝒬2c) be one of the finite covers of 𝒬1 (resp. 𝒬2). 𝒬1c (resp. 𝒬2c) is also sound and complete, as well as smaller than or equal to 𝒬1 (resp. 𝒬2). We show that they have the same cardinality. Let Q1 ∈ 𝒬1c. There exists Q2 ∈ 𝒬2c such that Q2 ≥ Q1. If not, Q would be entailed by the KB composed of Q1 (seen as a fact) and R, since 𝒬1c is a sound rewriting set of Q (and Q1 maps to itself), but no element of 𝒬2c would map to this fact: thus, 𝒬2c would not be complete. Similarly, there exists Q1′ ∈ 𝒬1c such that Q1′ ≥ Q2. Then Q1′ ≥ Q1, which implies that Q1′ = Q1 by assumption on 𝒬1c. Hence, for all Q1 ∈ 𝒬1c, there exists Q2 ∈ 𝒬2c such that Q2 ≥ Q1 and Q1 ≥ Q2. Such a Q2 is unique: indeed, two such elements would be comparable for ≥, which is not possible by construction of 𝒬2c. The function associating Q2 with Q1 is thus a bijection from 𝒬1c to 𝒬2c, which shows that these two sets have the same cardinality. □
Furthermore, the proof of the preceding theorem shows that, given any two sound and complete rewriting sets of Q, there is a bijection from any cover of the first set to any cover of the second set such that two elements in relation by the bijection are equivalent. However, these elements are not necessarily isomorphic (i.e., equal up to a variable renaming) because they may contain redundancies. Consider the preorder induced by homomorphism on the set of all BCQs definable on some vocabulary. It is well-known that this preorder is such that any of its equivalence classes possesses a unique element of minimal size (up to isomorphism), called its core (notion introduced for graphs,4
See for instance [17], where the notion of a core is traced back to the late sixties.
but easily transferable to queries).
Every query can be transformed into its equivalent core by removing redundant atoms. We recall that a set of existential rules ensuring that a finite sound and complete set of most general rewritings exists for any query is called a finite unification set (fus).5
The notion of a finite unification set was first introduced in [1] and defined with respect to piece-unifiers. However, since piece-unifiers provide a sound and complete rewriting operator (see Section 5) and all the covers of a given set have the same cardinality, the two definitions are equivalent.
Let R be a fus and Q be a BCQ. There is a unique finite sound and complete rewriting set of Q with R that has both minimal cardinality and elements of minimal size.
A generic rewriting algorithm
We will now present a generic rewriting algorithm that takes as input a set of existential rules and a query, and as parameter a rewriting operator. The question studied is the following: which properties should this operator satisfy so that the algorithm outputs a sound, complete, finite and minimal set?
Rewriting algorithm
(Rewriting operator).
A rewriting operator rew is a function which takes as input a BCQ Q and a set of rules R and outputs a set of BCQs denoted by rew(Q, R).
Since the elements of rew(Q, R) are BCQs, it is possible to apply further steps of rewriting to them. This naturally leads to the notions of k-rewriting and k-saturation.
(k-rewriting).
Let Q be a conjunctive query, R be a set of rules and rew be a rewriting operator. A 1-rewriting of Q (w.r.t. rew and R) is an element of rew(Q, R). A k-rewriting of Q, for k > 1 (w.r.t. rew and R), is a 1-rewriting of a (k−1)-rewriting of Q.
The term k-saturation is convenient to name the set of queries that can be obtained in at most k rewriting steps.
(k-saturation).
Let Q be a BCQ, R be a set of rules and rew be a rewriting operator. We denote the set of k-rewritings of Q by rew_k(Q). We call k-saturation, and denote by rew_{≤k}(Q), the set of i-rewritings of Q for all i ≤ k. We denote rew_∞(Q) = ∪_{k≥0} rew_{≤k}(Q).
In the following, we extend the notations rew_k, rew_{≤k} and rew_∞ to a set of BCQs 𝒬 instead of a single BCQ Q: rew_k(𝒬) = ∪_{Q∈𝒬} rew_k(Q), rew_{≤k}(𝒬) = ∪_{Q∈𝒬} rew_{≤k}(Q), and rew_∞(𝒬) = ∪_{Q∈𝒬} rew_∞(Q).
Algorithm 1 performs a breadth-first exploration of the rewriting space of a given query.6
Note that a depth-first exploration would not ensure termination for fus rules.
At each step, only the most general elements are kept thanks to a covering function, denoted by cover , that computes a cover of a given set.
A generic rewriting algorithm
For termination reasons (see the proof of Property 6), already explored queries are preferred to non-explored queries in the computation of the cover. More precisely, if q and q′ are homomorphically equivalent queries such that q has already been explored and q′ has not, then cover does not output q′. If rew fulfills some good properties (specified below), then, after the ith iteration of the while loop, the i-saturation of Q (with respect to rew and R) is covered by the current rewriting set, while a separate set contains the queries that remain to be explored.
In the remainder of this section, we study the conditions that a rewriting operator must meet in order that: (i) the algorithm halts and outputs a cover of all the rewritings that can be obtained with this rewriting operator, provided that such a finite cover exists; (ii) the output cover is sound and complete.
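Algorithm 1 can be sketched as follows. The code is our illustration, not the paper's pseudocode: the rewriting operator `rew` and the generalization test `ge` are passed as parameters, and the optional bound `max_steps` is our addition for safe experimentation, since termination is not guaranteed in general.

```python
def rewrite(q0, rules, rew, ge, max_steps=None):
    """Breadth-first exploration of the rewriting space of q0.

    `rew(q, rules)` returns the one-step rewritings of q;
    `ge(q1, q2)` returns True iff q1 is more general than or equal to q2.
    """
    explored = []      # queries whose one-step rewritings were generated
    frontier = [q0]    # queries still to be explored
    step = 0
    while frontier and (max_steps is None or step < max_steps):
        step += 1
        # (1) generate all one-step rewritings of the unexplored queries
        generated = [qr for q in frontier for qr in rew(q, rules)]
        candidates = explored + frontier + generated
        # (2) keep only the most general elements, preferring explored queries
        kept = []
        for q in sorted(candidates, key=lambda q: q not in explored):
            if any(ge(k, q) for k in kept):
                continue
            kept = [k for k in kept if not ge(q, k)]
            kept.append(q)
        explored = [q for q in kept if q in explored or q in frontier]
        frontier = [q for q in kept if q not in explored]
    return explored + frontier
```

As a toy instance (again ours), take integers preordered by divisibility and an operator that halves even numbers: starting from 12, the algorithm generates 6 then 3, discards the covered queries at each step, and stabilizes on `[3]`.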
Correctness and termination of the algorithm
We now define a property of the rewriting operator, called prunability. This property is sufficient to ensure that Algorithm 1 outputs a cover of the set of all rewritings of Q. Intuitively, if an operator is prunable then, for every Q1 more general than Q2, the one-step rewritings of Q2 are covered by the one-step rewritings of Q1 or by Q1 itself. It follows that all the rewritings of Q2 are covered by Q1 and its rewritings. Hence, Q2 can be safely removed from the current rewriting set.
(Prunable).
A rewriting operator rew is said to be prunable if for any set of rules R and for all BCQs Q1, Q2 such that Q1 ≥ Q2, and for all Q2′ ∈ rew(Q2, R), there is Q1′ ∈ {Q1} ∪ rew(Q1, R) such that Q1′ ≥ Q2′.
The following lemma states that this can be generalized to k-rewritings for any k.
Let rew be a prunable rewriting operator, and let 𝒬1 and 𝒬2 be two sets of BCQs. If 𝒬1 covers 𝒬2, then 𝒬1 together with its rewritings covers all the rewritings of 𝒬2.
We prove by induction on i that .
For , .
For , for any , there is such that . By induction hypothesis, there is such that . is prunable, thus either or there is such that . Since and are both included in , we can conclude. □
This lemma would not be sufficient to prove the correctness of Algorithm 1, as will be discussed in Section 6.1. We need a stronger version, which ensures that a query whose 1-rewritings are covered need not be explored.
Let rew be a prunable rewriting operator, and let 𝒬1 and 𝒬2 be two sets of BCQs. If 𝒬1 covers the 1-rewritings of 𝒬2, then 𝒬1 together with its rewritings covers all the rewritings of 𝒬2.
We prove by induction on i that .
For , .
For , for any , there is such that . By induction hypothesis, there is such that . Since is prunable, either or there is such that . Then, there are two possibilities:
either : since , we have and so ,
or : then . □
Finally, the correctness of Algorithm 1 is based on the following loop invariants.
Let rew be a rewriting operator. After each iteration of the while loop of Algorithm 1, the following properties hold:
;
;
if rew is prunable then;
for all distinct,and.
Invariants are proved by induction on the number of iterations of the while loop. Below and denote the value of and after i iterations. Invariant 1:
. basis:
.
induction step:
by construction, and . For any we have: either and then by induction hypothesis ; or and then by induction hypothesis we have , which implies .
Invariant 2:
. basis:
and any set covers it.
induction step:
by construction, ; since by induction hypothesis , we have . Furthermore, by construction, ; thus and so . Thus .
Invariant 3:
if rew is prunable then . basis:
.
induction step:
we first show that (i):, then we prove by induction that (ii)::
by construction , thus , and by Invariant 2, we have . Lemma 4 then entails that and we can conclude since ,
by construction, we have ; so, by Lemma 3, we have . Moreover, , thus . Using (i), we have and conclude by induction hypothesis.
Invariant 4:
for all distinct , and . Trivially satisfied thanks to the properties of cover . □
The next property states that if rew is prunable, then Algorithm 1 halts whenever the set of rewritings of Q has a finite cover.
Let rew be a rewriting operator, R be a set of rules and Q be a BCQ. If the set of rewritings of Q has a finite cover and rew is prunable, then Algorithm 1 halts.
Let be a finite cover of and let m be the largest k for a k-rewriting in .
We thus have . Since the operator is prunable, we have for all (proved with a straightforward induction on i). Thus . Thus, is covered by , and since already explored queries are taken first for the computation of a cover, we have that . Hence Algorithm 1 halts. □
Let rew be a rewriting operator, R be a set of rules and Q be a BCQ. If the set of rewritings of Q has a finite cover and rew is prunable, then Algorithm 1 outputs this cover (up to query equivalence).
By Property 6, Algorithm 1 halts. By Invariant 3 from Property 5, where and denote the final values of and in Algorithm 1. Since when Algorithm 1 halts, we have . Thanks to Invariants 1 and 4 from Property 5 we conclude that is a cover of . □
Preserving soundness and completeness
We consider two further properties of a rewriting operator, namely soundness and completeness, with the aim of ensuring the soundness and completeness of the obtained rewriting set within the meaning of Definition 2.
(Soundness/completeness).
Let rew be a rewriting operator. rew is sound if for any set of rules R, for any BCQ Q, for any Q′ ∈ rew(Q, R) and for any fact F, F ⊨ Q′ implies that F, R ⊨ Q. rew is complete if for any set of rules R, for any BCQ Q and for any fact F such that F, R ⊨ Q, there exists a rewriting Q′ of Q (obtained by finitely many applications of rew) such that F ⊨ Q′.
If rew is sound, then the output of Algorithm 1 is a sound rewriting set of Q with R.
Direct consequence of Invariant 1 from Property 5. □
Perhaps surprisingly, the completeness of the rewriting operator is not sufficient to ensure the completeness of the output rewriting set. Examples are provided in Section 6.1. This is due to the dynamic pruning performed at each step of Algorithm 1. Therefore the prunability of the operator is also required.
If rew is prunable and complete, then the output of Algorithm 1 is a complete rewriting set of Q with R.
Algorithm 1 returns when is empty. By Invariant 3 of Property 5, we know that . Since , we obtain that . □
Finally, as stated by the next theorem, when the rewriting operator is sound, complete and prunable, Algorithm 1 is correct and terminates for any finite unification set of rules. Recall that expressive classes of fus rules are known (see the introduction). In particular, the main members of the DL-Lite family are generalized by the simple class of linear existential rules. See also Section 7 for examples of such ontologies.
If rew is a sound, complete and prunable operator, and R is a finite unification set of rules, then for any BCQ Q, Algorithm 1 outputs a minimal (finite) sound and complete rewriting set of Q with R.
If R is a fus and rew is a sound and complete operator, then the set of rewritings of Q has a finite cover. The claim then follows from Properties 8 and 9 and Theorem 7. □
Piece-based rewriting
As mentioned in the introduction (and illustrated in Example 2), existential variables in rule heads induce a structure that has to be taken into account in the rewriting mechanism. Hence the classical notion of a unifier is replaced by that of a piece-unifier [2]. A piece-unifier “unifies” a subset of Q with a subset of , in the sense that the associated substitution u is such that . Given a piece-unifier, Q is partitioned into “pieces”, which are minimal subsets of atoms that must be processed together. More specifically, the cutpoints are the variables from that are not unified with existential variables from (i.e., they are unified with frontier variables or constants); then a piece in Q is a minimal non-empty subset of atoms “glued” by variables other than cutpoints, i.e., connected by a path of variables that are not cutpoints. We recall below the definition of pieces given in [2] (where T corresponds to the set of cutpoints).
Let A be a set of atoms and . A piece of A according to T is a minimal non-empty subset P of A such that, for all a and in A, if and , then .
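To make the definition concrete, here is a small sketch of piece computation (the encoding of atoms as predicate/term tuples and of variables as "?"-prefixed strings is our own convention, not the paper's notation): pieces are the connected components of the atoms under the relation "share a variable that is not a cutpoint".

```python
# Illustrative encoding (ours, not the paper's): an atom is a tuple
# (predicate, term_1, ..., term_k); variables are strings prefixed with "?".
def is_var(term):
    return term.startswith("?")

def pieces(atoms, cutpoints):
    """Pieces of `atoms` w.r.t. the cutpoint set: connected components
    under the relation "share a variable that is not a cutpoint"."""
    atoms = [tuple(a) for a in atoms]
    parent = list(range(len(atoms)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # "gluing" variables of each atom: its variables minus the cutpoints
    glue = [{t for t in a[1:] if is_var(t)} - set(cutpoints) for a in atoms]
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if glue[i] & glue[j]:          # glued atoms go in the same piece
                parent[find(i)] = find(j)

    components = {}
    for i, a in enumerate(atoms):
        components.setdefault(find(i), set()).add(a)
    return [frozenset(c) for c in components.values()]
```

Note that an atom all of whose variables are cutpoints forms its own singleton piece, as the definition requires.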
In this paper, we give a definition of a piece-unifier based on partitions rather than substitutions, which simplifies subsequent proofs. For any substitution u from a set of variables to a set of terms associated with a piece-unifier, it holds that . Thus, u can be associated with a partition of such that two terms are in the same class of if and only if they are merged by u; more precisely, we consider the equivalence classes of the reflexive, symmetric and transitive closure of the following relation ∼: if . Conversely, given a partition on a set of terms E such that no class contains two constants, we can define a substitution u by selecting an element of each class, with priority given to constants: if is a class of the partition and is the selected element, then for all with , we set . If we consider a total order on terms in which constants are smaller than variables, a unique substitution is obtained by taking the smallest element of each class. We call a partition admissible if no class contains two constants.
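This correspondence can be sketched in code (same "?"-prefix convention for variables as above, so the lexicographic order already places constants before variables; this helper is ours, for illustration): each class is mapped to its constant if it has one, otherwise to its smallest variable.

```python
# Terms are strings; variables are "?"-prefixed (our convention).
def is_var(term):
    return term.startswith("?")

def substitution_from_partition(partition):
    """Canonical substitution of an admissible partition: each class is
    mapped to its constant if it has one, otherwise to its smallest variable."""
    subst = {}
    for cls in partition:
        constants = [t for t in cls if not is_var(t)]
        if len(constants) > 1:
            raise ValueError("non-admissible partition: two constants in a class")
        rep = constants[0] if constants else min(cls)
        for t in cls:
            if is_var(t) and t != rep:
                subst[t] = rep
    return subst
```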
The set of all partitions over a given set is structured in a lattice by the “finer than” relation (given two partitions and , is finer than , denoted by , if every class of is included in a class of ).7
Usually, the notation ≤ is used to denote the relation “finer than”. We adopt the converse convention, which is more in line with substitutions and the ≥ preorder on CQs.
The join of several partitions is the partition obtained by making the union of their non-disjoint classes. The join of two admissible partitions may be a non-admissible partition. We say that several admissible partitions are compatible if their join is an admissible partition. Note that if the concerned partitions are relative to the same set E, then their join is their greatest lower bound in the partition lattice of E.
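The join and the compatibility test can be sketched as follows (representation assumptions as above; `join` repeatedly merges non-disjoint classes, and `compatible` checks that no class of the join holds two constants):

```python
def is_var(term):
    return term.startswith("?")

def join(partitions):
    """Join of partitions: repeatedly merge non-disjoint classes."""
    classes = [set(c) for p in partitions for c in p]
    merged = True
    while merged:
        merged = False
        result = []
        while classes:
            c = classes.pop()
            for d in list(classes):
                if c & d:
                    c |= d
                    classes.remove(d)
                    merged = True
            result.append(c)
        classes = result
    return classes

def compatible(partitions):
    """Admissibility of the join: no class may contain two constants."""
    return all(sum(1 for t in c if not is_var(t)) <= 1 for c in join(partitions))
```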
The following property makes a link between comparable partitions and comparable substitutions.
Let and be two admissible partitions over the same set such that , with associated substitutions and respectively. Then there is a substitution s such that (i.e., is “more general” than ).
The substitution s is built as follows: for any class , let be the class such that . Let (resp. ) be the selected element in (resp. ); if (in this case, is necessarily a variable), then . It can be immediately checked that . □
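The construction of s in this proof can be sketched in code: assuming the first partition is finer than the second, map the representative of each finer class to the representative of the coarser class that contains it (representatives chosen as in the previous sketches; this helper is ours, not from the paper).

```python
def is_var(term):
    return term.startswith("?")

def representative(cls):
    """Selected element of a class: its constant if any, else its smallest variable."""
    constants = [t for t in cls if not is_var(t)]
    return constants[0] if constants else min(cls)

def refinement_substitution(finer, coarser):
    """Assuming every class of `finer` is included in a class of `coarser`,
    map the representative of each finer class to that of its coarser class."""
    s = {}
    for cls in finer:
        r1 = representative(cls)
        r2 = next(representative(c) for c in coarser if cls <= c)
        if r1 != r2:
            s[r1] = r2  # r1 is necessarily a variable in this case
    return s
```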
In the following definition of a piece-unifier, we assume that Q and R have disjoint sets of variables. Given , we call separating variables from , and denote by , the variables occurring in both and : .
(Piece-unifier, cutpoint).
A piece-unifier of Q with R is a triple , where , , and is a partition on satisfying the following three conditions:
is admissible;
if a class in contains an existential variable (from ) then the other terms in the class are non-separating variables from ;
, where u is a substitution associated with .
The cutpoints of μ, denoted by , are the variables from that are not unified with existential variables from (i.e., they are unified with frontier variables or constants): .
Condition 2 in the piece-unifier definition ensures that a separating variable in is necessarily a cutpoint. It follows that is composed of pieces: indeed, an existential variable from is necessarily unified with a non-separating variable from , say x, which ensures that all atoms from in which x occurs are also part of . Figure 3 illustrates these notions.
Piece-unifier.
We provide below some examples of piece-unifiers.
Let and . Let . There are three piece-unifiers of Q with R:
with and
with and
with and
Note that and are each composed of a single piece; and is the join of and .
In the previous example, R has an atomic head, thus a piece-unifier of with R actually unifies the atoms from and the head of R into a single atom. In the general case, a piece-unifier unifies and a subset of into a set of atoms, as illustrated by the next example.
Let and . A piece-unifier of Q with R is with , and . Another piece-unifier is with , and .
Note that with , and is not a piece-unifier because the second condition in the definition of piece-unifier is not fulfilled: v is a separating variable and is matched with the existential variable y.
Then, the notions of a one-step rewriting according to a piece-unifier, and of a rewriting obtained by a sequence of one-step rewritings, are defined in the natural way.
(One-step piece-rewriting).
Given a piece-unifier of Q with R, the one-step piece-rewriting of Q according to μ, denoted by , is the BCQ , where u is a substitution associated with .
We thus define inductively a k-step piece-rewriting as a -step piece rewriting of a one-step piece-rewriting. For any k, a k-step piece-rewriting of Q is a piece-rewriting of Q.
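As a sketch (using our tuple encoding of atoms, not the paper's notation), the one-step piece-rewriting replaces the unified subset Q' of Q by the rule body and applies the associated substitution u to the result:

```python
def one_step_rewriting(query, unified, body, u):
    """One-step piece-rewriting: u(body) together with u(Q minus Q'),
    where Q' = `unified` and u is the substitution associated with
    the unifier's partition."""
    def apply(atom):
        return (atom[0],) + tuple(u.get(t, t) for t in atom[1:])

    unified = {tuple(a) for a in unified}
    kept = [tuple(a) for a in query if tuple(a) not in unified]
    return {apply(a) for a in list(body) + kept}
```

For instance, unifying the atom p(?x, ?y) of the query {p(?x, ?y), q(?x)} with the head of a rule b(?x1) → p(?x1, ?z1) (where ?z1 is existential and ?y occurs nowhere else in the query) yields the rewriting {b(?x), q(?x)}.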
The next theorem states that piece-based rewriting is logically sound and complete.
Let be a KB and Q be a BCQ. Then iff there is a piece-rewriting of Q such that .
It follows from Theorem 12 that a sound and complete rewriting operator can be based on piece-unifiers: we call piece-based rewriting operator, the rewriting operator that, given Q and , outputs all the one-step piece-rewritings of Q according to a piece-unifier of Q with . We denote it by .
Actually, as detailed hereafter, only most general piece-unifiers are to be considered, since the other piece-unifiers produce more specific queries.
(Most general piece-unifier).
Given two piece-unifiers defined on the same subsets of a query and a rule head, and , we say that is more general than (notation ) if is finer than (i.e., ). A piece-unifier is called a most general piece-unifier if it is more general than all the piece-unifiers on and .
Let and be two piece-unifiers with . Then and have the same pieces.
and have the same pieces iff they have the same cutpoints. It holds that since every class from is included in a class from : hence a variable from that is in the same class as a frontier variable or a constant in also is in . It remains to prove that . Let x be a cutpoint of and be the class of x in . Since x is a cutpoint of , there is a term t in that is a constant or a frontier variable. Since , we know that . Let be a term of from (there is at least one term of and one term of in each class since the partition is part of a unifier of and ). We are sure that is not an existential variable because and an existential variable cannot be in the same class as t (Condition 2 in the definition of a piece-unifier), so is a frontier variable or a constant, hence x is a cutpoint of . □
Let and be two piece-unifiers such that . Then .
Let (resp. ) be a substitution associated with (resp. ). Since , there is a substitution s such that . Then . s is thus a homomorphism from to , hence . □
The following lemma expresses that the piece-based rewriting operator is prunable.
If then for any piece-unifier of with R: either (i) or (ii) there is a piece-unifier of with R such that .
Let h be a homomorphism from to . Let be a piece-unifier of with R, and let be a substitution associated with . We consider two cases:
If , then is a homomorphism from to . Thus .
Otherwise, let be the non-empty subset of mapped by h to , i.e., , and be the subset of matched by with , i.e., . Let be the partition on such that two terms are in the same class of if these terms or their images by h are in the same class of (i.e., for a term t, we consider t if t is in , and otherwise). By construction, is a piece-unifier of with R. Indeed, fulfills all the conditions of the piece-unifier definition since fulfills these conditions.
Let be a substitution associated with . For each class P of (resp. ), we call selected element the unique element t of P such that (resp. ). We build a substitution s, from the selected elements of the classes in which are variables, to the selected elements of the classes in , as follows: for any class P of , let t be the selected element of P: if t is a variable of then , otherwise (t occurs in ). Note that, for any term t in , we have .
We build now a substitution from to , by considering three cases according to the part of in which the variable occurs, i.e., in but not in , in but not in , or in the remaining part corresponding to the images of by :
if , ;
if , ;
if (or alternatively ), .
We conclude by showing that is a homomorphism from to with two points:
. Indeed, for any variable x of :
either , hence ( is a substitution from variables of ),
or , hence (h is a substitution from variables of ).
. We show that , and since , we have . To show that , we point out that, for any variable x from :
either , then
or , then ( is a substitution from variables of and is a substitution from variables of and ). □
Given a query Q and a set of rules , the piece-based rewriting operator computes the set of one-step piece-rewritings of Q according to all piece-unifiers of Q with a rule . We are now able to show that this operator fulfills the desired properties introduced in Section 4.
The piece-based rewriting operator is sound, complete and prunable; this property still holds if only most general piece-unifiers are considered.
Soundness and completeness follow from Theorem 12. Prunability follows from Lemma 15. Thanks to Property 14, the proof remains true if most general piece-unifiers are considered. □
Exploiting single-piece unifiers
We are now interested in the efficient computation of piece-based rewritings. We identify several sources of combinatorial explosion in the computation of the piece-unifiers between a query and a rule:
The problem of deciding whether there is a piece-unifier of a given query Q with a given rule R is NP-complete in the general case. NP-hardness is easily obtained by considering the case of a rule with an empty frontier: then, there is a piece-unifier between Q and R if and only if there is a homomorphism from Q to , which is an NP-complete problem, Q and H being any sets of atoms.
The number of most general piece-unifiers can be exponential in , even if the rule head H is restricted to a single atom. For instance, assume that each atom of Q unifies with H and forms its own piece; then there may be piece-unifiers obtained by considering all subsets of Q.
The same atom in Q may belong to distinct pieces according to distinct unifiers, as illustrated by the next example.
Let and . Atom belongs to two single-piece unifiers: and . For an additional example, see Example 6, where and both belong to and .
To cope with this complexity, an idea is to rely on single-piece unifiers, i.e., piece-unifiers of the form where is a single piece of Q. This section is devoted to the properties of rewriting operators exploiting this notion. We show that the rewriting operator based on (most general) single-piece unifiers is sound and complete. However, perhaps surprisingly, it is not prunable, which prevents its use in the generic algorithm. To recover prunability, we will define the aggregation of single-piece unifiers, which provides us with a new rewriting operator that has all the desired properties and generates rewriting sets with fewer components than the standard piece-unifier. Note, however, that this will not completely remove the second source of complexity (i.e., the exponential number of unifiers to consider), since the number of aggregations of single-piece unifiers can still be exponential in the size of Q, even with atomic-head rules.
Single-piece based operator
As expressed by the following theorem, (most general) single-piece unifiers provide a sound and complete operator.
Given a BCQ Q and a set of rules, the set of rewritings of Q obtained by considering exclusively most general single-piece unifiers is sound and complete.
See Appendix. □
The proof of this theorem is given in Appendix since it is not reused hereafter. Indeed, the restriction to single-piece unifiers is not compatible with selecting most general rewritings at each step, as performed in Algorithm 1. We present below some examples that illustrate this incompatibility.
(Basic example).
Let and . There are two single-piece unifiers of Q with R, and , which yield the same rewriting, e.g. . There is also a two-piece unifier , which yields e.g. . A query equivalent to can be obtained from by a further single-piece unification. Now, assume that we restrict unifiers to single-piece unifiers and keep most general rewritings at each step. Since , is not kept, hence will never be generated, whereas it is incomparable with Q.
Concerning the preceding example, given and the substitutions respectively associated with and , one may argue that is redundant and the same holds for ; hence, the problem would be solved by computing instead of and making non-redundant (i.e., equal to ) before computing , which would then be empty. However, the problem goes deeper, as illustrated by the next two examples.
Let and . Again, there are two single-piece unifiers of Q with R: and . One obtains two rewritings more specific than Q, e.g., , and , which are isomorphic. There is also a two-piece unifier , which yields . If we remove and , no query equivalent to can be generated.
(Very simple rule).
This example has two interesting characteristics: (1) it uses only unary and binary predicates; (2) it uses a very simple rule, expressible in any lightweight description logic, i.e., a linear existential rule where no variable appears twice in the head or the body. Let (see Fig. 4) and . Note that Q is not redundant. There are two single-piece unifiers of Q with R, say and , with pieces and respectively. The obtained queries are pictured in Fig. 4. These queries are both more specific than Q. Their removal would prevent the generation of a query equivalent to , which could be generated from Q with a two-piece unifier.
The single-piece-based operator is not prunable.
Follows from the above examples. □
By Theorem 5 and Property 24, one can show that the conclusion of Lemma 3 (Section 4.2) remains valid for single-piece unifiers, even though they are not prunable. This shows that Lemma 3 alone is not sufficient to prove the correctness of Algorithm 1.
Nevertheless, single-piece unifiers can still be used as an algorithmic brick to compute more complex piece-unifiers, as shown in the next subsection.
Aggregated-piece based operator
We first explain the ideas that underlie aggregated single-piece unifiers. Let us consider the set of single-piece unifiers naturally associated with a piece-unifier μ. If we successively apply each of these underlying single-piece unifiers, we may obtain a CQ strictly more general than , as illustrated by the next example.
Let and . Let be a piece-unifier of Q with R with , and . . has two pieces w.r.t. μ: and . If we successively compute the rewritings with the underlying single-piece unifiers and , we obtain , which is strictly more general than .
Given a set of “compatible” single-piece unifiers of a query Q with a rule (the notion of “compatible” will be formally defined below), we can thus distinguish between the usual piece-unifier performed on the union of the pieces from the unifiers in and an “aggregated unifier” that would correspond to a sequence of applications of the unifiers in . This latter unifier is more interesting than the piece-unifier because, as illustrated by Example 11, it avoids generating some rewritings which are too specific. We will thus rely on the aggregation of single-piece unifiers to recover prunability.
Note that, in this paper, we combine single-piece unifiers of the same rule whereas in [20] we consider the possibility of combining unifiers of distinct rules (and thus compute rewritings from distinct rules in a single step). We keep below the definitions introduced in [20], while pointing out that, in the context of this paper, the rules in the definitions are necessarily copies of the same rule R. Intuitively, an aggregated unifier of R is a piece-unifier of a new rule built by aggregating copies of R (as formally expressed by next Property 19).
(Aggregation of a set of rules).
Let be a set of rules, with pairwise disjoint sets of variables. The aggregation of , denoted by , is the rule .
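A minimal sketch of rule aggregation, assuming rules are (body, head) pairs of atom tuples and that variable disjointness is ensured by suffix renaming (this encoding is our convention):

```python
def rename_atom(atom, suffix):
    """Append `suffix` to every variable of the atom ("?"-prefixed strings)."""
    return (atom[0],) + tuple(t + suffix if t.startswith("?") else t
                              for t in atom[1:])

def aggregate_rules(rules):
    """Safely rename each rule, then union the bodies and union the heads."""
    body, head = [], []
    for i, (b, h) in enumerate(rules):
        body += [rename_atom(a, "_%d" % i) for a in b]
        head += [rename_atom(a, "_%d" % i) for a in h]
    return body, head
```

In the context of this paper, the aggregated rules are copies of the same rule R, which the renaming keeps variable-disjoint.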
(Compatible set of piece-unifiers).
Let be a set of piece-unifiers of Q with rules respectively, where the rules have pairwise disjoint sets of variables (in particular, for all , it holds that ). Set is said to be compatible if (1) all and are pairwise disjoint; (2) the join of is admissible.
(Aggregated unifier).
Let be a compatible set of piece-unifiers of Q with rules . An aggregated unifier of Q with w.r.t. is where: (1) ; (2) ; (3) P is the join of . It is said to be single-piece if all the piece-unifiers in are single-piece. It is said to be most general if all the piece-unifiers in are most general.
Let Q be a BCQ and be a compatible set of piece-unifiers of Q with . Then the aggregated unifier of is a piece-unifier of Q with the aggregation of .
We show that the aggregated unifier of satisfies the conditions of the definition of a piece-unifier (Definition 11). Condition 1 is fulfilled since, by definition of compatibility, the join of is admissible. Condition 2 is satisfied as well because, since satisfy it, so does their join. Indeed, if a class contains an existential variable, it cannot be merged with another class by aggregation, because its other terms are non-separating variables, hence they do not appear in other classes. Concerning the last condition, for all , we have , where is a substitution associated with . Since and we know that, for any substitution u associated with , we have . □
According to this property, the rewriting associated with an aggregated unifier μ can be defined as . It corresponds to the rewriting obtained by applying the piece-unifiers associated with the one after the other, as illustrated by the next example.
Consider again Example 11. Let be a copy of R. The aggregation is the rule . Let where and . The aggregated unifier of Q with w.r.t. is . The associated rewriting of Q is , which is equal to the rewriting in Example 11.
The difference between a piece-unifier and an aggregated unifier of Q with R can also be explained as follows: to build a piece-unifier of Q with R, we consider partitions of , while in the aggregation operation we consider partitions of , where k is the number of considered single-piece unifiers, and each is safely renamed from R. In other words, if, in the definition of an aggregated unifier, we assumed that the had been exactly R, instead of safely renamed copies of R, then the aggregation of would have been exactly R after removal of duplicate atoms, and the aggregated unifier would have been the usual piece-unifier.
The next property shows that, from any piece-unifier μ, one can build a most general single-piece aggregated unifier, which produces a rewriting more general than the one produced by μ.
For any piece-unifier μ of Q with R, there is a most general single-piece aggregated unifier of Q with copies of R such that .
Let be the pieces of according to and let u be a substitution associated with . Let be safely renamed copies of R. Let denote the variable renaming used to produce from R. Let be a set of piece-unifiers of Q with built as follows for all i:
is the image by of the subset of unified by u with
let be the partition built from by replacing each by ; then, is obtained from by (1) restricting it to the terms of and , and (2) refining it as much as possible while keeping the property that , where is a substitution associated with the partition.
For any we immediately check that:
is a most general piece-unifier,
is a single-piece unifier,
for all , with , and are compatible.
Let be the aggregated unifier of Q with w.r.t. . Note that . The above properties fulfilled by any from ensure that is a most general single-piece aggregated unifier.
We note . It remains to prove that . Let be a substitution associated with . For each class P of (resp. ), we call selected element the unique element t of P such that (resp. ).
We build a substitution s, from the selected elements in which are variables, to the selected elements in , as follows: for any class P of , let t be the selected element of P: if t is a variable of , then ; else t is a variable of a : then . Note that for any term t in , there is a variable renaming such that (if t is a constant or a variable from then any can be chosen).
We build now a substitution h from to , by considering three cases according to the part of in which the variable occurs, i.e., in Q but not in , in but not in , or in the remaining part corresponding to the images of by :
if , ;
if , ;
if (or alternatively ), ;
We conclude by showing that h is a homomorphism from to , with two points:
for all i, . Indeed, for any variable :
either , hence (u does not substitute the variables from ),
or , hence ;
. Indeed, for any variable :
either , then ( does not substitute the variables from Q),
or , then ( and u do not substitute the variables from ). □
We call single-piece aggregator the rewriting operator that computes the set of one-step rewritings of a query Q by considering all the most general single-piece aggregated unifiers of Q.
The single-piece aggregator is sound, complete and prunable.
Soundness comes from Property 19 and from the fact that, for any set of rules , if R denotes the aggregation of , one has . Completeness and prunability rely on the fact that the piece-based rewriting operator fulfills these properties, and on the fact that, for any queries Q and and any rule R, if , where μ is a piece-unifier, then the query obtained with the single-piece aggregator corresponding to μ is more general than , as expressed by Property 20. □
Detailed algorithms and experiments
In this section, we first detail the computation of all the most general single-piece unifiers of a query Q with a rule R, and explain how we use them to compute all the single-piece aggregated unifiers. Then, we focus on the specific case of unification with atomic-head rules, for which the computation is simpler. Finally, we report on first experiments.
Computing single-piece unifiers and their aggregation
We first introduce the notion of a pre-unifier, which is weaker than a piece-unifier. To become a piece-unifier, a pre-unifier has to satisfy an additional constraint on the separating variables of the unified subset of Q.
(Valid partition).
Let Q be a BCQ, R be a rule, , , and be a partition on . is valid if no class of contains two constants, or two existential variables of R, or a constant and an existential variable of R, or an existential variable of R and a frontier variable of R.
(Pre-unifier).
Let Q be a BCQ, R be a rule, , , and be a partition on . Then is a pre-unifier of Q with R if (1) is valid, and (2) given a substitution u associated with , it holds that .
The next definition introduces the notion of sticky variables, which are the variables of that prevent from being a piece.
(Sticky variables).
Let Q be a BCQ, R be a rule, , and be a partition on . The sticky variables of in w.r.t. Q and R, denoted by , are the separating variables of that occur in a class of containing an existential variable of R.
The next property ensures that a pre-unifier without sticky variables is a piece-unifier, and reciprocally. Its proof follows from the definitions.
Let Q be a BCQ, R be a rule, , , and be a partition on . Then is a piece-unifier of Q with R iff μ is a pre-unifier and .
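A sketch of the sticky-variable test (our tuple encoding; `existentials` stands for the set of existential variables of the rule head): the separating variables of Q' are those also occurring in the rest of Q, and a sticky variable is a separating variable whose class contains an existential variable.

```python
def sticky_variables(query, unified, partition, existentials):
    """Separating variables of Q' (= `unified`) whose class in the
    partition contains an existential variable of the rule."""
    unified = {tuple(a) for a in unified}
    rest = [tuple(a) for a in query if tuple(a) not in unified]

    def vars_of(atoms):
        return {t for a in atoms for t in a[1:] if t.startswith("?")}

    separating = vars_of(unified) & vars_of(rest)
    return {v for v in separating
            if any(v in c and set(c) & set(existentials) for c in partition)}
```

An empty result means the pre-unifier is in fact a piece-unifier, in line with the property above.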
The fact that we can first build pre-unifiers, then check the absence of sticky variables, suggests an incremental method to compute all the most general single-piece unifiers of Q with R.
The first step consists in computing all the most general pre-unifiers of an atom with an atom having the same predicate. The partition on the terms of these atoms associated with their unification has to be valid. The next definition formalizes this notion of partition.
(Partition by position).
Let A be a set of atoms with the same predicate p. The partition by position associated with A, denoted by , is the partition on such that two terms of A occurring in the same position i () are in the same class of .
Hence, the partition by position associated with has to be valid. We denote by the set of all the most general atomic pre-unifiers, i.e., . Algorithm 2 details the computation of .
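The partition by position can be sketched as follows (our encoding): one class per argument position, after which classes sharing a term (a term occurring at two positions) are merged.

```python
def partition_by_position(atoms):
    """Partition by position of atoms sharing the same predicate: one class
    per argument position; classes sharing a term are then merged."""
    atoms = [tuple(a) for a in atoms]
    arity = len(atoms[0]) - 1
    classes = [{a[i + 1] for a in atoms} for i in range(arity)]
    merged = True
    while merged:               # a term occurring at two positions merges them
        merged = False
        for i in range(len(classes)):
            for j in range(i + 1, len(classes)):
                if classes[i] & classes[j]:
                    classes[i] |= classes.pop(j)
                    merged = True
                    break
            if merged:
                break
    return classes
```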
Computation of APU, the set of most general atomic pre-unifiers
Computation of SPU, the set of most general single-piece unifiers
Computation of the most general single-piece unifiers extending a given pre-unifier
We then use to compute the set of all the most general single-piece unifiers of Q with R, denoted by . Each atomic pre-unifier of is incrementally extended in all possible ways with other atomic pre-unifiers of , which contain “missing” atoms of Q with respect to sticky variables. Extending a pre-unifier with a pre-unifier consists in merging both pre-unifiers to obtain a new pre-unifier ; this extension can be performed if and only if the join of and is a valid partition; if the obtained pre-unifier has no sticky variable, it is a single-piece unifier.
Algorithms 3, 4 and 5 detail the computation of . Algorithm 3 is the main algorithm. It first uses Algorithm 2 to compute , then, for each atomic pre-unifier , it calls Algorithm 4, which computes the single-piece unifiers extending μ. Algorithm 4 first checks whether μ contains sticky variables: if it does not, μ is a single-piece unifier and is returned; otherwise the algorithm is recursively called, after a call to Algorithm 5 to obtain a set of candidate extensions of μ.
Computation of the pre-unifiers extending a given pre-unifier w.r.t. a given set of atoms
Finally, the set of all the single-piece aggregated unifiers of Q with R is obtained by aggregating the unifiers from all non-empty compatible subsets of . For optimisation reasons, this set is incrementally computed as follows:
Let ; the are called 1-unifiers.
For to the greatest possible rank (i.e., as long as is not empty): let be the set of all i-unifiers obtained by aggregating an -unifier from and a single-piece unifier from .
Return the union of all the obtained.
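The level-wise loop above can be sketched generically; here `compatible` and `aggregate` stand for the compatibility test and the aggregation of unifiers, modelled abstractly for illustration (our stand-ins: unifiers as sets of piece identifiers, compatibility as disjointness, aggregation as union).

```python
def aggregate_levelwise(spu, compatible, aggregate):
    """Level-wise aggregation: an i-unifier is obtained by aggregating an
    (i-1)-unifier with a compatible single-piece unifier; stop when a level
    is empty, and return the union of all levels."""
    levels = [list(spu)]
    while levels[-1]:
        nxt = []
        for u in levels[-1]:
            for v in spu:
                if compatible(u, v):
                    w = aggregate(u, v)
                    if w not in nxt:       # avoid duplicates within the level
                        nxt.append(w)
        levels.append(nxt)
    return [u for level in levels for u in level]
```

With three pairwise-compatible single-piece unifiers, this enumerates the seven non-empty subsets, as expected from the combinatorial analysis above.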
The specific case of atomic-head rules
Rules with an atomic head are often considered in the literature, notably in logic programming and deductive databases. One may ask whether piece-unification becomes simpler in this specific case. In fact, considering atomic-head rules does not simplify the definition of a piece-unifier itself, but it does simplify its computation. Indeed, there is now a unique way of associating any atom from Q with the head of a rule. It follows that deciding whether there is a piece-unifier of Q with a rule can be done in linear time with respect to the size of Q (whereas it is NP-complete in the general case); moreover, each atom belongs to a single piece, thus the set of all single-piece unifiers of Q with a rule can be computed in polynomial time.
More precisely, if a rule R has an atomic head, then every atom in Q participates in at most one most general single-piece unifier of Q with R (up to bijective variable renaming). This is a corollary of the next property.
Let R be an atomic-head rule and Q be a BCQ. For any atom , there is at most one such that and is a piece for a piece-unifier of Q with R.
We prove by contradiction that two single-piece unifiers cannot share an atom of Q. Assume there are and such that and , and and two single-piece unifiers of Q with R, with . Since , one has or . Assume . Let and . There is at least one variable such that there is an existential variable e of in the class of containing x (otherwise has more than one piece). Since H is atomic, there is a unique way of associating any atom with H, thus the class of containing x contains e as well. It follows that is not a piece since an atom of A and an atom of B share the variable x unified with an existential variable in , while A is included in and B is not. □
The fact that an atom from Q participates in at most one most general single-piece unifier allows some algorithmic improvements. Indeed, when a piece-unifier of with is successfully built, all the atoms of can be removed from the set of atoms to be considered in the computation of the next piece-unifiers. Furthermore, there is a unique way of associating any atom from Q with , hence there is only one pre-unifier of with . Algorithm 6 exploits these specific aspects to compute all the single-piece unifiers of a query with an atomic-head rule.
Let and . Let us start from : this atom is unifiable with and necessarily belongs to the same piece-unifier (if any) because ; indeed, v is in the same class as the existential variable y; however, is not unifiable with because, since v occurs at the first and at the second position of a p atom, x and y should be unified, which is not possible, since y is an existential variable; thus, does not belong to any piece-unifier with R. However, still has to be considered. Let us start from it: is unifiable with and forms its own piece because sticky is empty; indeed, t is in the same class as the existential variable y, but does not occur in any other atom. Hence, there is a single (most general) piece-unifier of Q with R, namely .
Computation of all the most general single-piece unifiers in the case of atomic-head rules
It should be noted that any existential rule can be decomposed into an equivalent set of rules with atomic head by introducing a new predicate, which gathers the variables of the original head (e.g. [1,4]). Hence, the restriction to atomic-head rules can be made without loss of expressivity. Now, the question is whether it is more efficient to directly process rules with complex heads, or to decompose them into atomic-head rules and benefit from a simpler computation of piece-unifiers. The experiments reported below clearly show that the former choice is better.
Experiments and perspectives
The query rewriting algorithm, instantiated with the rewriting operator described in the preceding section, has been implemented in Java. Since benchmarks dedicated to existential rules are not available yet, first experiments were carried out with sets of existential rules obtained by translation from ontologies expressed in the description logic DL-Lite, namely ADOLENA (A), STOCKEXCHANGE (S), UNIVERSITY (U) and VICODI (V). This benchmark was introduced in [27] and then used in several papers, e.g., [12,15,18,19]. Ontologies A and U contain some rules with multiple heads; the ontologies obtained by decomposing rules into atomic-head rules are respectively known as AX and UX. Additionally, we considered the translation of a larger ontology, the DL-Lite version of OpenGalen2 (http://www.opengalen.org/) (G), which contains more than 50k rules. Each ontology is provided with five handcrafted queries.
In [19], we compared our implementation with other systems concerning the size of the output and pointed out that none of the existing systems output a complete set of rewritings. However, besides the fact that these systems have evolved since then, one can argue that the size of the rewriting set should not be a decisive criterion (indeed, assuming the systems are sound and complete, a minimal rewriting set can be obtained by selecting the most general elements, see Theorem 1). Therefore, other criteria have to be taken into account, such as the runtime or the total number of CQs generated during the rewriting process.
Table 1. Impact of rule decomposition

          Time (ms)           Output (#)          Generated (#)
          A        AX         A        AX         A        AX
Q1        170      330        27       41         459      720
Q2        90       4900       50       1431       171      4567
Q3        240      47290      104      4466       316      13838
Q4        440      28620      224      3159       826      14526
Q5        2100     1h36       624      32921      2416     215523

          U        UX         U        UX         U        UX
Q1        0        10         2        5          1        4
Q2        0        0          1        1          105      120
Q3        10       20         4        12         42       155
Q4        1370     4190       2        5          2142     4720
Q5        20       20         10       25         153      351
All tests reported here were performed on a DELL machine with a processor at 3.60 GHz and 16 GB of RAM, with 4 GB allocated to the Java Virtual Machine.
Table 1 reports the behavior of the rewriting algorithm on A vs AX and U vs UX with respect to three parameters: the runtime, the size of the output (number of CQs) and the number of generated CQs. The size of the output for AX and UX is given before elimination of queries containing auxiliary predicates. The generated CQs are all the rewritings built during the rewriting process (excluding the initial query and possibly including several occurrences of the same rewriting). We can see that avoiding rule decomposition makes a substantial difference. The gain is particularly striking on A/AX with respect to all three parameters (the runtime is 2.1 seconds for A and 1 hour and 36 minutes for AX, the size of the output is more than 52 times larger for AX before elimination of useless queries, and the number of generated queries is 89 times larger for AX). Moreover, we point out that only 29 of the 102 rules in A and 5 of the 77 rules in U have multiple heads, each with only 2 atoms; we can reasonably expect the gain to increase with the proportion of multiple-head rules and the size of rule heads.
Table 2. Generated queries with the single-piece operator

Rules   Query   Output (#)   Generated (#)   Explored (#)   Time (ms)
A       Q1      27           459             74             170
        Q2      50           171             70             90
        Q3      104          316             104            240
        Q4      224          826             256            440
        Q5      624          2416            624            2100
S       Q1      6            9               6              0
        Q2      2            137             23             10
        Q3      4            275             20             40
        Q4      4            450             58             90
        Q5      8            688             44             110
U       Q1      2            1               2              0
        Q2      1            105             32             0
        Q3      4            42              10             10
        Q4      2            2142            556            1370
        Q5      10           153             14             20
V       Q1      15           14              15             0
        Q2      10           9               10             0
        Q3      72           117             72             30
        Q4      185          328             185            110
        Q5      30           59              30             10
G       Q1      2            2               2              10
        Q2      1152         1275            1152           1090
        Q3      488          1514            488            1050
        Q4      147          154             147            30
        Q5      324          908             324            1000
Table 2 presents the size of the output, the number of generated CQs and the number of explored CQs for each ontology (as well as the runtime, for information; see also Table 4). Note that, since subsumed rewritings are removed at each step of the breadth-first algorithm, only some of the rewritings generated at a given step are explored at the next step. We can see that the number of generated queries can be large with respect to the cardinality of the output; this effect is less marked for explored queries.
Table 3. Types of rules in the ontologies

Ontology   Rules (#)   Hierarchical rules (#)
A          102         72
S          52          16
U          77          36
V          222         202
G          50764       26980
Our query rewriting tool is able to process any kind of existential rules. There is of course a price to pay for this expressivity, in terms of complexity of the involved mechanisms and time efficiency. We consider the algorithms presented in this paper as basic versions, which can be further improved in various ways, for instance by processing some specific kinds of rules in a dedicated way. Let us illustrate this with rules expressing taxonomies. Indeed, a large part of currently available ontologies is actually composed of concept and role hierarchies. See Table 3: 71%, 31%, 47%, 91% and 53% of the rules in ontologies A, S, U, V and G, respectively, express atomic concept or role inclusions.
We can compile these sets of rules as preorders on predicates. The detailed presentation of how these preorders are computed and processed is out of the scope of this paper. Briefly, the preorders are integrated into the rewriting process, which makes it possible to generate a smaller rewriting set; this set is unfolded at the end to produce the expected UCQ. Our purpose here is just to illustrate the fact that some improvements of the basic version can dramatically decrease the runtime, while still relying on the same fundamental mechanisms. Table 4 compares these two versions: PURE (for Piece Unification based REwriting) denotes the basic version of our tool, and PURE (comp.) the version with compiled hierarchical rules (note that compilation is performed offline, hence the algorithm takes as input the preorder and the non-hierarchical rules).
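As a rough illustration of the kind of compilation discussed above (the actual computation is, as said, out of the scope of this paper), atomic concept and role inclusions can be closed off-line into a reachability relation on predicates. The function name and the example predicates below are purely illustrative.

```python
# A hierarchical rule q(x) -> p(x) (or q(x, y) -> p(x, y)) is stored as
# an edge (q, p). The reflexive-transitive closure then gives, for each
# predicate, all its sub-predicates, which the rewriting step can use
# directly instead of applying these rules one by one.

from collections import defaultdict

def compile_hierarchy(inclusions):
    """inclusions: list of (sub, sup) pairs, one per hierarchical rule.
    Returns a dict mapping each predicate to the set of its
    sub-predicates (reflexive transitive closure)."""
    parents = defaultdict(set)
    for sub, sup in inclusions:
        parents[sub].add(sup)
    below = defaultdict(set)
    preds = {p for edge in inclusions for p in edge}
    for p in preds:
        # walk upward from p; p is a sub-predicate of everything reached
        stack, seen = [p], {p}
        while stack:
            cur = stack.pop()
            below[cur].add(p)
            for up in parents[cur] - seen:
                seen.add(up)
                stack.append(up)
    return below

below = compile_hierarchy([("PhDStudent", "Student"), ("Student", "Person")])
# below["Person"] == {"Person", "Student", "PhDStudent"}
```

With such a table, an atom Person(x) in a query directly stands for the disjunction of its sub-predicates, and the corresponding UCQ is only unfolded at the very end.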
We also compared PURE with two other query rewriting tools, Nyaya and Rapid. Nyaya is a tool dedicated to UCQ rewriting with linear and sticky existential rules, which implements the techniques presented in [15], in particular an optimization for linear rules (which include DL-Lite ontologies). Table 4 shows that our tool is generally faster on the considered benchmark, even in its basic version, especially on ontology A. This difference could be due to the fact that Nyaya does not directly process multiple-head rules, hence has to decompose them into atomic-head rules. For the large ontology G, Nyaya seemed to be still in a preprocessing step after several hours. Note that the very latest version of Nyaya includes parallel rewriting, which we did not consider here, since our tool does not include this kind of optimization.
As far as we know, Nyaya is the only other tool able to process existential rules beyond lightweight DLs. We think that comparing with DL rewriting tools is not very relevant, since these systems make use of specific features, such as predicate arity bounded by two or the tree-model property. Tools tailored for DL-Lite exploit even further the very specific form of DL-Lite axioms. However, we compared with one of these tools, namely Rapid, to obtain an order of magnitude. Rapid is one of the fastest tools dedicated to DL-Lite ontologies [12]. In Table 4, we can see that Rapid is indeed generally faster than our tool, the difference being less pronounced for the version with rule compilation.
Table 4. Runtime (ms) with several query rewriting tools

Rules   Query   PURE    PURE (comp.)   Nyaya   Rapid
A       Q1      170     120            1122    18
        Q2      90      40             862     23
        Q3      240     30             2363    34
        Q4      440     200            5557    48
        Q5      2100    440            33206   93
S       Q1      0       0              4       7
        Q2      10      10             4       9
        Q3      40      40             46      13
        Q4      90      20             7       12
        Q5      110     80             8       15
U       Q1      0       0              8       6
        Q2      0       10             4       9
        Q3      10      0              12      7
        Q4      1370    120            6       13
        Q5      20      10             10      15
V       Q1      0       0              13      9
        Q2      0       0              51      5
        Q3      30      0              21      25
        Q4      110     30             28      32
        Q5      10      0              22      26
G       Q1      10      0              –       5
        Q2      1090    620            –       74
        Q3      1050    290            –       59
        Q4      30      10             –       10
        Q5      1000    110            –       40
Current work includes processing specific kinds of rules in a dedicated way, while keeping a system able to process any set of existential rules. Other optimizations could be implemented, such as exploiting dependencies between rules to select the rules to be considered at each step. Moreover, the form of the considered output itself, i.e., a union of conjunctive queries, leads to combinatorial explosion. Considering semi-conjunctive queries instead of conjunctive queries, as in [31], can save much with respect to both the running time and the size of the output, without compromising the efficiency of query evaluation; in [31] the piece-based rewriting operator is combined with query factorization techniques. We have not considered generating Datalog queries yet. Finally, further experiments should be performed on more complex ontologies. However, even if slightly more complex ontologies could be obtained by translation from description logics, real-world ontologies that would take advantage of the expressiveness of existential rules, as well as associated queries, are currently lacking.
Acknowledgments
We thank Giorgio Orsi for providing us with rule versions of ontologies A, S, U and V, as well as the version of Nyaya used for the experiments (October 2013 version). This work was partially funded by the ANR project PAGODA (ANR-12-JS02-007-01).
Proof of Theorem 17
To prove the completeness of the single-piece based operator, we first prove the following property:
References
1.
J.-F.Baget, M.Leclère, M.-L.Mugnier and E.Salvat, Extending decidable cases for rules with existential variables, in: Proc. of the 21st International Joint Conference on Artificial Intelligence, IJCAI 2009, Pasadena, California, USA, July 11–17, 2009, C.Boutilier, ed., 2009, pp. 677–682.
2.
J.-F.Baget, M.Leclère, M.-L.Mugnier and E.Salvat,
On rules with existential variables: Walking the decidability line, Artificial Intelligence 175(9–10) (2011), 1620–1654.
3.
C.Beeri and M.Y.Vardi, The implication problem for data dependencies, in: Proc. of the 8th Colloquium on Automata, Languages and Programming, Acre (Akko), Israel, July 13–17, 1981, Lecture Notes in Computer Science, Vol. 115, Springer, 1981, pp. 73–85.
4.
A.Calì, G.Gottlob and M.Kifer, Taming the infinite chase: Query answering under expressive relational constraints, in: Proc. of the 21st International Workshop on Description Logics (DL2008), Dresden, Germany, May 13–16, 2008, Vol. 353, CEUR-WS.org, 2008.
5.
A.Calì, G.Gottlob and T.Lukasiewicz, A general datalog-based framework for tractable query answering over ontologies, in: Proc. of the Twenty-Eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2009, Providence, Rhode Island, USA, June 29–July 1, 2009, J.Paredaens and J.Su, eds, ACM, 2009, pp. 77–86.
6.
A.Calì, G.Gottlob and T.Lukasiewicz,
A general Datalog-based framework for tractable query answering over ontologies, J. Web Sem. 14 (2012), 57–83.
7.
A.Calì, G.Gottlob and A.Pieris, Query answering under non-guarded rules in Datalog+/−, in: Proc. of the Fourth International Conference on Web Reasoning and Rule Systems, RR 2010, Bressanone/Brixen, Italy, September 22–24, 2010, P.Hitzler and T.Lukasiewicz, eds, Lecture Notes in Computer Science, Vol. 6333, Springer, 2010, pp. 1–17.
8.
A.Calì, D.Lembo and R.Rosati, On the decidability and complexity of query answering over inconsistent and incomplete databases, in: Proc. of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, San Diego, CA, USA, June 9–12, 2003, F.Neven, C.Beeri and T.Milo, eds, ACM, 2003, pp. 260–271.
9.
D.Calvanese, G.De Giacomo, D.Lembo, M.Lenzerini and R.Rosati,
Tractable reasoning and efficient query answering in description logics: The DL-Lite family, J. Autom. Reasoning 39(3) (2007), 385–429.
10.
A.K.Chandra, H.R.Lewis and J.A.Makowsky, Embedded implicational dependencies and their inference problem, in: Proc. of the 13th Annual ACM Symposium on Theory of Computing, Milwaukee, Wisconsin, USA, May 11–13, 1981, ACM, 1981, pp. 342–354.
11.
M.Chein and M.-L.Mugnier, Graph-Based Knowledge Representation and Reasoning—Computational Foundations of Conceptual Graphs, Advanced Information and Knowledge Processing, Springer, 2008.
12.
A.Chortaras, D.Trivela and G.B.Stamou, Optimized query rewriting for OWL 2 QL, in: Proc. of the 23rd International Conference on Automated Deduction, CADE-23, Wroclaw, Poland, July 31–August 5, 2011, N.Bjørner and V.Sofronie-Stokkermans, eds, Vol. 6803, Springer, 2011, pp. 192–206.
13.
C.Civili and R.Rosati, A broad class of first-order rewritable tuple-generating dependencies, in: Proc. of the Second International Workshop on Datalog in Academia and Industry, Datalog 2.0, Vienna, Austria, September 11–13, 2012, P.Barceló and R.Pichler, eds, Vol. 7494, Springer, 2012, pp. 68–80.
14.
B.Cuenca Grau, I.Horrocks, M.Krötzsch, C.Kupke, D.Magka, B.Motik and Z.Wang,
Acyclicity notions for existential rules and their application to query answering in ontologies, J. Artif. Intell. Res. (JAIR) 47 (2013), 741–808.
15.
G.Gottlob, G.Orsi and A.Pieris, Ontological queries: Rewriting and optimization, in: Proc. of the 27th International Conference on Data Engineering, ICDE 2011, Hannover, Germany, April 11–16, 2011, S.Abiteboul, K.Böhm, C.Koch and K.-L.Tan, eds, IEEE Computer Society, 2011, pp. 2–13.
16.
G.Gottlob and T.Schwentick, Rewriting ontological queries into small nonrecursive datalog programs, in: Proc. of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning, KR 2012, Rome, Italy, June 10–14, 2012, G.Brewka, T.Eiter and S.A.McIlraith, eds, AAAI Press, 2012.
17.
P.Hell and J.Nesetril,
The core of a graph, Discrete Mathematics 109(1–3) (1992), 117–126.
18.
M.Imprialou, G.Stoilos and B.Cuenca Grau, Benchmarking ontology-based query rewriting systems, in: Proc. of the Twenty-Sixth AAAI Conference on Artificial Intelligence, Toronto, Ontario, Canada, July 22–26, 2012, J.Hoffmann and B.Selman, eds, AAAI Press, 2012.
19.
M.König, M.Leclère, M.-L.Mugnier and M.Thomazo, A sound and complete backward chaining algorithm for existential rules, in: Proc. of the 6th International Conference on Web Reasoning and Rule Systems, RR 2012, Vienna, Austria, September 10–12, 2012, M.Krötzsch and U.Straccia, eds, Lecture Notes in Computer Science, Vol. 7497, Springer, 2012, pp. 122–138.
20.
M.König, M.Leclère, M.-L.Mugnier and M.Thomazo, On the exploration of the query rewriting space with existential rules, in: Proc. of the 7th International Conference on Web Reasoning and Rule Systems, RR 2013, Mannheim, Germany, July 27–29, 2013, W.Faber and D.Lembo, eds, Lecture Notes in Computer Science, Vol. 7994, Springer, 2013, pp. 123–137.
21.
R.Kontchakov, C.Lutz, D.Toman, F.Wolter and M.Zakharyaschev, The combined approach to ontology-based data access, in: Proc. of the 22nd International Joint Conference on Artificial Intelligence, IJCAI 2011, Barcelona, Catalonia, Spain, July 16–22, 2011, T.Walsh, ed., IJCAI/AAAI, 2011, pp. 2656–2661.
22.
M.Krötzsch and S.Rudolph, Extending decidable existential rules by joining acyclicity and guardedness, in: Proc. of the 22nd International Joint Conference on Artificial Intelligence, IJCAI 2011, Barcelona, Catalonia, Spain, July 16–22, 2011, T.Walsh, ed., IJCAI/AAAI, 2011, pp. 963–968.
23.
N.Leone, M.Manna, G.Terracina and P.Veltri, Efficiently computable datalog ∃ programs, in: Proc. of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning, KR 2012, Rome, Italy, June 10–14, 2012, G.Brewka, T.Eiter and S.A.McIlraith, eds, AAAI Press, 2012.
24.
C.Lutz, D.Toman and F.Wolter, Conjunctive query answering in the description logic ℰℒ using a relational database system, in: Proc. of the 21st International Joint Conference on Artificial Intelligence, IJCAI 2009, Pasadena, California, USA, July 11–17, 2009, C.Boutilier, ed., 2009, pp. 2070–2075.
25.
M.-L.Mugnier, Ontological query answering with existential rules, in: Proc. of the 5th International Conference on Web Reasoning and Rule Systems, RR 2011, Galway, Ireland, August 29–30, 2011, S.Rudolph and C.Gutierrez, eds, Vol. 6902, Springer, 2011, pp. 2–23.
26.
G.Orsi and A.Pieris,
Optimizing query answering under ontological constraints, PVLDB 4(11) (2011), 1004–1015.
27.
H.Pérez-Urbina, I.Horrocks and B.Motik, Efficient query answering for OWL 2, in: Proc. of the 8th International Semantic Web Conference on The Semantic Web, ISWC 2009, Chantilly, VA, USA, October 25–29, 2009, A.Bernstein, D.R.Karger, T.Heath, L.Feigenbaum, D.Maynard, E.Motta and K.Thirunarayan, eds, Lecture Notes in Computer Science, Vol. 5823, Springer, 2009, pp. 489–504.
28.
M.Rodriguez-Muro and D.Calvanese, High performance query answering over DL-Lite ontologies, in: Proc. of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning, KR 2012, Rome, Italy, June 10–14, 2012, G.Brewka, T.Eiter and S.A.McIlraith, eds, AAAI Press, 2012.
29.
R.Rosati and A.Almatelli, Improving query answering over DL-Lite ontologies, in: Proc. of the Twelfth International Conference on Principles of Knowledge Representation and Reasoning, KR 2010, Toronto, Ontario, Canada, May 9–13, 2010, F.Lin, U.Sattler and M.Truszczynski, eds, AAAI Press, 2010.
30.
E.Salvat and M.-L.Mugnier, Sound and complete forward and backward chaining of graph rules, in: Proc. of the 4th International Conference on Conceptual Structures: Knowledge Representation as Interlingua, ICCS’96, Sydney, Australia, August 19–22, 1996, P.W.Eklund, G.Ellis and G.Mann, eds, Lecture Notes in Computer Science, Vol. 1115, Springer, 1996, pp. 248–262.
31.
M.Thomazo, Compact rewriting for existential rules, in: Proc. of the 23rd International Joint Conference on Artificial Intelligence, IJCAI 2013, Beijing, China, August 3–9, 2013, F.Rossi, ed., IJCAI/AAAI, 2013.
32.
M.Thomazo, J.-F.Baget, M.-L.Mugnier and S.Rudolph, A generic querying algorithm for greedy sets of existential rules, in: Proc. of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning: KR 2012, Rome, Italy, June 10–14, 2012, G.Brewka, T.Eiter and S.A.McIlraith, eds, AAAI Press, 2012.
33.
T.Venetis, G.Stoilos and G.B.Stamou,
Query extensions and incremental query rewriting for OWL 2 QL ontologies, J. Data Semantics 3(1) (2014), 1–23.
34.
W3C OWL Working Group, OWL 2 Web Ontology Language: Document Overview, W3C Recommendation, 2009.