Sage Journals: Discover world-class research

Abstract

We address the issue of Ontology-Based Data Access, with ontologies represented in the framework of existential rules, also known as Datalog±. A well-known approach involves rewriting the query using ontological knowledge. We focus here on the basic rewriting technique which consists of rewriting the initial query into a union of conjunctive queries. First, we study a generic breadth-first rewriting algorithm, which takes any rewriting operator as a parameter, and define properties of rewriting operators that ensure the correctness of the algorithm. Then, we focus on piece-unifiers, which provide a rewriting operator with the desired properties. Finally, we propose an implementation of this framework and report some experiments.

Keywords

Ontology-Based Data Access existential rules conjunctive query answering finite unification sets query rewriting

1. Introduction

We address the issue of Ontology-Based Data Access, which aims at exploiting knowledge expressed in ontologies while querying data. In this paper, ontologies are represented in the framework of existential rules [2,22], also known as Datalog± [4,5]. Existential rules allow one to assert the existence of new unknown individuals, which is a key feature in an open-world perspective, as for instance in incomplete databases [8]. These rules are of the form $body \to head$ , where the body and the head are conjunctions of atoms (without functions) and variables that occur only in the head are existentially quantified. They generalize lightweight description logics (DLs), which form the core of the tractable profiles of OWL2 [34].

Fig. 1.

Forward/backward chaining.

The general query answering problem can be expressed as follows: given a knowledge base (KB) $K$ composed of a set of facts—or data—and an ontology (a set of existential rules here), and a query Q, compute the set of answers to Q in $K$ . In this paper, we consider Boolean conjunctive queries (Boolean CQs or BCQs). Note however that all our results are easily extended to non-Boolean conjunctive queries as well as to unions of conjunctive queries. The fundamental problem, called BCQ entailment hereafter, can be recast as follows: given a KB $K$ , composed of facts and existential rules, and a Boolean conjunctive query Q, is Q entailed by $K$ ?

BCQ entailment is undecidable for general existential rules (e.g., [3,10], on the implication problem for tuple-generating dependencies, which have the same form as existential rules). There has been an intense research effort aimed at finding decidable subsets of rules that provide good tradeoffs between expressivity and complexity of query answering (see e.g., [25] for a survey on decidable classes of rules). These decidable rule fragments overcome some of the limitations of DLs. In particular, they have unrestricted predicate arity, while DLs consider unary and binary predicates only, which allows one for a natural coupling with database schemas, in which relations may have any arity; moreover, adding information, such as data provenance, is made easier by the unrestricted predicate arity, since this information can be added as a new predicate argument.

There are two main approaches to solve BCQ entailment, which are linked to the classical paradigms for processing rules, namely forward and backward chaining, as illustrated by the next example.

Example 1.

Let us consider data on movies, with unary relations $movie$ and $actor$ , and a binary relation $play$ (intuitively, $play (x, y)$ means that “x plays a role in y”). Let Q be a query asking if a given person, whose identifier is B, plays a role somewhere, i.e., $Q = \exists y play (B, y)$ . Let R be an existential rule expressing that “every actor plays a role in some movie”, i.e., $\forall x (actor (x) \to \exists y (play (x, y) \land movie (y)))$ . Assume that the data contain $actor (B)$ . If Q is asked on these data, the answer is no. However, the rule allows to infer that actor B plays in a movie, thus the answer to Q should be yes. Rule R can be used in a forward manner, i.e., it can be applied to the data: then, the knowledge $\exists y_{0} (play (B, y_{0}) \land movie (y_{0}))$ is added, where $y_{0}$ is a new variable. Query Q can be mapped to the enriched data, which allows to answer positively. Now, R can also be used in a backward manner, i.e., to rewrite Q, which yields the new query $Q^{'} = actor (B)$ . This query can be mapped to the (initial) data, which provides the positive answer.

Both approaches can be seen as ways of reducing the problem to a classical database query answering problem by eliminating the rules, see Fig. 1. The first approach consists in applying the rules to the data, thus materializing entailed facts into the data. Then, Q is entailed by $K$ if and only if it can be mapped to this materialized database. This approach is applicable either when the forward chaining procedure stops “naturally” (see [14] for a survey on these cases), or when it stops by taking some parameters into consideration, typically the size of the query [4,23]. The second approach consists in using the rules to rewrite the query into a first-order query (typically a union of conjunctive queries [9,15,27,28,33]) or a non-recursive Datalog program [16,26,29]. Then, Q is entailed by $K$ if and only if the rewritten query is entailed by the initial database.

Materialization has the advantage of enabling efficient query answering but may be not appropriate for data size, data access rights or data maintenance reasons. Query rewriting has the advantage of avoiding changes in the data, however its drawback is that the rewritten query may be large, even exponential in the size of initial query, hence less efficiently processed, at least with current database techniques. Finally, techniques combining both approaches have been developed, in particular so-called combined approach [21,24] for lightweight description logics, as well as a similar algorithm for a large class of existential rules [32].

In this paper, we focus on rewriting techniques, and more specifically on rewriting the initial conjunctive query Q into a union of conjunctive queries, that we will see as a set of conjunctive queries. This set is called a rewriting set of Q and each element of a rewriting set is called a rewriting. While most previously cited work focuses on specific rule sublanguages (mostly DL-Lite, linear and sticky existential rules), we consider general existential rules. This means that our algorithm does not make any syntactic assumption on the input set of rules, but will terminate only in some cases (so-called finite unification sets of rules, see hereafter).

The goal is to compute a rewriting set both sound (if one of its elements maps to the initial database, then $K$ entails Q) and complete (if $K$ entails Q then there is an element that maps to the initial database). Minimality may also be a desirable property. In particular, let us consider the generalization relation (a preorder) induced on Boolean conjunctive queries by homomorphism: we say that $Q_{1}$ is more general than $Q_{2}$ if there is a homomorphism from $Q_{1}$ to $Q_{2}$ ; it is well-known that the existence of such a homomorphism is equivalent to the following property: for any set of facts F, if theanswer to $Q_{2}$ in F is positive, then so is the answer to $Q_{1}$ . We point out that any sound and complete rewriting set of a query Q remains sound and complete when it is restricted to its most general elements. Since BCQ entailment is undecidable, there is no guarantee that such a finite set exists for a given query and general existential rules. A set of existential rules ensuring that a finite sound and complete set of most general rewritings exists for any query is called a finite unification set (fus) [2]. The fus property is not recognizable [2], but several easily recognizable fus classes have been exhibited in the literature: atomic-body rules [1], also known as linear TGDs [5], multi-linear TGDs [6], sticky(-join) rules [7,15], weakly-recursive rules [13] and sets of rules with an acyclic graph of rule dependencies [1]. By definition, the fus property is a specific case of first-order rewritability, which means that the set of rules allows to rewrite any CQ into a (sound and complete) first-order query; it is suspected that both properties are actually equivalent, however, to the best of our knowledge, no proof of this result has been published.

Paper contributions We start from a generic algorithm which, given a BCQ and a set of existential rules, computes a rewriting set. This task can be recast in terms of exploring a potentially infinite space of queries, composed of the initial conjunctive query and its (sound) rewritings, structured by the generalization preorder. The algorithm explores this space in a breadth-first way, with the aim of computing a complete rewriting set. It maintains a rewriting set $Q$ and iteratively performs the following tasks: (1) generate all the one-step rewritings from unexplored queries in $Q$ ; (2) add these rewritings to $Q$ and update $Q$ in order to keep only incomparable most general elements. A rewriting operator is a function that, given a query and a set of rules, returns the one-step rewritings of this query. Note that it may be the case that the set of sound rewritings of the query is infinite while the set of its most general sound rewritings is finite. It follows that a simple breadth-first exploration of the rewriting space is not sufficient to ensure finiteness of the process, even for fus rules; one also has to maintain a set of the most general rewritings. This algorithm is generic in the sense that it is not restricted to a particular kind of existential rules nor to a specific rewriting operator (without guarantee of termination though).

This algorithmic scheme established, we then asked ourselves the following questions:

Assuming that the algorithm outputs a finite sound and complete set rewritings, composed of pairwise incomparable queries, is this set of minimal cardinality, in the sense that no sound and complete rewriting set produced by any other algorithm can be strictly smaller?

At each step of the algorithm, some queries are discarded, because they are more specific than other rewritings, even if they have not been explored yet. The question is whether this dynamic pruning of the search space keeps the completeness of the output. More generally, which properties have to be fulfilled by the operator to ensure the correctness of the algorithm and its termination for fus rules?

Finally, design a rewriting operator that fulfills the desired properties and leads to the effective computation of a sound and complete rewriting set.

With respect to the first question, we show that all sound and complete sets of rewritings, restricted to their most general elements, have the same cardinality, which is minimal with respect to the completeness property. Moreover, if we delete redundant atoms from the obtained CQs (which can be performed by a polynomial number of homomorphism tests for each query),1

See e.g. [11], Section 2.6, on basic conceptual graphs. The algorithm can even by made linear, noticing that an atom needs to be considered only once.

then we obtain a unique minimal sound and complete set of CQs of minimal size; uniqueness is of course up to a bijective variable renaming.

To answer the second question, we define several properties that a rewriting operator has to satisfy and show that these properties actually ensure the correctness of the algorithm and its termination for fus rules. In particular, we point out that the fact that a query may be removed from the rewriting set before being explored may prevent the completeness of the output, even if the rewriting operator is theoretically able to generate a complete output. The prunability of the rewriting operator ensures that this dynamic pruning can be safely performed. Briefly, this property holds if, for all queries $Q_{1}$ and $Q_{2}$ , when $Q_{1}$ is more general than $Q_{2}$ then any one-step rewriting of $Q_{2}$ is less general than $Q_{1}$ itself or one of the one-step rewritings of $Q_{1}$ ; intuitively, this allows to discard the rewriting $Q_{2}$ even when its one-step rewritings have not been generated yet. Note that this kind of properties ties in with an issue raised in [18] about the gap between theoretical completeness of some methods and the effective completeness of their implementation, this gap being mainly due to algorithmic optimizations (here the dynamic pruning).

Concerning the third question, we proceed in several steps. First, we rely on a specific unifier, called a piece-unifier, that was designed for backward chaining with conceptual graph rules (whose logical translation is exactly existential rules [30]). As in classical backward chaining, the rewriting process is based on a unification operation between the current query and a rule head. However, existential variables in rule heads induce a structure that has to be considered to keep soundness. Thus, instead of unifying a single atom of the query at once, our unifier processes a subset of atoms from the query. A piece is a minimal subset of atoms from the query that have to be erased together, hence the name piece-unifier. We present below a very simple example of piece unification (in particular, the head of the existential rule is restricted to a single atom).

Example 2.

Let $R = \forall x (q (x) \to \exists y p (x, y))$ and the BCQ $Q = \exists u \exists v \exists w (p (u, v) \land p (w, v) \land r (u, w))$ . Assume we want to unify the atom $p (u, v)$ from Q with $p (x, y)$ , for instance by a substitution ${(u, x), (v, y)}$ .2

A substitution is given as a set of pairs, where a pair $(x, e)$ means that x is substituted by e.

Since v is unified with the existential variable y, all other atoms from Q containing v must also be considered: indeed, simply rewriting Q into

Q_{1} = \exists w \exists x \exists y (q (x) \land p (w, y) \land r (x, w))

would be unsound: intuitively, the fact that the atoms

p (u, v)

and

p (w, v)

in Q share a variable would be lost in atoms

q (x)

and

p (w, y)

; for instance

F = q (a) \land p (b, c) \land r (a, b)

would answer

Q_{1}

despite Q being not entailed by F and R. Thus,

p (u, v)

and

p (w, v)

have to be both unified with the head of R, for instance by means of the following substitution:

μ = {(u, x), (v, y), (w, x)}

{p (u, v), p (w, v)}

is called a piece. The corresponding rewriting of Q is

\exists x (q (x) \land r (x, x))

Piece-unifiers lead to a logically sound and complete rewriting method. As far as we know, it is the only method accepting any kind of existential rules, while staying in this fragment, i.e., without Skolemization of rule heads to replace existential variables with Skolem functions.

We show that the piece-based rewriting operator fulfills the desired properties ensuring the correctness of the generic algorithm, and its termination in the case of fus rules.

The next question was how to optimize the rewriting step. Indeed, the problem of deciding whether there is a piece-unifier between a query and a rule head is NP-complete and the number of piece-unifiers can be exponential in the size of the query. To cope with these sources of complexity, we consider so-called single-piece unifiers, which unify a single-piece of the query at once (like μ in Example 2). When, additionally, the head of a rule R is restricted to an atom, which is a frequent case, each atom in a query Q belongs to at most one piece with respect to R; then, the number of (most general) single-piece unifiers of Q with the head of R is bounded by the size of Q.

We show that the single-piece based rewriting operator is able to generate a sound and complete rewriting set. However, as pointed out in several examples, it is not prunable. Hence, single-piece unifiers have to be combined to recover prunability. We thus define the aggregation of single-piece unifiers and show that the corresponding rewriting operator fulfills all desired properties and generates fewer queries than the piece-based rewriting operator. Detailed algorithms are given and first experiments are reported.

Paper organization Section 2 recalls some basic notions about the existential rule framework. Section 3 defines sound, complete and minimal sets of rewritings. In Section 4, the generic breadth-first algorithm is introduced and general properties of rewriting operators are studied. Section 5 presents the piece-based rewriting operator. In Section 6, we focus on exploiting single-piece unifiers and introduce the rewriting operator based on their aggregation. Finally, Section 7 is devoted to detailed algorithms and experiments, as well as to further work.

This is an extended version of papers by the same authors published at RR 2012 and RR 2013 (International Conference on Web Reasoning and Rule Systems).

2. Preliminaries

An atom is of the form $p (t_{1}, \dots, t_{k})$ where p is a predicate with arity k, and the $t_{i}$ are terms, i.e., variables or constants. Given an atom or a set of atoms A, $vars (A)$ , $consts (A)$ and $terms (A)$ denote its sets of variables, constants and terms, respectively. In all the examples in this paper, the terms are variables (denoted by x, y, z, etc.). ⊨ denotes the classical logical consequence. Two formulas $f_{1}$ and $f_{2}$ are said to be equivalent if $f_{1} ⊨ f_{2}$ and $f_{2} ⊨ f_{1}$ .

A fact is an existentially closed conjunction of atoms.3

³
We generalize the classical notion of a fact in order to take existential variables into account.

A conjunctive query (CQ) is an existentially quantified conjunction of atoms. When it is a closed formula, it is called a Boolean CQ (BCQ). Hence, facts and BCQs have the same logical form. In the following, we will see them as sets of atoms. A union of conjunctive queries (UCQ) is a disjunction of CQs, which we will see as a set of CQs.

Given sets of atoms A and B, a homomorphism h from A to B is a substitution of $vars (A)$ by $terms (B)$ such that $h (A) \subseteq B$ . We say that A is mapped to B by h. If there is a homomorphism from A to B, we say that A is more general than B, which is denoted $A \geq B$ .

Given a fact F and a BCQ Q, the answer to Q in F is positive if $F ⊨ Q$ . It is well-known that $F ⊨ Q$ if and only if there is a homomorphism from Q to F. If Q is a non-Boolean CQ, let $x_{1} \dots x_{q}$ be the free variables in Q. Then, a tuple of constants $(a_{1} \dots a_{q})$ is an answer to Q in F if there is a homomorphism from Q to F that maps $x_{i}$ to $a_{i}$ for each i. In other words, $(a_{1} \dots a_{q})$ is an answer to Q in F if and only if the answer to the BCQ obtained from Q by substituting each $x_{i}$ with $a_{i}$ is positive.

In this paper, we consider only Boolean queries for simplicity reasons. This is not a restriction, since our mechanisms can actually process a CQ with free variables $x_{1} \dots x_{q}$ by translating it into a BCQ with an added atom $ans (x_{1} \dots x_{q})$ , where $ans$ is a special predicate not occurring in the knowledge base. Since $ans$ can never be erased by a rewriting step, the $x_{i}$ can only be substituted and will not “disappear”. We can thus compute the rewriting set of a CQ as a Boolean CQ with a special $ans$ atom, then transform the rewritings into non-Boolean CQs by removing the $ans$ atom and consider its arguments as free variables. Note that our the generic algorithm can accept as input a union of conjunctive queries as well, since it works exactly in the same way if it takes as input a set of CQs instead of a single CQ.

Definition 1 (Existential rule).

An existential rule (or simply a rule) is a formula $R = \forall \vec{x} \forall \vec{y} (B [\vec{x}, \vec{y}] \to \exists \vec{z} H [\vec{y}, \vec{z}])$ , where $\vec{x}, \vec{y}$ and $\vec{z}$ are tuple of variables, $B = body (R)$ and $H = head (R)$ are conjunctions of atoms, resp. called the body and the head of R. The frontier of R, denoted by $fr (R)$ , is the set $vars (B) \cap vars (H) = \vec{y}$ . The set of existential variables in R is the set $vars (H) ∖ fr (R) = \vec{z}$ .

In the following, we omit quantifiers in rules and queries, as there is no ambiguity. For instance, the rule $R = \forall x (q (x) \to \exists y p (x, y))$ from Example 2 will be written $q (x) \to p (x, y)$ .

A knowledge base (KB) $K = (F, R)$ is composed of a fact F and a finite set of existential rules $R$ . The BCQ entailment problem takes as input a KB $K = (F, R)$ and a BCQ Q, and asks if $F, R ⊨ Q$ holds.

3. Desirable properties of rewriting sets

Given a query Q and a set of existential rules $R$ , rewriting techniques compute a set of queries $Q$ , which we call a rewriting set hereafter. It is generally desired that such a set satisfies at least three properties: soundness, completeness and minimality.

Definition 2 (Sound and complete set).

Let $R$ be a set of existential rules and Q be a BCQ. Let $Q$ be a set of BCQs. $Q$ is said to be sound w.r.t. Q and $R$ if for all facts F, for all $Q^{'} \in Q$ , if $Q^{'}$ can be mapped to F then $F, R ⊨ Q$ . Reciprocally, $Q$ is said to be complete w.r.t. Q and $R$ if for all facts F, if $F, R ⊨ Q$ then there is $Q^{'} \in Q$ such that $Q^{'}$ can be mapped to F.

We mentioned in the introduction that only the most general elements of a rewriting set need to be considered. Indeed, let $Q_{1}$ and $Q_{2}$ be two elements of a rewriting set such that $Q_{1} \geq Q_{2}$ . Then, for any fact F, the set of answers to $Q_{2}$ in F is included in the set of answers to $Q_{1}$ in F. Hence, removing $Q_{2}$ will not undermine completeness (and it will not undermine soundness either). The output of a rewriting algorithm should thus be a minimal set of incomparable queries that “covers” the set of all the sound rewritings of the initial query.

Definition 3 (Covering relation).

Let $Q_{1}$ and $Q_{2}$ be two sets of BCQs. $Q_{1}$ covers $Q_{2}$ , which is denoted $Q_{1} \geq Q_{2}$ , if for all $Q_{2} \in Q_{2}$ there is $Q_{1} \in Q_{1}$ with $Q_{1} \geq Q_{2}$ .

Note that covering can also be defined in terms of classical database query containment, i.e., $Q_{1}$ covers $Q_{2}$ if and only if the UCQ $Q_{2}$ is included in the UCQ $Q_{1}$ .

Definition 4 (Minimal set of BCQs, cover).

Let $Q$ be a set of BCQs. $Q$ is said to be minimal if there is no $Q \in Q$ such that $(Q ∖ {Q}) \geq Q$ . A cover of $Q$ is a minimal set $Q^{c} \subseteq Q$ such that $Q^{c} \geq Q$ .

Since a cover is a minimal set, its elements are pairwise incomparable.

Fig. 2.

Cover (Example 3).

Example 3.

Let $Q = {Q_{1}, \dots, Q_{6}}$ and consider the following preorder over $Q$ : $Q_{1} \geq Q_{2}, Q_{4}, Q_{5}, Q_{6}$ ; $Q_{2} \geq Q_{1}, Q_{4}, Q_{5}, Q_{6}$ ; $Q_{3} \geq Q_{4}$ ; $Q_{5} \geq Q_{6}$ (note that $Q_{1}$ and $Q_{2}$ are equivalent). There are two covers of $Q$ , namely ${Q_{1}, Q_{3}}$ and ${Q_{2}, Q_{3}}$ . See Fig. 2.

A set of (sound) rewritings may have a finite cover even when it is infinite, as illustrated by Example 4.

Example 4.

Let $Q = t (u)$ , $R_{1} = t (x) \land p (x, y) \to r (y)$ , $R_{2} = r (x) \land p (x, y) \to t (y)$ . $R_{1}$ and $R_{2}$ have a head restricted to a single atom and no existential variable, hence the classical most general unifier can be used, which unifies the first atom in the query with the atom of a rule head. The rewriting set of Q with ${R_{1}, R_{2}}$ is infinite. The first generated queries are the following (note that rule variables are renamed when needed): $\begin{array}{rcl} Q_{0} = t (u) \\ Q_{1} = r (x) \land p (x, y) // from Q_{0} and R_{2} with {(u, y)} \\ Q_{2} = t (x_{0}) \land p (x_{0}, y_{0}) \land p (y_{0}, y) \\ // from Q_{1} and R_{1} with {(x, y_{0})} \\ Q_{3} = r (x_{1}) \land p (x_{1}, y_{1}) \land p (y_{1}, y_{0}) \land p (y_{0}, y) \\ // from Q_{2} and R_{2} with {(x_{0}, y_{1})} \\ Q_{4} = t (x_{2}) \land p (x_{2}, y_{2}) \land p (y_{2}, y_{1}) \land p (y_{1}, y_{0}) \\ \land p (y_{0}, y) // from Q_{3} and R_{1} \\ and so on … \end{array}$ However, the set of the most general rewritings is ${Q_{0}, Q_{1}}$ since any other query that can be obtained is more specific than $Q_{0}$ or $Q_{1}$ .

It can be easily checked that all covers of a given set have the same cardinality. We now prove that this property can be extended to the covers of all sound and complete finite rewriting sets of Q, irrespective of the rewriting technique used to compute these sets.

Theorem 1.

Let $R$ be a set of rules and Q be a BCQ. Any finite cover of a sound and complete rewriting set of Q with $R$ is of minimal cardinality (among all sound and complete rewriting sets of Q).

Proof.

Let $Q_{1}$ and $Q_{2}$ be two arbitrary sound and complete rewriting sets of Q with $R$ . Let $Q_{1}^{c}$ (resp. $Q_{2}^{c}$ ) be one of the finite covers of $Q_{1}$ (resp. $Q_{2}$ ). $Q_{1}^{c}$ (resp. $Q_{2}^{c}$ ) is also sound and complete, as well as smaller than or equal to $Q_{1}$ (resp. $Q_{2}$ ). We show that they have the same cardinality. Let $Q_{1} \in Q_{1}^{c}$ . There exists $Q_{2} \in Q_{2}^{c}$ such that $Q_{2} \geq Q_{1}$ . If not, Q would be entailed by $F = Q_{1}$ and $R$ since $Q_{1}^{c}$ is a sound rewriting set of Q (and $Q_{1}$ maps to itself), but no elements of $Q_{2}^{c}$ would map to F: thus, $Q_{2}^{c}$ would not be complete. Similarly, there exists $Q_{1}^{'} \in Q_{1}^{c}$ such that $Q_{1}^{'} \geq Q_{2}$ . Then $Q_{1}^{'} \geq Q_{1}$ , which implies that $Q_{1}^{'} = Q_{1}$ by assumption on $Q_{1}^{c}$ . For all $Q_{1} \in Q_{1}^{c}$ , there exists $Q_{2} \in Q_{2}^{c}$ such that $Q_{2} \geq Q_{1}$ and $Q_{1} \geq Q_{2}$ . Such a $Q_{2}$ is unique: indeed, two such elements would be comparable for ≥, which is not possible by construction of $Q_{2}^{c}$ . The function associating $Q_{2}$ with $Q_{1}$ is thus a bijection from $Q_{1}^{c}$ to $Q_{2}^{c}$ , which shows that these two sets have the same cardinality. □

Furthermore, the proof of the preceding theorem shows that, given any two sound and complete rewriting sets of Q, there is a bijection from any cover of the first set to any cover of the second set such that two elements in relation by the bijection are equivalent. However, these elements are not necessarily isomorphic (i.e., equal up to a variable renaming) because they may contain redundancies. Consider the preorder induced by homomorphism on the set of all BCQs definable on some vocabulary. It is well-known that this preorder is such that any of its equivalence classes possesses a unique element of minimal size (up to isomorphism), called its core (notion introduced for graphs,4

⁴

See for instance [17], where the notion of a core is traced back to the late sixties.

but easily transferable to queries).

Every query can be transformed into its equivalent core by removing redundant atoms. We recall that a set of existential rules ensuring that a finite sound and complete set of most general rewritings exists for any query is called a finite unification set (fus).5

⁵

The notion of a finite unification set was first introduced in [1] and defined with respect to piece-unifiers. However, since piece-unifiers provide a sound and complete rewriting operator (see Section 5) and all the covers of a given set have the same cardinality, the two definitions are equivalent.

By the remark above and Theorem 1, we obtain:

Corollary 2.

Let $R$ be a fus and Q be a BCQ. There is a unique finite sound and complete rewriting set of Q with $R$ that has both minimal cardinality and elements of minimal size.

4. A generic rewriting algorithm

We will now present a generic rewriting algorithm that takes as input a set of existential rules and a query, and as parameter a rewriting operator. The studied question is the following: which properties should this operator satisfy in order that the algorithm outputs a sound, complete, finite and minimal set?

4.1. Rewriting algorithm

Definition 5 (Rewriting operator).

A rewriting operator rew is a function which takes as input a BCQ Q and a set of rules $R$ and outputs a set of BCQs denoted by $rew (Q, R)$ .

Since the elements of $rew (Q, R)$ are BCQs, it is possible to apply further steps of rewriting to them. This naturally leads to the notions of k-rewriting and k-saturation.

Definition 6 (k-rewriting).

Let Q be a conjunctive query, $R$ be a set of rules and $rew$ be a rewriting operator. A 1-rewriting of Q (w.r.t. $rew$ and $R$ ) is an element of $rew (Q, R)$ . A k-rewriting of Q, for $k > 1$ , (w.r.t. $rew$ and $R$ ) is a 1-rewriting of a $(k - 1)$ -rewriting of Q.

The term k-saturation is convenient to name the set of queries that can be obtained in at most k rewriting steps.

Definition 7 (k-saturation).

Let Q be a BCQ, $R$ be a set of rules and $rew$ be a rewriting operator. We denote the set of k-rewritings of Q by ${rew}_{k} (Q, R)$ . We call k-saturation, and denote by $W_{k} (Q, R)$ , the set of i-rewritings of Q for all $i \leq k$ . We denote $W_{\infty} (Q, R) = ⋃_{k \in N} W_{k} (Q, R)$ .

In the following, we extend the notations $rew, {rew}_{k}$ and $W_{k}$ to a set of BCQs $Q$ instead of a single BCQ Q: $rew (Q, R) = ⋃_{Q \in Q} rew (Q, R)$ , ${rew}_{k} (Q, R) = ⋃_{Q \in Q} {rew}_{k} (Q, R)$ and $W_{k} (Q, R) = ⋃_{i \leq k} {rew}_{i} (Q, R)$ .

Algorithm 1 performs a breadth-first exploration of the rewriting space of a given query.6

⁶
Note that a depth-first exploration would not ensure termination for fus rules.

At each step, only the most general elements are kept thanks to a covering function, denoted by cover , that computes a cover of a given set.

Algorithm 1:

A generic rewriting algorithm

For termination reasons (see the proof of Property 6), already explored queries are preferred to non-explored queries in the computation of the cover. More precisely, if both $Q_{c} \cup {q}$ and $Q_{c} \cup {q^{'}}$ are covers of $Q_{F} \cup rew (Q_{E}, R)$ , with q and $q^{'}$ homomorphically equivalent and ${q}$ belongs to $Q_{F}$ , then cover does not output $Q_{c} \cup {q^{'}}$ . If rew fulfills some good properties (specified below), then, after the ith iteration of the while loop, the i-saturation of Q (with respect to $R$ and $rew$ ) is covered by $Q_{F}$ , while $Q_{E}$ contains the queries that remain to be explored.

In the remainder of this section, we study the conditions that a rewriting operator must meet in order that: (i) the algorithm halts and outputs a cover of all the rewritings that can be obtained with this rewriting operator, provided that such a finite cover exists; (ii) the output cover is sound and complete.

4.2. Correctness and termination of the algorithm

We now define a property on the rewriting operator, called prunability. This property is sufficient to ensure that Algorithm 1 outputs a cover of $W_{\infty} (Q, R)$ . Intuitively, if an operator is prunable then, for every $Q_{1}$ more general than $Q_{2}$ , the one-step rewritings of $Q_{2}$ are covered by the one-step rewritings of $Q_{1}$ or by $Q_{1}$ itself. It follows that all the rewritings of $Q_{2}$ are covered by $Q_{1}$ and its rewritings. Hence, $Q_{2}$ can be safely removed from the current rewriting set.

Definition 8 (Prunable).

A rewriting operator $rew$ is said to be prunable if for any set of rules $R$ and for all BCQs $Q_{1}, Q_{2}, Q_{2}^{'}$ such that $Q_{1} \geq Q_{2}$ , $Q_{2}^{'} \in rew (Q_{2}, R)$ and $Q_{1} ≱ Q_{2}^{'}$ , there is $Q_{1}^{'} \in rew (Q_{1}, R)$ such that $Q_{1}^{'} \geq Q_{2}^{'}$ .

The following lemma states that this can be generalized to k-rewritings for any k.

Lemma 3.
Let $rew$ be a prunable rewriting operator, and let $Q_{1}$ and $Q_{2}$ be two sets of BCQs. If $Q_{1} \geq Q_{2}$ , then $W_{\infty} (Q_{1}, R) \geq W_{\infty} (Q_{2}, R)$ .
Proof.
We prove by induction on i that $W_{i} (Q_{1}, R) \geq {rew}_{i} (Q_{2}, R)$ .

For $i = 0$ , $W_{0} (Q_{1}, R) = Q_{1} \geq Q_{2} = {rew}_{0} (Q_{2}, R)$ .

For $i > 0$ , for any $Q_{2} \in {rew}_{i} (Q_{2}, R)$ , there is $Q_{2}^{'} \in {rew}_{i - 1} (Q_{2}, R)$ such that $Q_{2} \in rew (Q_{2}^{'}, R)$ . By induction hypothesis, there is $Q_{1}^{'} \in W_{i - 1} (Q_{1}, R)$ such that $Q_{1}^{'} \geq Q_{2}^{'}$ . $rew$ is prunable, thus either $Q_{1}^{'} \geq Q_{2}$ or there is $Q_{1} \in rew (Q_{1}^{'}, R)$ such that $Q_{1} \geq Q_{2}$ . Since $W_{i - 1} (Q_{1}, R)$ and $rew (Q_{1}^{'}, R)$ are both included in $W_{i} (Q_{1}, R)$ , we can conclude. □

This lemma would not be sufficient to prove the correctness of Algorithm 1, as will be discussed in Section 6.1. We need a stronger version, which checks that a query whose 1-rewritings are covered needs not to be explored.
Lemma 4.
Let $rew$ be a prunable rewriting operator, and let $Q_{1}$ and $Q_{2}$ be two sets of BCQs. If $(Q_{1} \cup Q_{2}) \geq rew (Q_{1}, R)$ , then $(Q_{1} \cup W_{\infty} (Q_{2}, R)) \geq W_{\infty} (Q_{1} \cup Q_{2}, R)$ .
Proof.
We prove by induction on i that $Q_{1} \cup W_{i} (Q_{2}, R) \geq {rew}_{i} (Q_{1} \cup Q_{2}, R)$ .

For $i = 0$ , ${rew}_{0} (Q_{1} \cup Q_{2}, R) = Q_{1} \cup Q_{2} = Q_{1} \cup W_{0} (Q_{2}, R)$ .

For $i > 0$ , for any $Q_{i} \in {rew}_{i} (Q_{1} \cup Q_{2}, R)$ , there is $Q_{i - 1} \in {rew}_{i - 1} (Q_{1} \cup Q_{2}, R)$ such that $Q_{i} \in rew (Q_{i - 1}, R)$ . By induction hypothesis, there is $Q_{i - 1}^{'} \in Q_{1} \cup W_{i - 1} (Q_{2}, R)$ such that $Q_{i - 1}^{'} \geq Q_{i - 1}$ . Since $rew$ is prunable, either $Q_{i - 1}^{'} \geq Q_{i}$ or there is $Q_{i}^{'} \in rew (Q_{i - 1}^{'}, R)$ such that $Q_{i}^{'} \geq Q_{i}$ . Then, there are two possibilities:
either $Q_{i - 1}^{'} \in Q_{1}$ : since $Q_{1} \cup Q_{2} \geq rew (Q_{1}, R)$ , we have $Q_{1} \cup Q_{2} \geq {Q_{i}^{'}}$ and so $Q_{1} \cup W_{i} (Q_{2}, R) \geq {Q_{i}^{'}}$ ,

or $Q_{i - 1}^{'} \in W_{i - 1} (Q_{2}, R)$ : then $Q_{i}^{'} \in W_{i} (Q_{2}, R)$ . □

Finally, the correctness of Algorithm 1 is based on the following loop invariants.
Property 5 (Invariants of Algorithm 1).

Let $rew$ be a rewriting operator. After each iteration of the while loop of Algorithm 1, the following properties hold:

$Q_{E} \subseteq Q_{F} \subseteq W_{\infty} (Q, R)$ ;

$Q_{F} \geq rew (Q_{F} ∖ Q_{E}, R)$ ;

if rew is prunable then $(Q_{F} \cup W_{\infty} (Q_{E}, R)) \geq W_{\infty} (Q, R)$ ;

for all distinct $Q, Q^{'} \in Q_{F}$ , $Q ≱ Q^{'}$ and $Q^{'} ≱ Q$ .

Proof.
Invariants are proved by induction on the number of iterations of the while loop. Below $Q_{F}^{i}$ and $Q_{E}^{i}$ denote the value of $Q_{F}$ and $Q_{E}$ after i iterations. Invariant 1:
$Q_{E} \subseteq Q_{F} \subseteq W_{\infty} (Q, R)$ . basis:
$Q_{E}^{0} = Q_{F}^{0} = {Q} = W_{0} (Q, R) \subseteq W_{\infty} (Q, R)$ .
induction step:
by construction, $Q_{E}^{i} \subseteq Q_{F}^{i}$ and $Q_{F}^{i} \subseteq Q_{F}^{i - 1} \cup rew (Q_{E}^{i - 1}, R)$ . For any $Q^{'} \in Q_{F}^{i}$ we have: either $Q^{'} \in Q_{F}^{i - 1}$ and then by induction hypothesis $Q^{'} \in W_{\infty} (Q, R)$ ; or $Q^{'} \in rew (Q_{E}^{i - 1}, R)$ and then by induction hypothesis we have $Q_{E}^{i - 1} \subseteq W_{\infty} (Q, R)$ , which implies $Q^{'} \in W_{\infty} (Q, R)$ .

Invariant 2:
$Q_{F} \geq rew (Q_{F} ∖ Q_{E}, R)$ . basis:
$rew (Q_{F}^{0} ∖ Q_{E}^{0}, R) = rew (\emptyset, R) = \emptyset$ and any set covers it.
induction step:
by construction, $Q_{F}^{i} \geq Q_{F}^{i - 1} \cup rew (Q_{E}^{i - 1}, R)$ ; since by induction hypothesis $Q_{F}^{i - 1} \geq rew (Q_{F}^{i - 1} ∖ Q_{E}^{i - 1}, R)$ , we have $Q_{F}^{i} \geq rew (Q_{F}^{i - 1} ∖ Q_{E}^{i - 1}, R) \cup rew (Q_{E}^{i - 1}, R) = rew (Q_{F}^{i - 1}, R)$ . Furthermore, by construction, $Q_{E}^{i} = Q_{F}^{i} ∖ Q_{F}^{i - 1}$ ; thus $Q_{F}^{i} ∖ Q_{E}^{i} \subseteq Q_{F}^{i - 1}$ and so $rew (Q_{F}^{i} ∖ Q_{E}^{i}, R) \subseteq rew (Q_{F}^{i - 1}, R)$ . Thus $Q_{F}^{i} \geq rew (Q_{F}^{i} ∖ Q_{E}^{i}, R)$ .

Invariant 3:
if rew is prunable then $(Q_{F} \cup W_{\infty} (Q_{E}, R)) \geq W_{\infty} (Q, R)$ . basis:
$(Q_{F}^{0} \cup W_{\infty} (Q_{E}^{0}, R)) = ({Q} \cup W_{\infty} ({Q}, R)) = W_{\infty} (Q, R)$ .
induction step:
we first show that (i): $(Q_{F}^{i} \cup W_{\infty} (Q_{E}^{i}, R)) \geq W_{\infty} (Q_{F}^{i}, R)$ , then we prove by induction that (ii): $W_{\infty} (Q_{F}^{i}, R) \geq W_{\infty} (Q, R)$ :
by construction $Q_{E}^{i} \subseteq Q_{F}^{i}$ , thus $(Q_{F}^{i} ∖ Q_{E}^{i}) \cup Q_{E}^{i} = Q_{F}^{i}$ , and by Invariant 2, we have $(Q_{F}^{i} ∖ Q_{E}^{i}) \cup Q_{E}^{i} \geq rew (Q_{F}^{i} ∖ Q_{E}^{i}, R)$ . Lemma 4 then entails that $((Q_{F}^{i} ∖ Q_{E}^{i}) \cup W_{\infty} (Q_{E}^{i}, R)) \geq W_{\infty} ((Q_{F}^{i} ∖ Q_{E}^{i}) \cup Q_{E}^{i}, R)$ and we can conclude since $Q_{F}^{i} = (Q_{F}^{i} ∖ Q_{E}^{i}) \cup Q_{E}^{i}$ ,

by construction, we have $Q_{F}^{i} \geq Q_{F}^{i - 1} \cup rew (Q_{E}^{i - 1}, R)$ ; so, by Lemma 3, we have $W_{\infty} (Q_{F}^{i}, R) \geq W_{\infty} (Q_{F}^{i - 1} \cup rew (Q_{E}^{i - 1}, R), R) = W_{\infty} (Q_{F}^{i - 1}, R) \cup W_{\infty} (rew (Q_{E}^{i - 1}, R), R)$ . Moreover, $Q_{E}^{i - 1} \subseteq Q_{F}^{i - 1} \subseteq W_{\infty} (Q_{F}^{i - 1}, R)$ , thus $W_{\infty} (Q_{F}^{i}, R) \geq Q_{F}^{i - 1} \cup Q_{E}^{i - 1} \cup W_{\infty} (rew (Q_{E}^{i - 1}, R), R) = Q_{F}^{i - 1} \cup W_{\infty} (Q_{E}^{i - 1}, R)$ . Using (i), we have $W_{\infty} (Q_{F}^{i}, R) \geq W_{\infty} (Q_{F}^{i - 1}, R)$ and conclude by induction hypothesis.

Invariant 4:
for all distinct $Q, Q^{'} \in Q_{F}$ , $Q ≱ Q^{'}$ and $Q^{'} ≱ Q$ . Trivially satisfied thanks to the properties of cover . □

The next property states that if rew is prunable then Algorithm 1 halts for each case where $W_{\infty} (Q, R)$ has a finite cover. Property 6.
Let rew be a rewriting operator, $R$ be a set of rules and Q be a BCQ. If $W_{\infty} (Q, R)$ has a finite cover and rew is prunable then Algorithm 1 halts.
Proof.
Let $Q$ be a finite cover of $W_{\infty} (Q, R)$ and let m be the largest k for a k-rewriting in $Q$ .

We thus have $W_{m} (Q, R) \geq Q \geq W_{\infty} (Q, R)$ . Since the operator is prunable, we have $Q_{F}^{i} \geq W_{i} (Q, R)$ for all $i \geq 0$ (proved with a straightforward induction on i). Thus $Q_{F}^{m} \geq W_{\infty} (Q, R)$ . Thus, $rew (Q_{E}^{m}, R)$ is covered by $Q_{F}^{m}$ , and since already explored queries are taken first for the computation of a cover, we have that $Q_{E}^{m + 1} = \emptyset$ . Hence Algorithm 1 halts. □
Theorem 7.
Let rew be a rewriting operator, $R$ be a set of rules and Q be a BCQ. If $W_{\infty} (Q, R)$ has a finite cover and rew is prunable then Algorithm 1 outputs this cover (up to query equivalence).
Proof.
By Property 6, Algorithm 1 halts. By Invariant 3 from Property 5, $(Q_{F}^{f} \cup W_{\infty} (Q_{E}^{f}, R)) \geq W_{\infty} (Q, R)$ where $Q_{F}^{f}$ and $Q_{E}^{f}$ denote the final values of $Q_{F}$ and $Q_{E}$ in Algorithm 1. Since $Q_{E}^{f} = \emptyset$ when Algorithm 1 halts, we have $Q_{F}^{f} \geq W_{\infty} (Q, R)$ . Thanks to Invariants 1 and 4 from Property 5 we conclude that $Q_{F}^{f}$ is a cover of $W_{\infty} (Q, R)$ . □

4.3. Preserving soundness and completeness

We consider two further properties of a rewriting operator, namely soundness and completeness, with the aim of ensuring the soundness and completeness of the obtained rewriting set within the meaning of Definition 2.

Definition 9 (Soundness/completeness).

Let $rew$ be a rewriting operator. $rew$ is sound if for any set of rules $R$ , for any BCQ Q, for any $Q^{'} \in rew (Q, R)$ , for any fact F, $F ⊨ Q^{'}$ implies that $F, R ⊨ Q$ . $rew$ is complete if for any set of rules $R$ , for any BCQ Q, for any fact F such that $F, R ⊨ Q$ , there exists $Q^{'} \in W_{\infty} (Q, R)$ such that $F ⊨ Q^{'}$ .

Property 8.
If $rew$ is sound, then the output of Algorithm 1 is a sound rewriting set of Q and $R$ .
Proof.
Direct consequence of Invariant 1 from Property 5. □

Perhaps surprisingly, the completeness of the rewriting operator is not sufficient to ensure the completeness of the output rewriting set. Examples are provided in Section 6.1. This is due to the dynamic pruning performed at each step of Algorithm 1. Therefore the prunability of the operator is also required.
Property 9.
If $rew$ is prunable and complete, then the output of Algorithm 1 is a complete rewriting set of Q and $R$ .
Proof.
Algorithm 1 returns $Q_{F}$ when $Q_{E}$ is empty. By Invariant 3 of Property 5, we know that $(Q_{F} \cup W_{\infty} (\emptyset, R)) \geq W_{\infty} (Q, R)$ . Since $W_{\infty} (\emptyset, R)) = \emptyset$ , we obtain that $Q_{F} \geq W_{\infty} (Q, R)$ . □

Finally, as stated by the next theorem, when the rewriting operator is sound, complete and prunable, Algorithm 1 is correct and terminates for any finite unification set of rules. We remind that expressive classes of fus rules are known (see the introduction). In particular, the main members of DL-Lite family are generalized by the simple class of linear existential rules. See also Section 7 for examples of such ontologies. Theorem 10.
If rew is a sound, complete and prunable operator, and $R$ is a finite unification set of rules, then for any BCQ Q, Algorithm 1 outputs a minimal (finite) sound and complete rewriting set of Q with $R$ .
Proof.
If $R$ is a fus and rew is a sound and complete operator then $W_{\infty} (Q, R)$ has a finite cover. The claim then follows from Properties 8 and 9 and Theorem 7. □

5. Piece-based rewriting

As mentioned in the introduction (and illustrated in Example 2), existential variables in rule heads induce a structure that has to be taken into account in the rewriting mechanism. Hence the classical notion of a unifier is replaced by that of a piece-unifier [2]. A piece-unifier “unifies” a subset $Q^{'}$ of Q with a subset $H^{'}$ of $head (R)$ , in the sense that the associated substitution u is such that $u (Q^{'}) = u (H^{'})$ . Given a piece-unifier, Q is partitioned into “pieces”, which are minimal subsets of atoms that must be processed together. More specifically, the cutpoints are the variables from $Q^{'}$ that are not unified with existential variables from $H^{'}$ (i.e., they are unified with frontier variables or constants); then a piece in Q is a minimal non-empty subset of atoms “glued” by variables other than cutpoints, i.e., connected by a path of variables that are not cutpoints. We recall below the definition of pieces given in [2] (where T corresponds to the set of cutpoints).

Definition 10 (Piece[2]).

Let A be a set of atoms and $T \subseteq vars (A)$ . A piece of A according to T is a minimal non-empty subset P of A such that, for all a and $a^{'}$ in A, if $a \in P$ and $(vars (a) \cap vars (a^{'})) ⊈ T$ , then $a^{'} \in P$ .

In this paper, we give a definition of a piece-unifier based on partitions rather than substitutions, which simplifies subsequent proofs. For any substitution u from a set ofvariables $E_{1}$ to a set of terms $E_{2}$ associated with a piece-unifier, it holds that $E_{1} \cap E_{2} = \emptyset$ . Thus, u can be associated with a partition $P_{u}$ of $E_{1} \cup E_{2}$ such that two terms are in the same class of $P_{u}$ if and only if they are merged by u; more specifically, we consider the equivalence classes of the symmetric, reflexive and transitive closure of the following relation ∼: $t \sim t^{'}$ if $u (t) = t^{'}$ . Conversely, given a partition on a set of terms E, such that no class contains two constants, we can consider a substitution u obtained by selecting an element of each class with priority given to constants: if ${e_{1} \dots e_{k}}$ is a class in the partition and $e_{i}$ is a selected element, then for all $e_{j}$ with $1 \leq j \neq i \leq k$ , we set $u (e_{j}) = e_{i}$ . If we consider a total order on terms, such that constants are smaller than variables, then a unique substitution is obtained by taking the smallest element in each class. We call admissible partition a partition such that no class contains two constants.

The set of all partitions over a given set is structured in a lattice by the “finer than” relation (given two partitions $P_{1}$ and $P_{2}$ , $P_{1}$ is finer than $P_{2}$ , denoted by $P_{1} \geq P_{2}$ , if every class of $P_{1}$ is included in a class of $P_{2}$ ).7

⁷
Usually, the notation ≤ is used to denote the relation “finer than”. We adopt the converse convention, which is more in line with substitutions and the ≥ preorder on CQs.

The join of several partitions is the partition obtained by making the union of their non-disjoint classes. The join of two admissible partitions may be a non-admissible partition. We say that several admissible partitions are compatible if their join is an admissible partition. Note that if the concerned partitions are relative to the same set E, then their join is their greatest lower bound in the partition lattice of E.

The following property makes a link between comparable partitions and comparable substitutions.

Property 11.

Let $P_{1}$ and $P_{2}$ be two admissible partitions over the same set such that $P_{1} \geq P_{2}$ , with associated substitutions $u_{1}$ and $u_{2}$ respectively. Then there is a substitution s such that $u_{2} = s \circ u_{1}$ (i.e., $u_{1}$ is “more general” than $u_{2}$ ).

Proof.

The substitution s is built as follows: for any class $C_{i} \in P_{1}$ , let $C_{j} \in P_{2}$ be the class such that $C_{i} \subseteq C_{j}$ . Let $e_{i}$ (resp. $e_{j}$ ) be the selected element in $C_{i}$ (resp. $C_{j}$ ); if $e_{i} \neq e_{j}$ (in this case, $e_{i}$ is necessarily a variable), then $s (e_{i}) = e_{j}$ . It can be immediately checked that $u_{2} = s \circ u_{1}$ . □

In the following definition of a piece-unifier, we assume that Q and R have disjoint sets of variables. Given $Q^{'} \subseteq Q$ , we call separating variables from $Q^{'}$ , and denote by $sep (Q^{'})$ , the variables occurring in both $Q^{'}$ and $(Q ∖ Q^{'})$ : $sep (Q^{'}) = vars (Q^{'}) \cap vars (Q ∖ Q^{'})$ .

Definition 11 (Piece-unifier, cutpoint).

A piece-unifier of Q with R is a triple $μ = (Q^{'}, H^{'}, P_{u})$ , where $Q^{'} \neq \emptyset$ , $Q^{'} \subseteq Q$ , $H^{'} \subseteq head (R)$ and $P_{u}$ is a partition on $terms (Q^{'}) \cup terms (H^{'})$ satisfying the following three conditions:

$P_{u}$ is admissible;

if a class in $P_{u}$ contains an existential variable (from $H^{'}$ ) then the other terms in the class are non-separating variables from $Q^{'}$ ;

$u (H^{'}) = u (Q^{'})$ , where u is a substitution associated with $P_{u}$ .

The cutpoints of μ, denoted by

cutp (μ)

, are the variables from

Q^{'}

that are not unified with existential variables from

H^{'}

(i.e., they are unified with frontier variables or constants):

cutp (μ) = {x \in vars (Q^{'}) | u (x) \in fr (R) \cup consts (Q^{'}) \cup consts (H^{'})}

Condition 2 in the piece-unifier definition ensures that a separating variable in $Q^{'}$ is necessarily a cutpoint. It follows that $Q^{'}$ is composed of pieces: indeed, an existential variable from $H^{'}$ is necessarily unified with a non-separating variable from $Q^{'}$ , say x, which ensures that all atoms from $Q^{'}$ in which x occurs are also part of $Q^{'}$ . Figure 3 illustrates these notions.

Fig. 3.

Piece-unifier.

We provide below some examples of piece-unifiers.

Example 5.

Let $R = q (x) \to p (x, y)$ and $Q = p (u, v) \land p (w, v) \land p (w, t) \land r (u, w)$ . Let $H^{'} = {p (x, y)}$ . They are three piece-unifiers of Q with R:

$μ_{1} = (Q_{1}^{'}, H^{'}, P_{u}^{1})$ with $Q_{1}^{'} = {p (u, v), p (w, v)}$ and $P_{u}^{1} = {{x, u, w}, {y, v}}$

$μ_{2} = (Q_{2}^{'}, H^{'}, P_{u}^{2})$ with $Q_{2}^{'} = {p (w, t)}$ and $P_{u}^{2} = {{x, w}, {y, t}}$

$μ_{3} = (Q_{3}^{'}, H^{'}, P_{u}^{3})$ with $Q_{3}^{'} = {p (u, v), p (w, v), p (w, t)}$ and $P_{u}^{3} = {{x, u, w}, {y, v, t}}$

Note that

Q_{1}^{'}

and

Q_{2}^{'}

are each composed of a single piece;

Q_{3}^{'} = Q_{1}^{'} \cup Q_{2}^{'}

and

P_{u}^{3}

is the join of

P_{u}^{1}

and

P_{u}^{2}

In the previous example, R has an atomic head, thus a piece-unifier of $Q^{'}$ with R actually unifiesthe atoms from $Q^{'}$ and the head of R into a single atom. In the general case, a piece-unifier unifies $Q^{'}$ and a subset $H^{'}$ of $head (R)$ into a set of atoms, as illustrated by the next example.

Example 6.

Let $R = q (x) \to p (x, y) \land p (y, z) \land p (z, t) \land r (y)$ and $Q = p (u, v) \land p (v, w) \land r (u)$ . A piece-unifier of Q with R is $μ_{1} = (Q_{1}^{'}, H_{1}^{'}, P_{u}^{1})$ with $Q_{1}^{'} = {p (u, v), p (v, w)}$ , $H_{1}^{'} = {p (x, y), p (y, z)}$ and $P_{u}^{1} = {{x, u}, {v, y}, {w, z}}$ . Another piece-unifier is $μ_{2} = (Q_{2}^{'}, H_{2}^{'}, P_{u}^{2})$ with $Q_{2}^{'} = Q$ , $H_{2}^{'} = {p (y, z), p (z, t), r (y)}$ and $P_{u}^{2} = {{u, y}, {v, z}, {w, t}}$ .

Note that $μ_{3} = (Q_{3}^{'}, H_{3}^{'}, P_{u}^{3})$ with $Q_{3}^{'} = {p (u, v)}$ , $H_{3}^{'} = {p (x, y)}$ and $P_{u}^{3} = {{x, u}, {v, y}}$ is not a piece-unifier because the second condition in the definition of piece-unifier is not fulfilled: v is a separating variable and is matched with the existential variable y.

Then, the notions of a one-step rewriting according to a piece-unifier, and of a rewriting obtained by a sequence of one-step rewritings, are defined in the natural way.

Definition 12 (One-step piece-rewriting).

Given a piece-unifier $μ = (Q^{'}, H^{'}, P_{u})$ of Q with R, the one-step piece-rewriting of Q according to μ, denoted by $β (Q, R, μ)$ , is the BCQ $u (body (R)) \cup u (Q ∖ Q^{'})$ , where u is a substitution associated with $P_{u}$ .

We thus define inductively a k-step piece-rewriting as a $(k - 1)$ -step piece rewriting of a one-step piece-rewriting. For any k, a k-step piece-rewriting of Q is a piece-rewriting of Q.

The next theorem states that piece-based rewriting is logically sound and complete.

Theorem 12 ([2,30]).

Let $K = (F, R)$ be a KB and Q be a BCQ. Then $F, R ⊨ Q$ iff there is a piece-rewriting $Q^{'}$ of Q such that $Q^{'} \geq F$ .

It follows from Theorem 12 that a sound and complete rewriting operator can be based on piece-unifiers: we call piece-based rewriting operator, the rewriting operator that, given Q and $R$ , outputs all the one-step piece-rewritings of Q according to a piece-unifier of Q with $R \in R$ . We denote it by $β (Q, R)$ .

Actually, as detailed hereafter, only most general piece-unifiers are to be considered, since the other piece-unifiers produce more specific queries.

Definition 13 (Most general piece-unifier).

Given two piece-unifiers defined on the same subsets of a query and a rule head, $μ_{1} = (Q^{'}, H^{'}, P_{u}^{1})$ and $μ_{2} = (Q^{'}, H^{'}, P_{u}^{2})$ , we say that $μ_{1}$ is more general than $μ_{2}$ (notation $μ_{1} \geq μ_{2}$ ) if $P_{u}^{1}$ is finer than $P_{u}^{2}$ (i.e., $P_{u}^{1} \geq P_{u}^{2}$ ). A piece-unifier $μ = (Q^{'}, H^{'}, P_{u})$ is called a most general piece-unifier if it is more general than all the piece-unifiers on $Q^{'}$ and $H^{'}$ .

Property 13.
Let $μ_{1}$ and $μ_{2}$ be two piece-unifiers with $μ_{1} \geq μ_{2}$ . Then $μ_{1}$ and $μ_{2}$ have the same pieces.
Proof.
$μ_{1}$ and $μ_{2}$ have the same pieces iff they have the same cutpoints. It holds that $cutp (μ_{1}) \subseteq cutp (μ_{2})$ since every class from $P_{u}^{1}$ is included in a class from $P_{u}^{2}$ : hence a variable from $Q^{'}$ that is in the same class as a frontier variable or a constant in $P_{u}^{1}$ also is in $P_{u}^{2}$ . It remains to prove that $cutp (μ_{2}) \subseteq cutp (μ_{1})$ . Let x be a cutpoint of $μ_{2}$ and $P_{u}^{2} (x)$ be the class of x in $P_{u}^{2}$ . Since x is a cutpoint of $μ_{2}$ , there is a term t in $P_{u}^{2} (x)$ that is a constant or a frontier variable. Since $P_{u}^{1} \geq P_{u}^{2}$ , we know that $P_{u}^{1} (x) \subseteq P_{u}^{2} (x)$ . Let $t^{'}$ be a term of $H^{'}$ from $P_{u}^{1} (x)$ (there is at least one term of $H^{'}$ and one term of $Q^{'}$ in each class since the partition is part of a unifier of $H^{'}$ and $Q^{'}$ ). We are sure that $t^{'}$ is not an existential variable because $t^{'} \in P_{u}^{2} (x)$ and an existential variable cannot be in the same class as t (Condition 2 in the definition of a piece-unifier), so $t^{'}$ is a frontier variable or a constant, hence x is a cutpoint of $μ_{1}$ . □
Property 14.
Let $μ_{1} = (Q^{'}, H^{'}, P_{u}^{1})$ and $μ_{2} = (Q^{'}, H^{'}, P_{u}^{2})$ be two piece-unifiers such that $μ_{1} \geq μ_{2}$ . Then $β (Q, R, μ_{1}) \geq β (Q, R, μ_{2})$ .
Proof.
Let $u_{1}$ (resp. $u_{2}$ ) be a substitution associated with $P_{u}^{1}$ (resp. $P_{u}^{2}$ ). Since $P_{u}^{1} \geq P_{u}^{2}$ , there is a substitution s such that $u_{2} = s \circ u_{1}$ . Then $β (Q, R, μ_{2}) = u_{2} (body (R)) \cup u_{2} (Q ∖ Q^{'})$ $= (s \circ u_{1}) (body (R)) \cup (s \circ u_{1}) (Q ∖ Q^{'}) = (s \circ u_{1}) (body (R) \cup (Q ∖ Q^{'})) = s (u_{1} (body (R) \cup (Q ∖ Q^{'}))) = s (β (Q, R, μ_{1}))$ . s is thus a homomorphism from $β (Q, R, μ_{1})$ to $β (Q, R, μ_{2})$ , hence $β (Q, R, μ_{1}) \geq β (Q, R, μ_{2})$ . □

The following lemma expresses that piece-based rewriting operator is prunable.
Lemma 15.
If $Q_{1} \geq Q_{2}$ then for any piece-unifier $μ_{2}$ of $Q_{2}$ with R: either (i) $Q_{1} \geq β (Q_{2}, R, μ_{2})$ or (ii) there is a piece-unifier $μ_{1}$ of $Q_{1}$ with R such that $β (Q_{1}, R, μ_{1}) \geq β (Q_{2}, R, μ_{2})$ .
Proof.
Let h be a homomorphism from $Q_{1}$ to $Q_{2}$ . Let $μ_{2} = (Q_{2}^{'}, H_{2}^{'}, P_{u}^{2})$ be a piece-unifier of $Q_{2}$ with R, and let $u_{2}$ be a substitution associated with $P_{u}^{2}$ . We consider two cases:
If $h (Q_{1}) \subseteq (Q_{2} ∖ Q_{2}^{'})$ , then $u_{2} \circ h$ is a homomorphism from $Q_{1}$ to $u_{2} (Q_{2} ∖ Q_{2}^{'}) \subseteq β (Q_{2}, R, μ_{2})$ . Thus $Q_{1} \geq β (Q_{2}, R, μ_{2})$ .

Otherwise, let $Q_{1}^{'}$ be the non-empty subset of $Q_{1}$ mapped by h to $Q_{2}^{'}$ , i.e., $h (Q_{1}^{'}) \subseteq Q_{2}^{'}$ , and $H_{1}^{'}$ be the subset of $H_{2}^{'}$ matched by $u_{2}$ with $u_{2} (h (Q_{1}^{'}))$ , i.e., $u_{2} (H_{1}^{'}) = u_{2} (h (Q^{'} 1))$ . Let $P_{u}^{1}$ be the partition on $terms (H_{1}^{'}) \cup terms (Q_{1}^{'})$ such that two terms are in the same class of $P_{u}^{1}$ if these terms or their images by h are in the same class of $P_{u}^{2}$ (i.e., for a term t, we consider t if t is in $Q_{1}^{'}$ , and $h (t)$ otherwise). By construction, $(Q_{1}^{'}, H_{1}^{'}, P_{u}^{1})$ is a piece-unifier of $Q_{1}$ with R. Indeed, $P_{u}^{1}$ fulfills all the conditions of the piece-unifier definition since $P_{u}^{2}$ fulfills these conditions.

Let $u_{1}$ be a substitution associated with $P_{u}^{1}$ . For each class P of $P_{u}^{1}$ (resp. $P_{u}^{2}$ ), we call selected element the unique element t of P such that $u_{1} (t) = t$ (resp. $u_{2} (t) = t$ ). We build a substitution s, from the selected elements of the classes in $P_{u}^{1}$ which are variables, to the selected elements of the classes in $P_{u}^{2}$ , as follows: for any class P of $P_{u}^{1}$ , let t be the selected element of P: if t is a variable of $H_{1}^{'}$ then $s (t) = u_{2} (t)$ , otherwise $s (t) = u_{2} (h (t))$ (t occurs in $Q_{1}^{'}$ ). Note that, for any term t in $P_{u}^{1}$ , we have $s (u_{1} (t)) = u_{2} (h (t))$ .

We build now a substitution $h^{'}$ from $vars (β (Q_{1}, R, μ_{1}))$ to $terms (β (Q_{2}, R, μ_{2}))$ , by considering three cases according to the part of $β (Q_{1}, R, μ_{1})$ in which the variable occurs, i.e., in $Q_{1}$ but not in $Q_{1}^{'}$ , in $body (R)$ but not in $H_{1}^{'}$ , or in the remaining part corresponding to the images of $sep (Q_{1}^{'})$ by $u_{1}$ :
if $x \in vars (Q_{1}) ∖ vars (Q_{1}^{'})$ , $h^{'} (x) = h (x)$ ;

if $x \in vars (body (R)) ∖ vars (H_{1}^{'})$ , $h^{'} (x) = u_{2} (x)$ ;

if $x \in u_{1} (sep (Q_{1}^{'}))$ (or alternatively $x \in u_{1} (fr (R) \cap vars (H_{1}^{'}))$ ), $h^{'} (x) = s (x)$ .
We conclude by showing that $h^{'}$ is a homomorphism from $β (Q_{1}, R, μ_{1}) = u_{1} (body (R)) \cup u_{1} (Q_{1} ∖ Q_{1}^{'})$ to $β (Q_{2}, R, μ_{2}) = u_{2} (body (R)) \cup u_{2} (Q_{2} ∖ Q_{2}^{'})$ with two points:

$h^{'} (u_{1} (body (R))) = u_{2} (body (R))$ . Indeed, for any variable x of $body (R)$ :

either $x \in vars (body (R)) ∖ vars (H_{1}^{'})$ , hence $h^{'} (u_{1} (x)) = h^{'} (x) = u_{2} (x)$ ( $u_{1}$ is a substitution from variables of $Q_{1}^{'} \cup H_{1}^{'}$ ),

or $x \in fr (R) \cap vars (H_{1}^{'})$ , hence $h^{'} (u_{1} (x)) = s (u_{1} (x)) = u_{2} (h (x)) = u_{2} (x)$ (h is a substitution from variables of $Q_{1}$ ).

$h^{'} (u_{1} (Q_{1} ∖ Q_{1}^{'})) \subseteq u_{2} (Q_{2} ∖ Q_{2}^{'})$ . We show that $h^{'} (u_{1} (Q_{1} ∖ Q_{1}^{'})) = u_{2} (h (Q_{1} ∖ Q_{1}^{'})))$ , and since $h (Q_{1} ∖ Q_{1}^{'}) \subseteq Q_{2} ∖ Q_{2}^{'}$ , we have $h^{'} (u_{1} (Q_{1} ∖ Q_{1}^{'})) \subseteq u_{2} (Q_{2} ∖ Q_{2}^{'})$ . To show that $h^{'} (u_{1} (Q_{1} ∖ Q_{1}^{'})) = u_{2} (h (Q_{1} ∖ Q_{1}^{'})))$ , we point out that, for any variable x from $Q_{1} ∖ Q_{1}^{'}$ :

either $x \in vars (Q_{1}^{'})$ , then $h^{'} (u_{1} (x)) = s (u_{1} (x)) = u_{2} (h (x))$

or $x \in vars (Q_{1}) ∖ vars (Q_{1}^{'})$ , then $h^{'} (u_{1} (x)) = h^{'} (x) = h (x) = u_{2} (h (x))$ ( $u_{1}$ is a substitution from variables of $Q_{1}^{'} \cup H_{1}^{'}$ and $u_{2}$ is a substitution from variables of $Q_{2}^{'} \cup H_{2}^{'}$ and $h (x) \notin vars (Q_{2}^{'} \cup H_{2}^{'})$ ). □

Given a query Q and a set of rules $R$ , the piece-based rewriting operator computes the set of one-step piece-rewritings of Q according to all piece-unifiers of Q with a rule $R \in R$ . We are now able to show that this operator fulfills the desired properties introduced in Section 4.
Theorem 16.
Piece-based rewriting operator is sound, complete and prunable; this property is still true if only most general piece-unifiers are considered.
Proof.
Soundness and completeness follow from Theorem 12. Prunability follows from Lemma 15. Thanks to Property 14, the proof remains true if most general piece-unifiers are considered. □
6. Exploiting single-piece unifiers

We are now interested in the efficient computation of piece-based rewritings. We identify several sources of combinatorial explosion in the computation of the piece-unifiers between a query and a rule:

The problem of deciding whether there is a piece-unifier of a given query Q with a given rule R is NP-complete in the general case. NP-hardness is easily obtained by considering the case of a rule with an empty frontier: then, there is a piece-unifier between Q and R if and only if there is a homomorphism from Q to $H = head (R)$ , which is an NP-complete problem, Q and H being any sets of atoms.

The number of most general piece-unifiers can be exponential in $| Q |$ , even if the rule head H is restricted to a single atom. For instance, assume that each atom of Q unifies with H and forms its own piece; then there may be $2^{| Q |}$ piece-unifiers obtained by considering all subsets of Q.

The same atom in Q may belong to distinct pieces according to distinct unifiers, as illustrated by the next example.

Example 7.
Let $Q = r (u, v) \land q (v)$ and $R = p (x) \to r (x, y) \land r (y, x) \land q (y)$ . Atom $r (u, v)$ belongs to two single-piece unifiers: $({r (u, v), q (v)}, {r (x, y), q (y)}, {{u, x}, {v, y}})$ and $({r (u, v)}, {r (y, x)}, {{u, y}, {v, x}})$ . For an additional example, see Example 6, where $p (u, v)$ and $p (v, w)$ both belong to $μ_{1}$ and $μ_{2}$ .

To cope with this complexity, an idea is to rely on single-piece unifiers, i.e., piece-unifiers of the form $(Q^{'}, -, -)$ where $Q^{'}$ is a single piece of Q. This section is devoted to the properties of rewriting operators exploiting this notion. We show that the rewriting operator based on single-piece (most general) unifiers is sound and complete. However, perhaps surprisingly, it is not prunable, which prevents to use it in the generic algorithm. To recover prunability, we will define the aggregation of single-piece unifiers, which provides us with a new rewriting operator, which has all the desired properties and generates rewriting sets with fewer components than the standard piece-unifier. Note, however, that this will not completely remove the second complexity source (i.e., the exponential number of unifiers to consider) since the number of aggregations of single-piece unifiers can still be exponential in the size of Q, even with atomic-head rules.
6.1. Single-piece based operator

As expressed by the following theorem, (most general) single-piece unifiers provide a sound and complete operator.

Theorem 17.
Given a BCQ Q and a set of rules $R$ , the set of rewritings of Q obtained by considering exclusively most general single-piece unifiers is sound and complete.
Proof.
See Appendix. □

The proof of this theorem is given in Appendix since it is not reused hereafter. Indeed, the restriction to single-piece unifiers is not compatible with selecting most general rewritings at each step, as performed in Algorithm 1. We present below some examples that illustrate this incompatibility.
Example 8 (Basic example).

Let $Q = p (y, z) \land p (z, y)$ and $R = r (x, x) \to p (x, x)$ . There are two single-piece unifiers of Q with R, $μ_{1} = ({p (y, z)}, {p (x, x)}, {{x, y, z}})$ and $μ_{2} = ({p (z, y)}, {p (x, x)}, {{x, y, z}})$ , which yield the same rewriting, e.g. $Q_{1} = r (x, x) \land p (x, x)$ . There is also a two-piece unifier $μ = (Q, {p (x, x)}, {{x, y, z}})$ , which yields e.g. $Q^{'} = r (x, x)$ . A query equivalent to $Q^{'}$ can be obtained from $Q_{1}$ by a further single-piece unification. Now, assume that we restrict unifiers to single-piece unifiers and keep most general rewritings at each step. Since $Q \geq Q_{1}$ , $Q_{1}$ is not kept, hence $Q^{'}$ will never be generated, whereas it is incomparable with Q.

Concerning the preceding example, given $u_{1}$ and $u_{2}$ the substitutions respectively associated with $μ_{1}$ and $μ_{2}$ , one may argue that $u_{1} (Q)$ is redundant and the same holds for $u_{2} (Q)$ ; hence, the problem would be solved by computing $u_{1} (Q) ∖ u_{1} (Q^{'})$ instead of $u_{1} (Q ∖ Q^{'})$ and making $u_{1} (Q)$ non-redundant (i.e., equal to $p (x, x)$ ) before computing $u_{1} (Q) ∖ u_{1} (Q^{'})$ , which would then be empty. However, the problem goes deeper, as illustrated by the next two examples.

Fig. 4.

The queries in Example 10.

Example 9 (Ternary predicates).

Let $Q = r (u, v, w) \land r (w, t, u)$ and $R = p (x, y) \to r (x, y, x)$ . Again, there are two single-piece unifiers of Q with R: $μ_{1} = ({r (u, v, w)}, {r (x, y, x)}, {{u, w, x}, {v, y}})$ and $μ_{2} = ({r (w, t, u)}, {r (x, y, x)}, {{u, w, x}, {t, y}})$ . One obtains two rewritings more specific than Q, e.g., $Q_{1} = p (x, y) \land r (x, v, x)$ , and $Q_{2} = p (x, y) \land r (x, t, x)$ , which are isomorphic. There is also a two-piece unifier $(Q, {r (x, y, x)}, {{u, w, x}, {v, t, y}})$ , which yields $p (x, y)$ . If we remove $Q_{1}$ and $Q_{2}$ , no query equivalent to $p (x, y)$ can be generated.

Example 10 (Very simple rule).

This example has two interesting characteristics: (1) it uses unary/binary predicates only (2) it uses a very simple rule expressible with any lightweight description logic, i.e., a linear existential rule where no variable appears twice in the head or the body. Let $Q = r (u, v) \land r (v, w) \land p (u, z) \land p (v, z) \land p (v, t) \land p (w, t) \land p_{1} (u) \land p_{2} (w)$ (see Fig. 4) and $R = b (x) \to p (x, y)$ . Note that Q is not redundant. There are two single-piece unifiers of Q with R, say $μ_{1}$ and $μ_{2}$ , with pieces $Q_{1}^{'} = {p (u, z), p (v, z)}$ and $Q_{2}^{'} = {p (v, t), p (w, t)}$ respectively. The obtained queries are pictured in Fig. 4. These queries are both more specific than Q. The removal would prevent the generation of a query equivalent to $r (x, x) \land p_{1} (x) \land p_{2} (x) \land b (x)$ , which could be generated from Q with a two-piece unifier.

Property 18.
The single-piece-based operator is not prunable.
Proof.
Follows from the above examples. □

By Theorem 5 and Property 24, one can show that the conclusion of Lemma 3 (Section 4.2) is valid for single-piece unifiers, even though they are not prunable. This justifies that Lemma 3 is not enough to prove the correctness of Algorithm 1.

Nevertheless, single-piece unifiers can still be used as an algorithmic brick to compute more complex piece-unifiers, as shown in the next subsection.
6.2. Aggregated-piece based operator

We first explain the ideas that underline aggregated single-piece unifiers. Let us consider the set of single-piece unifiers naturally associated with a piece-unifier μ. If we successively apply each of these underlying single-piece unifiers, we may obtain a CQ strictly more general than $β (Q, R, μ)$ , as illustrated by the next example.

Example 11.
Let $R = p (x, y) \to q (x, y)$ and $Q = q (u, v) \land r (v, w) \land q (t, w)$ . Let $μ = (Q^{'}, H^{'}, P_{u})$ be a piece-unifier of Q with R with $Q^{'} = {q (u, v), q (t, w)}$ , $H^{'} = {q (x, y)}$ and $P_{u} = {{u, t, x}, {v, w, y}}$ . $β (Q, R, μ) = p (x, y) \land r (y, y)$ . $Q^{'}$ has two pieces w.r.t. μ: $P_{1} = {q (u, v)}$ and $P_{2} = {q (t, w)}$ . If we successively compute the rewritings with the underlying single-piece unifiers $μ_{P_{1}}$ and $μ_{P_{2}}$ , we obtain $Q_{s} = β (β (Q, R, μ_{P_{1}}), R, μ_{P_{2}}) = β (p (x, y) \land r (y, w) \land q (t, w), R, μ_{P_{2}}) = p (x, y) \land r (y, y^{'}) \land p (x^{'}, y^{'})$ , which is strictly more general than $β (Q, R, μ)$ .

Given a set $U$ of “compatible” single-piece unifiers of a query Q with a rule (the notion of “compatible” will be formally defined below), we can thus distinguish between the usual piece-unifier performed on the union of the pieces from the unifiers in $U$ and an “aggregated unifier” that would correspond to a sequence of applications of the unifiers in $U$ . This latter unifier is more interesting than the piece-unifier because, as illustrated by Example 11, it avoids generating some rewritings which are too specific. We will thus rely on the aggregation of single-piece unifiers to recover prunability.

Note that, in this paper, we combine single-piece unifiers of the same rule whereas in [20] we consider the possibility of combining unifiers of distinct rules (and thus compute rewritings from distinct rules in a single step). We keep below the definitions introduced in [20], while pointing out that, in the context of this paper, the rules $R_{1} \dots R_{k}$ in the definitions are necessarily copies of the same rule R. Intuitively, an aggregated unifier of R is a piece-unifier of a new rule built by aggregating copies of R (as formally expressed by next Property 19).
Definition 14 (Aggregation of a set of rules).

Let $R = {R_{1} \dots R_{k}}$ be a set of rules, with pairwise disjoint sets of variables. The aggregation of $R$ , denoted by $R_{1} ◇ \dots ◇ R_{k}$ , is the rule $body (R_{1}) \land \dots \land body (R_{k}) \to head (R_{1}) \land \dots \land head (R_{k})$ .

Definition 15 (Compatible set of piece-unifiers).

Let $U = {μ_{1} = (Q_{1}^{'}, H_{1}^{'}, P_{1}) \dots μ_{k} = (Q_{k}^{'}, H_{k}^{'}, P_{k})}$ be a set of piece-unifiers of Q with rules $R_{1} \dots R_{k}$ respectively, where the rules have pairwise disjoint sets of variables (in particular, for all $1 \leq i, j \leq k, i \neq j$ , it holds that $vars (H_{i}^{'}) \cap vars (H_{j}^{'}) = \emptyset$ ). Set $U$ is said to be compatible if (1) all $Q_{i}^{'}$ and $Q_{j}^{'}$ are pairwise disjoint; (2) the join of $P_{1} \dots P_{k}$ is admissible.

Definition 16 (Aggregated unifier).

Let $U = {μ_{1} = (Q_{1}^{'}, H_{1}^{'}, P_{1}) \dots μ_{k} = (Q_{k}^{'}, H_{k}^{'}, P_{k})}$ be a compatible set of piece-unifiers of Q with rules $R_{1} \dots R_{k}$ . An aggregated unifier of Q with $R_{1} \dots R_{k}$ w.r.t. $U$ is $μ = (Q^{'}, H^{'}, P)$ where: (1) $Q^{'} = Q_{1}^{'} \cup \dots \cup Q_{k}^{'}$ ; (2) $H^{'} = H_{1}^{'} \cup \dots \cup H_{k}^{'}$ ; (3) P is the join of $P_{1} \dots P_{k}$ . It is said to be single-piece if all the piece-unifiers in $U$ are single-piece. It is said to be most general if all the piece-unifiers in $U$ are most general.

Property 19.
Let Q be a BCQ and $U = {μ_{1} = (Q_{1}^{'}, H_{1}^{'}, P_{1}) \dots μ_{k} = (Q_{k}^{'}, H_{k}^{'}, P_{k})}$ be a compatible set of piece-unifiers of Q with $R_{1} \dots R_{k}$ . Then, the aggregated unifier of $U$ is a piece-unifier of Q with the aggregation of ${R_{1} \dots R_{k}}$ .
Proof.
We show that the aggregated unifier $μ = (Q^{'}, H^{'}, P_{u})$ of $U$ satisfies the conditions of the definition of a piece-unifier (Definition 11). Condition 1 is fulfilled since, by definition of compatibility, the join of $P_{1} \dots P_{k}$ is admissible. Condition 2 is satisfied as well, because, since $P_{1} \dots P_{k}$ satisfy it, so does their join. Indeed, if a class contains an existential variable, it cannot be merged with another by aggregation because its other terms are non-separating variables, hence do not appear in other classes. Concerning the last condition, for all $1 \leq i \leq k$ , we have $u_{i} (H_{i}^{'}) = u_{i} (Q_{i}^{'})$ , where $u_{i}$ is a substitutionassociated with $P_{i}$ . Since $Q^{'} = ⋃_{i = 1}^{k} Q_{i}^{'}$ and $H^{'} = ⋃_{i = 1}^{k} H_{i}^{'}$ we know that, for any substitution u associated with $P_{u}$ , we have $u (H^{'}) = u (Q^{'})$ . □

According to this property, the rewriting associated with an aggregated unifier μ can be defined as $β (Q, R_{1} ◇ \dots ◇ R_{k}, μ)$ . It corresponds to the rewriting obtained by applying the piece-unifiers associated with the $R_{i}$ one after the other, as illustrated by the next example.
Example 12.
Consider again Example 11. Let $R^{'} = p (x^{'}, y^{'}) \to q (x^{'}, y^{'})$ be a copy of R. The aggregation $R ◇ R^{'}$ is the rule $p (x, y) \land p (x^{'}, y^{'}) \to q (x, y) \land q (x^{'}, y^{'})$ . Let $U = {μ_{P_{1}}, μ_{P_{2}}}$ where $μ_{P_{1}} = ({q (u, v)}, {q (x, y)}, {{u, x}, {v, y}})$ and $μ_{P_{2}} = ({q (t, w)}, {q (x^{'}, y^{'})}, {{t, x^{'}}, {w, y^{'}}})$ . The aggregated unifier of Q with $R, R^{'}$ w.r.t. $U$ is $({q (u, v), q (t, w)}, {q (x, y), q (x^{'}, y^{'})}, {{u, x}, {v, y}, {t, x^{'}}, {w, y^{'}}})$ . The associated rewriting of Q is $p (x, y) \land r (y, y^{'}) \land p (x^{'}, y^{'})$ , which is equal to the rewriting $Q_{s}$ in Example 11.

The difference between a piece-unifier and an aggregated unifier of Q with R can also be explained as follows: to build a piece-unifier of Q with R, we consider partitions of $terms (Q) \cup terms (head (R))$ , while in the aggregation operation we consider partitions of $terms (Q) \cup ⋃_{i = 1}^{k} terms (head (R_{i}))$ , where k is the number of considered single-piece unifiers, and each $R_{i}$ is safely renamed from R. In other words, if, in the definition of an aggregated unifier, we assumed that the $R_{1} \dots R_{k}$ had been exactly R, instead of safely renamed copies of R, then the aggregation of $R_{1} \dots R_{k}$ would have been exactly R after removal of duplicate atoms, and the aggregated unifier would have been the usual piece-unifier.

The next property shows that, from any piece-unifier μ, one can build a most general single-piece aggregated unifier, which produces a rewriting more general than the one produced by μ.
Property 20.
For any piece-unifier μ of Q with R, there is a most general single-piece aggregated unifier $μ_{◇}$ of Q with $R_{1} \dots R_{k}$ copies of R such that $β (Q, R_{1} ◇ \dots ◇ R_{k}, μ_{◇}) \geq β (Q, R, μ)$ .
Proof.
Let $Q_{1}^{'}, \dots, Q_{k}^{'}$ be the pieces of $Q^{'}$ according to $μ = (Q^{'}, H^{'}, P_{u})$ and let u be a substitution associated with $P_{u}$ . Let $R_{1} \dots R_{k}$ be safely renamed copies of R. Let $h_{i}$ denote the variable renaming used to produce $R_{i}$ from R. Let $U = {μ_{1} = (Q_{1}^{'}, H_{1}^{'}, P_{u}^{1}), \dots, μ_{k} = (Q_{k}^{'}, H_{k}^{'}, P_{u}^{k})}$ be a set of piece-unifiers of Q with $R_{1}, \dots, R_{k}$ built as follows for all i:
$H_{i}^{'}$ is the image by $h_{i}$ of the subset of $H^{'}$ unified by u with $Q_{i}^{'}$

let $h_{i} (P_{u})$ be the partition built from $P_{u}$ by replacing each $x \in vars (H^{'})$ by $h_{i} (x)$ ; then, $P_{u}^{i}$ is obtained from $h_{i} (P_{u})$ by (1) restricting it to the terms of $Q_{i}^{'}$ and $H_{i}^{'}$ , and (2) refining it as much as possible while keeping the property that $u_{i} (H_{i}^{'}) = u_{i} (Q_{i}^{'})$ , where $u_{i}$ is a substitution associated with the partition.
For any $μ_{i} = (Q_{i}^{'}, H_{i}^{'}, P_{u}^{i})$ we immediately check that:
$μ_{i}$ is a most general piece-unifier,

$μ_{i}$ is a single-piece unifier,

for all $μ_{j} \in U$ , with $μ_{i} \neq μ_{j}$ , $μ_{j}$ and $μ_{i}$ are compatible.

Let $μ_{◇} = (Q_{◇}^{'}, H_{◇}^{'}, P_{u}^{◇})$ be the aggregated unifier of Q with $R_{1}, \dots, R_{k}$ w.r.t. $U$ . Note that $Q_{◇}^{'} = Q^{'}$ . The above properties fulfilled by any $μ_{i}$ from $U$ ensure that $μ_{◇}$ is a most general single-piece aggregated unifier.

We note $R_{◇} = R_{1} ◇ \dots ◇ R_{k}$ . It remains to prove that $β (Q, R_{◇}, μ_{◇}) \geq β (Q, R, μ)$ . Let $u_{◇}$ be a substitution associated with $P_{u}^{◇}$ . For each class P of $P_{u}$ (resp. $P_{u}^{◇}$ ), we call selected element the unique element t of P such that $u (t) = t$ (resp. $u_{◇} (t) = t$ ).

We build a substitution s, from the selected elements in $P_{u}^{◇}$ which are variables, to the selected elements in $P_{u}$ , as follows: for any class P of $P_{u}^{◇}$ , let t be the selected element of P: if t is a variable of $Q^{'}$ , then $s (t) = u (t)$ ; else t is a variable of a $H_{i}^{'}$ : then $s (t) = u (h_{i}^{- 1} (t))$ . Note that for any term t in $P_{u}^{◇}$ , there is a variable renaming $h_{i}$ such that $s (u_{◇} (t)) = u (h_{i}^{- 1} (t))$ (if t is a constant or a variable from $vars (Q)$ then any $h_{i}$ can be chosen).

We build now a substitution h from $vars (β (Q, R_{◇}, μ_{◇}))$ to $terms (β (Q, R, μ))$ , by considering three cases according to the part of $β (Q, R_{◇}, μ_{◇})$ in which the variable occurs, i.e., in Q but not in $Q^{'}$ , in $body (R_{i})$ but not in $H_{i}^{'}$ , or in the remaining part corresponding to the images of $sep (Q^{'})$ by $u_{◇}$ :
if $x \in vars (Q) ∖ vars (Q^{'})$ , $h (x) = x$ ;

if $x \in vars (body (R_{i})) ∖ vars (H_{i}^{'})$ , $h (x) = h_{i}^{- 1} (x)$ ;

if $x \in u_{◇} (sep (Q^{'}))$ (or alternatively $x \in u_{◇} (fr (R_{◇}) \cap vars (H_{◇}^{'}))$ ), $h (x) = s (x)$ ;

We conclude by showing that h is a homomorphism from $β (Q, R_{◇}, μ_{◇}) = u_{◇} (body (R_{1}) \cup \dots \cup body (R_{k})) \cup u_{◇} (Q ∖ Q^{'})$ to $β (Q, R, μ) = u (body (R)) \cup u (Q ∖ Q^{'})$ , with two points:
for all i, $h (u_{◇} (body (R_{i}))) = u (body (R))$ . Indeed, for any variable $x \in vars (body (R_{i}))$ :

either $x \in vars (body (R_{i})) ∖ vars (H_{i}^{'})$ , hence $h (u_{◇} (x)) = h (x) = h_{i}^{- 1} (x) = u (h_{i}^{- 1} (x))$ (u does not substitute the variables from $vars (body (R)) ∖ vars (H^{'})$ ),

or $x \in fr (R_{i}) \cap vars (H_{i}^{'})$ , hence $h (u_{◇} (x)) = s (u_{◇} (x)) = u (h_{i}^{- 1} (x))$ ;

$h (u_{◇} (Q ∖ Q^{'})) = u (Q ∖ Q^{'})$ . Indeed, for any variable $x \in vars (Q ∖ Q^{'})$ :

either $x \in vars (Q^{'})$ , then $h (u_{◇} (x)) = s (u_{◇} (x)) = u (h_{i}^{- 1} (x)) = u (x)$ ( $h_{i}^{- 1}$ does not substitute the variables from Q),

or $x \in vars (Q) ∖ vars (Q^{'})$ , then $h (u_{◇} (x)) = h (x) = x = u (x)$ ( $u_{◇}$ and u do not substitute the variables from $vars (Q) ∖ vars (Q^{'})$ ). □

We call single-piece aggregator the rewriting operator that computes the set of one-step rewritings of a query Q by considering all the most general single-piece aggregated unifiers of Q. Theorem 21.
The single-piece aggregator is sound, complete and prunable.
Proof.
Soundness comes from Property 19 and from the fact that, for any set of rules $R$ , let R be the aggregation of $R$ , one has $R ⊨ R$ . Completeness and prunability rely on the fact that the piece-based rewriting operator fulfills these properties, and the fact that for any queries Q and $Q^{'}$ and any rule R, if $Q^{'} = β (Q, R, μ)$ , where μ is a piece-unifier, then the query $Q^{″}$ obtained with the single-piece aggregator corresponding to μ is more general than $Q^{'}$ , as expressed by Property 20. □

7. Detailed algorithms and experiments

In this section, we first detail on the computation of all the most general single-piece unifiers of a query Q with a rule R, and explain how we use them to compute all the single-piece aggregators. Then, we focus on the specific case of unification with atomic-head rules, for which the computation is simpler. Last, we report first experiments.

7.1. Computing single-piece unifiers and their aggregation

We first introduce the notion of a pre-unifier, which is weaker than a piece-unifier. To become a piece-unifier, a pre-unifier has to satisfy an additional constraint on the separating variables of the unified subset of Q.

Definition 17 (Valid partition).

Let Q be a BCQ, R be a rule, $Q^{'} \subseteq Q$ , $H^{'} \subseteq head (R)$ , and $P_{u}$ be a partition on $terms (Q^{'}) \cup terms (H^{'})$ . $P_{u}$ is valid if no class of $P_{u}$ contains two constants, or two existential variables of R, or a constant and an existential variable of R, or an existential variable of R and a frontier variable of R.

Definition 18 (Pre-unifier).

Let Q be a BCQ, R be a rule, $Q^{'} \subseteq Q$ , $H^{'} \subseteq head (R)$ , and $P_{u}$ be a partition on $terms (Q^{'}) \cup terms (H^{'})$ . Then $μ = (Q^{'}, H^{'}, P_{u})$ is a pre-unifier of Q with R if (1) $P_{u}$ is valid, and (2) given a substitution u associated with $P_{u}$ , it holds that $u (H^{'}) = u (Q^{'})$ .

The next definition introduces the notion of sticky variables, which are the variables of $Q^{'}$ that prevent $Q^{'}$ to be a piece.

Definition 19 (Sticky variables).

Let Q be a BCQ, R be a rule, $Q^{'} \subseteq Q$ , $H^{'} \subseteq head (R)$ and $P_{u}$ be a partition on $terms (Q^{'}) \cup terms (H^{'})$ . The sticky variables of $Q^{'}$ in $P_{u}$ w.r.t. Q and R, denoted by $sticky (Q^{'}, P_{u})$ , are the separating variables of $Q^{'}$ that occur in a class of $P_{u}$ containing an existential variable of R.

The next property ensures that a pre-unifier without sticky variables is a piece-unifier, and reciprocally. Its proof follows from the definitions.

Property 22.
Let Q be a BCQ, R be a rule, $Q^{'} \subseteq Q$ , $H^{'} \subseteq head (R)$ , and $P_{u}$ be a partition on $terms (Q^{'}) \cup terms (H^{'})$ . Then $μ = (Q^{'}, H^{'}, P_{u})$ is a piece-unifier of Q with R iff μ is a pre-unifier and $sticky (Q^{'}, P_{u}) = \emptyset$ .

The fact that we can first build pre-unifiers, then check the absence of sticky variables, suggests an incremental method to compute all the most general single piece-unifiers of Q with R.

The first step consists in computing all the most general pre-unifiers of an atom $a \in Q$ with an atom $b \in head (R)$ with the same predicate. The partition on the terms of these atoms associated with their unification has to be valid. The next definition defines formally this notion of partition.
Definition 20 (Partition by position).

Let A be a set of atoms with the same predicate p. The partition by position associated with A, denoted by $P_{p} (A)$ , is the partition on $terms (A)$ such that two terms of A occurring in the same position i ( $1 \leq i \leq arity (p)$ ) are in the same class of $P_{p} (A)$ .

Hence, the partition by position associated with ${a, b}$ has to be valid. We denote by $APU$ the set of all the most general atomic pre-unifiers, i.e., $APU = {μ = ({a}, {b}, P) ∣ a \in Q, b \in head (R), and μ is a pre-unifier of Q with R}$ . Algorithm 2 details the computation of $APU$ .

Algorithm 2:

Computation of APU , the set of most general atomic pre-unifiers

Algorithm 3:

Computation of SPU , the set of most general single-piece unifiers

Algorithm 4:

Computation of the most general single-piece unifiers extending a given pre-unifier

We then use $APU$ to compute the set of all the most general single-piece unifiers of Q with R, denoted by $SPU$ . Each atomic pre-unifier of $APU$ is incrementally extended in all possible ways with other atomic pre-unifiers of $APU$ , which contain “missing” atoms of Q with respect to sticky variables. Extending pre-unifier $(Q_{1}, H_{1}, P_{1})$ with pre-unifier $(Q_{2}, H_{2}, P_{2})$ consists in merging both pre-unifiers to obtain a new pre-unifier $(Q_{1} \cup Q_{2}, H_{1} \cup H_{2}, join (P_{1}, P_{2}))$ ; this extension can be performed if and only if the join of $P_{1}$ and $P_{2}$ is a valid partition; if the obtained pre-unifier has no sticky variable, it is a single piece-unifier.

Next Algorithms 3, 4 and 5 detail the computation of $SPU$ . Algorithm 3 is the main algorithm. It first uses Algorithm 2 to compute $APU$ , then, for each atomic pre-unifier $μ \in APU$ , it calls Algorithm 4, which computes the single-piece unifiers extending μ. Algorithm 4 first checks if μ contains sticky variables: if it it is the case, this single-piece unifier is returned, otherwise the algorithm is recursively called, after a call to Algorithm 5 to obtain a set of candidate extensions of μ.

Algorithm 5:

Computation of the pre-unifiers extending a given pre-unifier w.r.t. to a given set of atoms

Finally, the set of all the single-piece aggregators of Q with R is obtained by aggregating the unifiers from all non-empty compatible subsets of $SPU$ . For optimisation reasons, this set is incrementally computed as follows:

Let $U_{1} = SPU = {μ_{1}, \dots, μ_{k}}$ ; the $μ_{i}$ are called 1-unifiers.

For $i = 2$ to the greatest possible rank (i.e., as long as $U_{i}$ is not empty): let $U_{i}$ be the set of all i-unifiers obtained by aggregating an $(i - 1)$ -unifier from $U_{i - 1}$ and a single-piece unifier from $U_{1}$ .

Return the union of all the $U_{i}$ obtained.

7.2. The specific case of atomic-head rules

Rules with an atomic head are often considered in the literature, specifically in logic programming or in deductive databases. One may ask if piece-unification become simpler in this specific case. In fact, considering atomic-head rules does not simplify the definition of a piece-unifier in itself, but its computation. Indeed, there is now a unique way of associating any atom from Q with the head of a rule. It follows that deciding whether there is a piece-unifier of Q with a rule can be done in linear time with respect to the size of Q (whereas it is NP-complete in the general case) and each atom belongs to a single piece, thus the set of all single-piece unifiers of Q with a rule can be computed in polynomial time.

More precisely, if a rule R has an atomic head, then every atom in Q participates in at most one most general single-piece unifier of Q with R (up to bijective variable renaming). This is is a corollary of the next property.

Property 23.
Let R be an atomic-head rule and Q be a BCQ. For any atom $a \in Q$ , there is at most one $Q^{'} \subseteq Q$ such that $a \in Q^{'}$ and $Q^{'}$ is a piece for a piece-unifier of Q with R.
Proof.
We prove by contradiction that two single-piece unifiers cannot share an atom of Q. Assume there are $Q_{1}^{'} \subseteq Q$ and $Q_{2}^{'} \subseteq Q$ such that $Q_{1}^{'} \neq Q_{2}^{'}$ and $Q_{1}^{'} \cap Q_{2}^{'} \neq \emptyset$ , and $μ_{1} = (Q_{1}^{'}, H, P_{u}^{1})$ and $μ_{2} = (Q_{2}^{'}, H, P_{u}^{2})$ two single-piece-unifiers of Q with R, with $H = head (R)$ . Since $Q_{1}^{'} \neq Q_{2}^{'}$ , one has $Q_{1}^{'} ∖ Q_{2}^{'} \neq \emptyset$ or $Q_{2}^{'} ∖ Q_{1}^{'} \neq \emptyset$ . Assume $Q_{1}^{'} ∖ Q_{2}^{'} \neq \emptyset$ . Let $A = Q_{1}^{'} \cap Q_{2}^{'}$ and $B = Q_{1}^{'} ∖ A$ . There is at least one variable $x \in vars (A) \cap vars (B)$ such that there is an existential variable e of $head (R)$ in the class of $P_{u}^{1}$ containing x (otherwise $μ_{1}$ has more than one piece). Since H is atomic, there is a unique way of associating any atom with H, thus the class of $P_{u}^{2}$ containing x contains e as well. It follows that $Q_{2}^{'}$ is not a piece since an atom of A and an atom of B share the variable x unified with an existential variable in $μ_{2}$ , while A is included in $Q_{2}^{'}$ and B is not. □

The fact that an atom from Q participates in at most one most general single-piece unifier allows some algorithmic improvements. Indeed, when a piece-unifier of $Q^{'}$ with $head (R)$ is successfully built, all the atoms of $Q^{'}$ can be removed from the set of atoms to be considered in the computation of the next piece-unifiers. Furthermore, there is a unique way of associating any atom from Q with $head (R)$ , hence there is only one pre-unifier of $Q^{'}$ with $head (R)$ . Algorithm 6 exploits these specific aspects to compute all the single-piece unifiers of a query with an atomic-head rule.
Example 13.
Let $R = q (x) \to p (x, y)$ and $Q = p (u, v) \land p (v, t)$ . Let us start from $p (u, v)$ : this atom is unifiable with $head (R)$ and $p (v, t)$ necessarily belongs to the same piece-unifier (if any) because $v \in sticky ({p (u, v)}, {{u, x}, {v, y}})$ ; indeed, v is in the same class as the existential variable y; however, ${p (u, v), p (v, t)}$ is not unifiable with $head (R)$ because, since v occurs at the first and at the second position of a p atom, x and y should be unified, which is not possible, since y is an existential variable; thus, $p (u, v)$ does not belong to any piece-unifier with R. However, $p (v, t)$ still has to be considered. Let us start from it: $p (v, t)$ is unifiable with $head (R)$ and forms its own piece because sticky $({p (v, t)} {{v, x}, {t, y}})$ is empty; indeed, t is in the same class as the existential variable y, but does not occur in any other atom. Hence, there is a single (most general) piece-unifier of Q with R, namely $({p (v, t)}, {p (x, y)}, {{v, x}, {t, y}})$ .

Algorithm 6:
Computation of all the most general single-piece unifiers in the case of atomic-head rules

It should be noted that any existential rule can be decomposed into an equivalent set of rules with atomic head by introducing a new predicate, which gathers the variables of the original head (e.g. [1,4]). Hence, the restriction to atomic-head rules can be made without loss of expressivity. Now, the question is whether it is more efficient to directly process rules with complex heads, or to decompose them into atomic-head rules and benefit from a simpler computation of piece-unifiers. The experiments reported below clearly show that the former choice is better.
7.3. Experiments and perspectives

The query rewriting algorithm, instantiated with the rewriting operator described in the preceding section, has been implemented in Java. Since benchmarks dedicated to existential rules are not available yet, first experiments were carried out with sets of existential rules obtained by translation from ontologies expressed in the description logic DL-Lite $_{R}$ , namely ADOLENA (A), STOCKEXCHANGE (S), UNIVERSITY (U) and VICODI (V). This benchmark was introduced in [27] and then used in several papers, e.g., [12,15,18,19]. Ontologies A and U contain some rules with multiple heads; the ontologies obtained by decomposing rules into atomic-head rules are respectively known as AX and UX. Additionally, we considered the translation of a larger ontology, the DL-Lite version of OpenGalen28

⁸

http://www.opengalen.org/

(G), which contains more than 50k rules. Each ontology is provided with five handcrafted queries.

In [19], we compared with other systems concerning the size of the output and pointed out that none of the existing systems output a complete set of rewritings. However, beside the fact that these systems have evolved since then, one can argue that the size of the rewriting set should not be a decisive criterion (indeed, assuming that the systems are sound and complete, a minimal rewriting set can be obtained by selecting most general elements, see Theorem 1). Therefore, other criteria have to be taken into account, such as the runtime or the total number of CQs generated during the rewriting process.

Table 1

Impact of rule decomposition

	Time (ms)		Output (#)		Generated (#)
	A	AX	A	AX	A	AX
Q1	170	330	27	41	459	720
Q2	90	4900	50	1431	171	4567
Q3	240	47290	104	4466	316	13838
Q4	440	28620	224	3159	826	14526
Q5	2100	1h36	624	32921	2416	215523
	U	UX	U	UX	U	UX
Q1	0	10	2	5	1	4
Q2	0	0	1	1	105	120
Q3	10	20	4	12	42	155
Q4	1370	4190	2	5	2142	4720
Q5	20	20	10	25	153	351

All tests reported here were performed on a DELL machine with a processor at 3.60 GHz and 16 GB of RAM, with 4 GB allocated to the Java Virtual Machine.

Table 1 reports the behavior of the rewriting algorithm on A vs AX and U vs UX with respect to three parameters: the runtime, the size of the output (number of CQs) and the number of generated CQs. The size of the output for AX and UX is before elimination of queries containing auxiliary predicates. The generated CQs are all the rewritings built during the rewriting process (excluding the initial query and possibly including some multi-occurrences of the same rewritings). We can see that avoiding rule decomposition makes a substantial difference. The gain is particularly striking with

Q_{5}

on A/AX with respect to all three parameters (the runtime is 21 seconds for A and 1 hour and 36 minutes for AX, the size of the output is more than 52 times larger for AX before elimination of useless queries, and the number of generated queries is 89 times larger for AX). Moreover, we point out that only 29/102 rules in A and 5/77 rules in U have multiple heads, with only 2 atoms; we can reasonably expect that the gain increases with the proportion of multiple-head rules and the size of rule heads.

Table 2

Generated queries with the single-piece aggregator

Rules	Query	Output (#)	Generated (#)	Explored (#)	Time (ms)
A	Q1	27	459	74	170
	Q2	50	171	70	90
	Q3	104	316	104	240
	Q4	224	826	256	440
	Q5	624	2416	624	2100
S	Q1	6	9	6	0
	Q2	2	137	23	10
	Q3	4	275	20	40
	Q4	4	450	58	90
	Q5	8	688	44	110
U	Q1	2	1	2	0
	Q2	1	105	32	0
	Q3	4	42	10	10
	Q4	2	2142	556	1370
	Q5	10	153	14	20
V	Q1	15	14	15	0
	Q2	10	9	10	0
	Q3	72	117	72	30
	Q4	185	328	185	110
	Q5	30	59	30	10
G	Q1	2	2	2	10
	Q2	1152	1275	1152	1090
	Q3	488	1514	488	1050
	Q4	147	154	147	30
	Q5	324	908	324	1000

Table 2 presents the size of the output, the number of generated CQs and the number of explored CQs for each ontology (as well as the runtime for information, see also Table 4). Note that, since subsumed rewritings are removed at each step of the breadth-first algorithm, only some of the rewritings generated at a given step are explored at the next step. We can see that the number of generated queries can be large with respect to the cardinality of the output, which is less marked for explored queries.

Table 3

Types of rules in the ontologies

Ontology	Rules (#)	Hierarchical rules (#)
A	102	72
S	52	16
U	77	36
V	222	202
G	50764	26980

Our query rewriting tool is able to process any kind of existential rules. There is of course a price to pay for this expressivity, in terms of complexity of the involved mechanisms and time efficiency. We consider the algorithms presented in this paper as basic versions, which can be further improved in various ways, for instance by processing some specific kinds of rules in a specific way. Let us illustrate this with the example of rules expressing taxonomies. Indeed, a large part of currently available ontologies is actually composed of concept and role hierarchies. See Table 3: 71%, 31%, 47%, 91% and 53 % of the rules in ontologies A, S, U, V and G, respectively, express atomic concept or role inclusions.

We can compile these sets of rules as preorders on predicates. The detailed presentation of how to compute and process these preorders is out of the scope of this paper. Briefly said, the preorders are integrated into the rewriting process, which allows to generate a smaller rewriting set, this set being unfolded at the end to produce the expected UCQ. Our purpose here is just to illustrate the fact that some improvements of the basic version can dramatically decrease the runtime, while still relying on the same fundamental mechanisms. Table 4 allows to compare these two versions: PURE9

⁹

Piece Unification based REwriting.

denotes the basic version of our tool and PURE

_{H}

is the version with compiled hierarchical rules (note that compilation is performed offline, hence the algorithm takes as input the preorder and the non-hierarchical rules).

We also compared to two other query rewriting tools, Nyaya and Rapid. Nyaya is a tool dedicated to UCQ rewriting with linear and sticky existential rules, which implements the techniques presented in [15], in particular an optimization for linear rules (which include DL-Lite ontologies). Table 4 shows that our tool is generally faster on the considered benchmark, even in its basic version, specially on Ontology A. This difference could be due to the fact that Nyaya does not directly process multiple-head rules, hence has to decompose them into atomic-head rules. For the large ontology G, Nyaya seemed to be still in a preprocessing step after several hours. Note that the very latest version of Nyaya includes parallel rewriting, which we did not consider here, since our tool does include this kind of optimization.

As far as we know, Nyaya is the only other tool able to process existential rules beyond lightweight DLs. We think that comparing to DL rewriting tools is not very relevant, since these systems make use of specific features, like predicate arity bounded by two, or the tree-model property. Tools tailored for DL-Lite exploit even further the very specific form of DL-Lite axioms. However, we compared to one of these tools, namely Rapid, to obtain an order of magnitude. Rapid is one of the fastest tools dedicated to DL-Lite ontologies [12]. In Table 4, we can see that Rapid is indeed generally faster than our tool, the difference being less pronounced on the version with rule compilation.

Table 4

Runtime (ms) with several query rewriting tools

Rules	Query	PURE	PURE $_{H}$	Nyaya	Rapid
A	Q1	170	120	1122	18
	Q2	90	40	862	23
	Q3	240	30	2363	34
	Q4	440	200	5557	48
	Q5	2100	440	33206	93
S	Q1	0	0	4	7
	Q2	10	10	4	9
	Q3	40	40	46	13
	Q4	90	20	7	12
	Q5	110	80	8	15
U	Q1	0	0	8	6
	Q2	0	10	4	9
	Q3	10	0	12	7
	Q4	1370	120	6	13
	Q5	20	10	10	15
V	Q1	0	0	13	9
	Q2	0	0	51	5
	Q3	30	0	21	25
	Q4	110	30	28	32
	Q5	10	0	22	26
G	Q1	10	0		5
	Q2	1090	620		74
	Q3	1050	290		59
	Q4	30	10		10
	Q5	1000	110		40

Current work includes processing specific kinds of rules in a specific way, while keeping a system able to process any set of existential rules. Other optimizations could be implemented, such as exploiting dependencies between rules to select the rules to be considered at each step. Moreover, the form of the considered output itself, i.e., a union of conjunctive queries, leads to combinatorial explosion. Considering semi-conjunctive queries instead of conjunctive queries as in [31] can save much with respect to both the running time and the size of the output, without compromising the efficiency of query evaluation; in [31] the piece-based rewriting operator is combined with query factorization techniques. We did not consider generating Datalog queries yet. Finally, further experiments should be performed on more complex ontologies. However, even if slightly more complex ontologies could be obtained by translation from description logics, real-world ontologies that would take advantage of the expressiveness of existential rules, as well as associated queries, are currently lacking.

Footnotes

Acknowledgments

We thank Giorgio Orsi for providing us with rule versions of ontologies A, S, U and V, as well as the version of Nyaya used for the experiments (October 2013 version). This work was partially funded by the ANR project PAGODA (ANR-12-JS02-007-01).

Proof of Theorem 17

To prove the completeness of the single-piece based operator, we first prove the following property:

References

[1]

J.-F.

Baget,

Leclère,

M.-L.

Mugnier and

Salvat, Extending decidable cases for rules with existential variables, in: Proc. of the 21st International Joint Conference on Artificial Intelligence, IJCAI 2009, Pasadena, California, USA, July 11–17, 2009,

Boutilier, ed., 2009, pp. 677–682.

[2]

J.-F.

Baget,

Leclère,

M.-L.

Mugnier and

Salvat, On rules with existential variables: Walking the decidability line, Artificial Intelligence 175(9–10) (2011), 1620–1654.

[3]

Beeri and

M.Y.

Vardi, The implication problem for data dependencies, in: Proc. of the 8th Colloquium on Automata, Languages and Programming, Acre (Akko), Israel, July 13–17, 1981, Lecture Notes in Computer Science, Vol. 115, Springer, 1981, pp. 73–85.

[4]

Calì,

Gottlob and

Kifer, Taming the infinite chase: Query answering under expressive relational constraints, in: Proc. of the 21st International Workshop on Description Logics (DL2008), Dresden, Germany, May 13–16, 2008, Vol. 353, CEUR-WS.org, 2008.

[5]

Calì,

Gottlob and

Lukasiewicz, A general datalog-based framework for tractable query answering over ontologies, in: Proc. of the Twenty-Eigth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2009, Providence, Rhode Island, USA, June 19–July 1, 2009,

Paredaens and

Su, eds, ACM, 2009, pp. 77–86.

[6]

Calì,

Gottlob and

Lukasiewicz, A general Datalog-based framework for tractable query answering over ontologies, J. Web Sem. 14 (2012), 57–83.

[7]

Calì,

Gottlob and

Pieris, Query answering under non-guarded rules in Datalog+/−, in: Proc. of the Fourth International Conference on Web Reasoning and Rule Systems, RR 2010, Bressanone/Brixen, Italy, September 22–24, 2010,

Hitzler and

Lukasiewicz, eds, Lecture Notes in Computer Science, Vol. 6333, Springer, 2010, pp. 1–17.

[8]

Calì,

Lembo and

Rosati, On the decidability and complexity of query answering over inconsistent and incomplete databases, in: Proc. of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, San Diego, CA, USA, June 9–12, 2003,

Neven,

Beeri and

Milo, eds, ACM, 2003, pp. 260–271.

[9]

Calvanese,

De Giacomo,

Lembo,

Lenzerini and

Rosati, Tractable reasoning and efficient query answering in description logics: The DL-Lite family, J. Autom. Reasoning 39(3) (2007), 385–429.

10.

[10]

A.K.

Chandra,

H.R.

Lewis and

J.A.

Makowsky, Embedded implicational dependencies and their inference problem, in: Proc. of the 13th Annual ACM Symposium on Theory of Computing, Milwaukee, Wisconsin, USA, May 11–13, 1981, ACM, 1981, pp. 342–354.

11.

[11]

Chein and

M.-L.

Mugnier, Graph-Based Knowledge Representation and Reasoning—Computational Foundations of Conceptual Graphs, Advanced Information and Knowledge Processing, Springer, 2008.

12.

[12]

Chortaras,

Trivela and

G.B.

Stamou, Optimized query rewriting for OWL 2 QL, in: Proc. of the 23rd International Conference on Automated Deduction on Automated Deduction, CADE-23, Wroclaw, Poland, July 31–August 5, 2011,

Bjørner and

Sofronie-Stokkermans, eds, Vol. 6803, Springer, 2011, pp. 192–206.

13.

[13]

Civili and

Rosati, A broad class of first-order rewritable tuple-generating dependencies, in: Proc. of the Second International Workshop on Datalog in Academia and Industry, Datalog 2.0, Vienna, Austria, September 11–13, 2012,

Barceló and

Pichler, eds, Vol. 7494, Springer, 2012, pp. 68–80.

14.

[14]

Cuenca Grau,

Horrocks,

Krötzsch,

Kupke,

Magka,

Motik and

Wang, Acyclicity notions for existential rules and their application to query answering in ontologies, J. Artif. Intell. Res. (JAIR) 47 (2013), 741–808.

15.

[15]

Gottlob,

Orsi and

Pieris, Ontological queries: Rewriting and optimization, in: Proc. of the 27th International Conference on Data Engineering, ICDE 2011, Hannover, Germany, April 11–16, 2011,

Abiteboul,

Böhm,

Koch and

K.-L.

Tan, eds, IEEE Computer Society, 2011, pp. 2–13.

16.

[16]

Gottlob and

Schwentick, Rewriting ontological queries into small nonrecursive datalog programs, in: Proc. of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning, KR 2012, Rome, Italy, June 10–14, 2012,

Brewka,

Eiter and

S.A.

McIlraith, eds, AAAI Press, 2012.

17.

[17]

Hell and

Nesetril, The core of a graph, Discrete Mathematics 109(1–3) (1992), 117–126.

18.

[18]

Imprialou,

Stoilos and

Cuenca Grau, Benchmarking ontology-based query rewriting systems, in: Proc. of the Twenty-Sixth AAAI Conference on Artificial Intelligence, Toronto, Ontario, Canada, July 22–26, 2012,

Hoffmann and

Selman, eds, AAAI Press, 2012.

19.

[19]

König,

Leclère,

M.-L.

Mugnier and

Thomazo, A Sound and complete backward chaining algorithm for existential rules, in: Proc. of the 6th International Conference on Web Reasoning and Rule Systems, RR 2012, Vienna, Austria, September 10–12, 2012,

Krötzsch and

Straccia, eds, Lecture Notes in Computer Science, Vol. 7497, Springer, 2012, pp. 122–138.

20.

[20]

König,

Leclère,

M.-L.

Mugnier and

Thomazo, On the exploration of the query rewriting space with existential rules, in: Proc. of the 7th International Conference on Web Reasoning and Rule Systems, RR 2013, Mannheim, Germany, July 27–29, 2013,

Faber and

Lembo, eds, Lecture Notes in Computer Science, Vol. 7994, Springer, 2013, pp. 123–137.

21.

[21]

Kontchakov,

Lutz,

Toman,

Wolter and

Zakharyaschev, The combined approach to ontology-based data access, in: Proc. of the 22nd International Joint Conference on Artificial Intelligence, IJCAI 2011, Barcelona, Catalonia, Spain, July 16–22, 2011,

Walsh, ed., IJCAI/AAAI, 2011, pp. 2656–2661.

22.

[22]

Krötzsch and

Rudolph, Extending decidable existential rules by joining acyclicity and guardedness, in: Proc. of the 22nd International Joint Conference on Artificial Intelligence, IJCAI 2011, Barcelona, Catalonia, Spain, July 16–22, 2011,

Walsh, ed., IJCAI/AAAI, 2011, pp. 963–968.

23.

[23]

Leone,

Manna,

Terracina and

Veltri, Efficiently computable datalog ∃ programs, in: Proc. of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning, KR 2012, Rome, Italy, June 10–14, 2012,

Brewka,

Eiter and

S.A.

McIlraith, eds, AAAI Press, 2012.

24.

[24]

Lutz,

Toman and

Wolter, Conjunctive query answering in the description logic ℰℒ using a relational database system, in: Proc. of the 21st International Joint Conference on Artificial Intelligence, IJCAI 2009, Pasadena, California, USA, July 11–17, 2009,

Boutilier, ed., 2009, pp. 2070–2075.

25.

[25]

M.-L.

Mugnier, Ontological query answering with existential rules, in: Proc. of the 5th International Conference on Web Reasoning and Rule Systems, RR 2011, Galway, Ireland, August 29–30, 2011,

Rudolph and

Gutierrez, eds, Vol. 6902, Springer, 2011, pp. 2–23.

26.

[26]

Orsi and

Pieris, Optimizing query answering under ontological constraints, PVLDB 4(11) (2011), 1004–1015.

27.

[27]

Pérez-Urbina,

Horrocks and

Motik, Efficient query answering for OWL 2, in: Proc. of the 8th International Semantic Web Conference on The Semantic Web, ISWC 2009, Chantilly, VA, USA, October 25–29, 2009,

Bernstein,

D.R.

Karger,

Heath,

Feigenbaum,

Maynard,

Motta and

Thirunarayan, eds, Lecture Notes in Computer Science, Vol. 5823, Springer, 2009, pp. 489–504.

28.

[28]

Rodriguez-Muro and

Calvanese, High performance query answering over DL-Lite ontologies, in: Proc. of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning, KR 2012, Rome, Italy, June 10–14, 2012,

Brewka,

Eiter and

S.A.

McIlraith, eds, AAAI Press, 2012.

29.

[29]

Rosati and

Almatelli, Improving query answering over DL-Lite ontologies, in: Proc. of the Twelfth International Conference on Principles of Knowledge Representation and Reasoning, KR 2010, Toronto, Ontario, Canada, May 9–13, 2010,

Lin,

Sattler and

Truszczynski, eds, AAAI Press, 2010.

30.

[30]

Salvat and

M.-L.

Mugnier, Sound and complete forward and backward chainingd of graph rules, in: Proc. of the 4th International Conference on Conceptual Structures: Knowledge Representation as Interlingua, ICCS’96, Sydney, Australia, August 19–22, 1996,

P.W.

Eklund,

Ellis and

Mann, eds, Lecture Notes in Computer Science, Vol. 1115, Springer, 1996, pp. 248–262.

31.

[31]

Thomazo, Compact rewriting for existential rules, in: Proc. of the 23rd International Joint Conference on Artificial Intelligence, IJCAI 2013, Beijing, China, August 3–9, 2013,

Rossi, ed., IJCAI/AAAI, 2013.

32.

[32]

Thomazo,

J.-F.

Baget,

M.-L.

Mugnier and

Rudolph, A generic querying algorithm for greedy sets of existential rules, in: Proc. of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning: KR 2012, Rome, Italy, June 10–14, 2012,

Brewka,

Eiter and

S.A.

McIlraith, eds, AAAI Press, 2012.

33.

[33]

Venetis,

Stoilos and

G.B.

Stamou, Query extensions and incremental query rewriting for OWL 2 QL ontologies, J. Data Semantics 3(1) (2014), 1–23.

34.

[34]W3C OWL Working Group, OWL 2 Web Ontology Language: Document Overview, W3C Recommendation, 2009.

Sound,complete and minimal UCQ-rewriting for existential rules

Abstract

Keywords

1. Introduction

3 We generalize the classical notion of a fact in order to take existential variables into account.

3. Desirable properties of rewriting sets

Definition 2 (Sound and complete set).

Definition 3 (Covering relation).

Definition 4 (Minimal set of BCQs, cover).

4.1. Rewriting algorithm

Definition 5 (Rewriting operator).

Definition 6 (k-rewriting).

Definition 7 (k-saturation).

6 Note that a depth-first exploration would not ensure termination for fus rules.

Definition 8 (Prunable).

Definition 9 (Soundness/completeness).

Definition 10 (Piece[2]).

7 Usually, the notation ≤ is used to denote the relation “finer than”. We adopt the converse convention, which is more in line with substitutions and the ≥ preorder on CQs.

Theorem 12 ([2,30]).

Definition 13 (Most general piece-unifier).

Example 10 (Very simple rule).

Definition 15 (Compatible set of piece-unifiers).

Definition 16 (Aggregated unifier).

7.1. Computing single-piece unifiers and their aggregation

Definition 17 (Valid partition).

Definition 18 (Pre-unifier).

Definition 19 (Sticky variables).

Footnotes

Acknowledgments

Proof of Theorem 17

References

³
We generalize the classical notion of a fact in order to take existential variables into account.

⁶
Note that a depth-first exploration would not ensure termination for fus rules.

⁷
Usually, the notation ≤ is used to denote the relation “finer than”. We adopt the converse convention, which is more in line with substitutions and the ≥ preorder on CQs.