
Abstract

The increasing number of Knowledge Graphs (KGs) available today calls for powerful query languages that strike a balance between expressiveness and complexity of query evaluation, and that can be easily integrated into existing query processing infrastructures. We present Extended Property Paths (EPPs), a significant enhancement of Property Paths (PPs), the navigational core included in the SPARQL query language. We introduce the EPPs syntax, which succinctly captures a larger class of navigational queries than PPs and other navigational extensions of SPARQL, and provide formal semantics. We describe a translation from non-recursive EPPs (nEPPs) into SPARQL queries and provide novel expressiveness results about the capability of SPARQL sub-languages to express navigational queries. We prove that the language of EPPs is more expressive than that of PPs: using EPPs within SPARQL makes it possible to express queries that cannot be expressed when only using PPs. We also study the expressiveness of SPARQL with EPPs in terms of reasoning capabilities. We show that SPARQL with EPPs is expressive enough to capture the main RDFS reasoning functionalities and describe how a query can be rewritten into another query enhanced with reasoning capabilities. We complement our contributions with an implementation of EPPs as the SPARQL-independent iEPPs language and an implementation of the translation of nEPPs into SPARQL queries. What sets our approach apart from previous research on querying KGs is the possibility to evaluate both nEPPs and SPARQL with nEPPs queries under the RDFS entailment regime on existing query processors. We report on an experimental evaluation on a variety of real KGs.

Keywords

SPARQL, Property Paths, navigational languages, query-based reasoning, expressive power, translation

1. Introduction

Knowledge Graphs (KGs) are becoming crucial in many application scenarios [34]. The Google Knowledge Graph [24], Facebook Open Graph [17], DBpedia [13], Yago [51], and Wikidata [53] are just a few examples. Devising powerful KG query languages that strike a balance between expressiveness and complexity of query evaluation, while at the same time having little impact on existing query processing infrastructures, is crucial [4]. A large number of KGs are encoded in RDF [30], the W3C standard for publishing structured data on the Web [16]. To query RDF data, a standard query language, called SPARQL [26,42], has been designed. While an early version of SPARQL did not provide the explicit navigational capabilities that are crucial for querying graph-like data, the most recent version (SPARQL 1.1) incorporates Property Paths (PPs). The main goal of PPs is to allow navigational queries to be written more succinctly and to support basic transitive closure computations. However, it has been widely recognized that PPs offer very limited expressiveness [7,10,20,46]; notably, PPs lack any form of tests within a path, a feature that can be very useful when dealing with graph data. For example, a query like "find my Italian exclusive friends", that is, "my friends that are not friends of any of my friends, and are Italian", requires both path difference and tests. Surprisingly, these features are available neither in PPs nor in any previous navigational extension of SPARQL (e.g., NRE [43]). In this paper we introduce Extended Property Paths (EPPs), a comprehensive language including a set of navigational features to extend the current navigational core of SPARQL. In particular, EPPs integrate features like path conjunction, difference, and repetitions, as well as powerful types of tests. A preliminary description of the language appeared in the proceedings of the AAAI'15 conference [19].

Fig. 1.

An excerpt of a Knowledge Graph taken from DBpedia.

1.1. EPPs by example

We introduce the main features of EPPs by describing a few examples. An excerpt of a KG is given in Fig. 1. Intuitively, an EPP expression defines a binary relation on the nodes of the graph upon which it is evaluated.

Example 1 (Path difference).

Find pairs of cities located in the same country but not in the same region.

Navigational languages such as Nested Regular Expressions (NREs) and PPs cannot express such requests because they lack path difference (the result has to exclude cities in the same region). With EPPs, the request can be expressed as follows (the full syntax will be presented in Section 3.1):

(:country/^:country)~(:region/^:region)

The symbol ^ denotes backward navigation from the object to the subject of a triple. Path difference (~) discards from the set of cities in the same country (i.e., :country/^:country) those that are in the same region (i.e., :region/^:region). A SPARQL-independent evaluation pattern of the EPP expression¹ considers all the bindings of the variable representing one of the cities that are wanted and then evaluates the expression from each binding. The result is the set of bindings for the variable representing the other city. From :Rome, the evaluation of the expression reaches :Florence and :Carrara.

¹ We provide a detailed algorithm in Section 7.

Example 2 (Path conjunction).

Find pairs of cities located in the same country and in the same region.

In this case, path conjunction keeps, from the set of nodes satisfying the first subexpression, only those that also satisfy the second one. From :Florence, the evaluation of the expression reaches the cities :Florence and :Carrara.
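The intuition behind Examples 1 and 2 can be sketched in Python by treating each path as a binary relation over nodes. This is only a minimal sketch under set semantics (EPPs inside SPARQL are evaluated over multisets); the IRIs :Italy, :Lazio and :Tuscany and all helper names are assumptions made for illustration and are not read from Fig. 1.

```python
# Toy graph: the object IRIs (:Italy, :Lazio, :Tuscany) are hypothetical.
TRIPLES = {
    (":Rome", ":country", ":Italy"),
    (":Florence", ":country", ":Italy"),
    (":Carrara", ":country", ":Italy"),
    (":Rome", ":region", ":Lazio"),
    (":Florence", ":region", ":Tuscany"),
    (":Carrara", ":region", ":Tuscany"),
}


def forward(graph, pred):
    """Pairs (subject, object) connected by an edge labeled pred."""
    return {(s, o) for s, p, o in graph if p == pred}


def backward(graph, pred):
    """Pairs (object, subject): the edge is traversed in reverse."""
    return {(o, s) for s, p, o in graph if p == pred}


def concat(r1, r2):
    """Path concatenation: follow r1, then r2."""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}


same_country = concat(forward(TRIPLES, ":country"), backward(TRIPLES, ":country"))
same_region = concat(forward(TRIPLES, ":region"), backward(TRIPLES, ":region"))

difference = same_country - same_region   # Example 1: same country, not same region
conjunction = same_country & same_region  # Example 2: same country and same region


def reachable(relation, start):
    """Sorted end nodes of all pairs in relation that start at the given node."""
    return sorted(b for a, b in relation if a == start)


print(reachable(difference, ":Rome"))
print(reachable(conjunction, ":Florence"))
```

On this toy graph the difference evaluated from :Rome yields :Carrara and :Florence, and the conjunction evaluated from :Florence yields :Carrara and :Florence, in line with the two examples.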

Example 3 (Tests).

Find pairs of cities governed by the same political party founded before 2010.

TP denotes a test for the existence of a path; its parameters specify the position in the triple from which the test starts (_o denotes the object of the last traversed triple) and the path to check. Here the path is composed of the logical and (&&) of two tests. The first checks the existence of an edge :formationYear and the second, which starts from the object of the last traversed triple (i.e., the triple with predicate :formationYear), checks that the value is less than 2010. PPs cannot express the query of Example 3 since they cannot check for path existence (i.e., nesting). NREs, which have this type of construct, cannot check for specific conditions along the path; in particular, in this example we want only parties that were founded before 2010. Starting from :Rome, the first logical and (&&) of two tests is performed; one checks for the existence of an edge :leaderParty, which leads to :Democratic_Party, while the other starts from the object of the previous navigational step, that is, the object of (:Rome, :leaderParty, :Democratic_Party). From :Democratic_Party, another logical and (&&) of two tests is evaluated. The first one checks the existence of an edge :formationYear and reaches the node 2007; the second, which starts from the object of the previous step (i.e., 2007), checks that the value is <2010; in this case the test succeeds and the evaluation continues from :Democratic_Party by navigating the edge :leaderParty backward and reaching the nodes :Florence and :Rome, which are included in the results.

Composing all the previous features together, we can express a more complex query.

Example 4 (Path Conjunction, Difference and Tests).

Find pairs of cities located in the same country but not in the same region. Such cities must be governed by the same political party, which has been founded before 2010.

From :Rome, the evaluation of the first subexpression, including :country and :region, reaches the nodes :Florence and :Carrara. The evaluation of the second part of the path conjunction reaches the nodes :Rome and :Florence. Overall, from :Rome we reach :Florence.

Example 4 cannot be expressed by using NRE-based languages or PPs. These languages lack both path difference (we want cities in the same country but not in the same region) and conjunction (additionally, they must be governed by the same political party). We have discussed in the previous examples how a SPARQL-independent algorithm can evaluate EPP expressions. However, since our primary goal is to allow powerful navigation queries on existing KGs query processing infrastructures, we devised a translation of non-recursive EPPs into SPARQL. Our approach follows the same reasoning as the translation of non-recursive PPs into SPARQL used by the current SPARQL standard [44]. The advantage of using EPPs to write non-recursive navigational queries instead of writing them directly into SPARQL is that the same request can be expressed more succinctly and without the need to deal with intermediate variables.

Example 5.

The SPARQL query corresponding to the translation of the EPP expression in Example 4 is shown in Fig. 2, where ?v1, ?v2, ?v3 and ?v4 are variables automatically generated by the translation algorithm.

Fig. 2.

SPARQL translation of the EPP expression in Example 4.

Example 6 (Arbitrary Path Length with Tests).

Find cities reachable from :Carrara via a path of arbitrary length composed of edges labeled :twinned, considering only those cities reachable through a chain of intermediate cities having :population greater than 10000. The EPP expression capturing this request is:

The expression involves arbitrary length paths plus tests. The evaluation checks from the node :Carrara the existence of paths of arbitrary length (denoted by ^∗) where each node reached in the path must satisfy the test. Starting from :Carrara, with a path :twinned of length one, :Grasse is reached. From this node the test is evaluated to check the existence of a triple (:Grasse, :population, n) with n > 10000. Since :Grasse passes the test, the path :twinned is evaluated again starting from it, reaching :Murcia, which also passes the test, and :Migliarino, which does not. The evaluation continues from :Murcia and stops when reaching the node :Miami, which passes the test. Overall, we reach :Carrara, :Grasse, :Murcia, and :Miami.
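The walk-through of Example 6 can be reproduced with a small breadth-first traversal. The sketch below is not the EALP function of the paper; it is a hypothetical illustration that visits each node at most once and only continues through nodes that pass the test. The :twinned edges follow the description above; the populations of :Murcia and :Miami are the values quoted in Example 15, while those of :Grasse and :Migliarino are assumed values chosen only so that the former passes the test and the latter fails it.

```python
from collections import deque

# :twinned edges as described in Example 6.
TWINNED = {
    ":Carrara": [":Grasse"],
    ":Grasse": [":Murcia", ":Migliarino"],
    ":Murcia": [":Miami"],
}

POPULATION = {
    ":Grasse": 50000,      # hypothetical value, assumed to be above 10000
    ":Migliarino": 3000,   # hypothetical value, assumed to be below 10000
    ":Murcia": 436870,
    ":Miami": 419777,
}


def closure_with_test(start, edges, test):
    """Nodes reachable from start via zero or more edges; every newly
    reached node must satisfy test, and each node is reported once."""
    reached = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in edges.get(node, []):
            if target not in seen and test(target):
                seen.add(target)
                reached.append(target)
                queue.append(target)
    return reached


def population_over_10000(node):
    """Test: the node has a :population value greater than 10000."""
    return POPULATION.get(node, 0) > 10000


cities = closure_with_test(":Carrara", TWINNED, population_over_10000)
print(cities)
```

The start node is included because a path of length zero is allowed by the closure, which is why :Carrara appears among the results.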

The EPP expression in Example 6 cannot be translated into a basic SPARQL query because it makes use of the closure operator ^∗ (which requires evaluating the expression an a-priori unknown number of times). To give semantics to this kind of EPP expression we introduce the evaluation function $EALP$ (Fig. 7), which extends the function ALP defined for PPs in the SPARQL standard [26]. EPPs also support path repetitions (handled via $EALP$ ), which are a concise way of expressing the union of concatenations of an expression repeated between a minimum and a maximum number of times.
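Reading a repetition as "the union of concatenations between a minimum and a maximum number of times" can be made concrete with binary relations. The following is a minimal sketch under set semantics; the helper names compose, power and repeat are hypothetical, tests are left out for brevity, and the sketch assumes a minimum of at least 1 so that no identity relation is needed.

```python
def compose(r1, r2):
    """Concatenation of two binary relations."""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}


def power(relation, i):
    """The relation concatenated with itself i times (i >= 1)."""
    result = relation
    for _ in range(i - 1):
        result = compose(result, relation)
    return result


def repeat(relation, low, high):
    """Union of relation^i for low <= i <= high (assumes 1 <= low <= high)."""
    if not 1 <= low <= high:
        raise ValueError("this sketch only supports 1 <= low <= high")
    out = set()
    for i in range(low, high + 1):
        out |= power(relation, i)
    return out


# :twinned edges as described in Example 6.
TWINNED = {
    (":Carrara", ":Grasse"),
    (":Grasse", ":Murcia"),
    (":Grasse", ":Migliarino"),
    (":Murcia", ":Miami"),
}

one_or_two = repeat(TWINNED, 1, 2)
from_carrara = sorted(b for a, b in one_or_two if a == ":Carrara")
print(from_carrara)
```

Without the population test, one or two :twinned steps from :Carrara reach :Grasse, :Migliarino and :Murcia, while :Miami would require a third step.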

Example 7 (Path Repetitions).

If restricting the number of repetitions between 1 and 2, the expression in Example 6 can be written as follows:

So far we have presented examples of isolated EPPs expressions. We now consider their usage in SPARQL.

Example 8 (EPPs within SPARQL).

Find pairs of cities (A,B) and their populations such that: (i) A and B are in the same country, but not in the same region; (ii) there exists some transportation from A to B.

Fig. 3.

EPPs used inside SPARQL as for Example 8.

The query in Fig. 3 retrieves the population of the pairs of cities satisfying the EPP expression by introducing two additional patterns, where the variables ?popA and ?popB are bound to population information. When the query is evaluated on the graph reported in Fig. 1 it produces no results; for instance, the pair (:Rome, :Florence) is connected by an :airbus, which is a kind of :plane, which in turn is a means of :transportation, but there is no edge whose label is :transportation.

The previous example does not take the KG RDFS schema into account. When considering transportation services without specifying the exact type of service, one would be able to actually discover the connection between :Rome and :Florence. This can be achieved by performing sub-property inference according to the RDFS entailment regime. One crucial aspect of EPPs is that they can capture the main RDFS inference types by encoding each inference rule in a prototypical EPP expression (see Section 5.2), with the advantage that the resulting expressions can be translated into SPARQL and evaluated on existing processors (via ALP).

Example 9 (EPPs and Reasoning).

The EPPs in Example 8 can be automatically rewritten into an EPP supporting RDFS reasoning as follows:

SELECT ?cityA ?cityB ?popA ?popB
WHERE {
  ?cityA :population ?popA.
  ?cityB :population ?popB.
  { /* BEGIN EPPs pattern */
    ?cityA ((:country/^:country)~(:region/^:region))&
           (TP(_p,(rdfs:sp*/rdfs:sp) &&T(_o=:transportation)) ||T(_p=:transportation))))) ?cityB.
  } /* END EPPs pattern */
}

The translation of this query to SPARQL is reported in Fig. 4. When this query is evaluated on the graph in Fig. 1 it produces (?cityA→:Rome, ?cityB→:Florence, ?popA→2874034, ?popB→380226).

Fig. 4.

SPARQL translation of Example 9.

1.2. Contributions and organization

The contributions of the paper are both theoretical and practical.

We introduce two languages EPPs and iEPPs to query KGs. They have the same syntax but different semantics; one based on multisets (Section 3.2) and complying with SPARQL, and the other based on sets (Section 7.1).

We perform a study on the feasibility of adding EPPs to SPARQL. To this end, we provide a translation from non-recursive EPPs into SPARQL queries (Section 4). The benefit of our translation is twofold: on the one hand, it makes it possible to evaluate nEPPs (a larger class of queries than non-recursive PPs) using existing SPARQL processors; on the other hand, it paves the way toward readily incorporating EPPs in the current SPARQL standard.

Building upon our translation, we also show how a SPARQL query can be rewritten into another SPARQL query that incorporates reasoning capabilities and can be evaluated on existing SPARQL processors (Section 5).

We perform a study on the impact of adding EPP features to SPARQL engines. To this end, we implement the nEPPs to SPARQL translation as an extension of the Jena library and an iEPPs query processor. Both are available online.²

²
https://extendedpps.wordpress.com

We perform an extensive experimental evaluation on a variety of real data sets (Section 8).

From a theoretical point of view:

We introduce iEPPs as a SPARQL-independent language and discuss its complexity (Section 7.2).

We report novel expressiveness results about the capability of SPARQL in expressing navigational queries. We show that SPARQL is expressive enough to capture nEPPs (Section 4.2).

We prove that the language of EPPs is more expressive than that of PPs and, as a by-product, that the fragment of SPARQL including EPPs, AND and UNION is more expressive than the fragment of SPARQL including PPs, AND and UNION (Section 6.1).

We provide a novel study about the expressiveness of SPARQL in terms of the main reasoning capabilities of RDFS (defined as ρdf [41]) when considering different navigational cores (Section 6.2). We show that SPARQL is expressive enough to capture ρdf.

The remainder of the paper is organized as follows. We provide some background definitions in Section 2. Section 3 presents the EPPs syntax and semantics. Section 4 formalizes the translation of non-recursive EPPs into SPARQL queries. Section 5 shows how EPPs support reasoning. The expressiveness of EPPs is analyzed in Section 6. The iEPPs language is described in Section 7. The implementation and the evaluation of EPPs and iEPPs are discussed in Section 8. Section 9 discusses related literature. We conclude in Section 10.

2. Preliminaries

In this section we provide some background about RDF, SPARQL and SPARQL property paths. An RDF triple³ is a tuple of the form $⟨ s, p, o ⟩ \in I \times I \times (I \cup L)$ , where $I$ and $L$ are countably infinite sets of IRIs and literals respectively. An RDF graph G is a set of triples. The set of terms of an RDF graph (i.e., the set of IRIs and literals appearing in the graph) is denoted by $terms (G)$ while $nodes (G)$ denotes the set of terms used as a subject or object of a triple. In what follows we will focus on the fragment of SPARQL including the SELECT query form and provide a formalization of its semantics along the lines of Angles and Gutierrez [6] that is faithful to the semantics of the W3C standard.

³ To simplify the discussion we do not consider blank nodes in this section; we will address this issue later in Section 2.4.

2.1. Background on SPARQL

Let $V$ be a countably infinite set of variables, such that $V \cap (I \cup L) = \emptyset$ . A (solution) mapping μ is a partial function μ: $V \to I \cup L$ . The empty mapping, denoted $μ_{0}$ , is the mapping satisfying $dom (μ_{0}) = \emptyset$ . Two mappings, say $μ_{1}$ and $μ_{2}$ , are compatible (resp., not compatible), denoted by $μ_{1} \sim μ_{2}$ (resp., $μ_{1} ≁ μ_{2}$ ), if $μ_{1} (? X) = μ_{2} (? X)$ for all variables $? X \in (dom (μ_{1}) \cap dom (μ_{2}))$ (resp., if $μ_{1} (? X) \neq μ_{2} (? X)$ for some $? X \in (dom (μ_{1}) \cap dom (μ_{2}))$ ). If $μ_{1} \sim μ_{2}$ then we write $μ_{1} \cup μ_{2}$ for the mapping obtained by extending $μ_{1}$ according to $μ_{2}$ on all variables in $dom (μ_{2}) ∖ dom (μ_{1})$ . Note that two mappings with disjoint domains are always compatible, and that the empty mapping $μ_{0}$ is compatible with any other mapping. Given a finite set of variables $W \subset V$ , the restriction of a mapping μ to W, denoted $μ_{| W}$ , is a mapping $μ^{'}$ satisfying $dom (μ^{'}) = dom (μ) \cap W$ and $μ^{'} (? X) = μ (? X)$ for every $? X \in dom (μ) \cap W$ .
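The notions of compatibility, union and restriction of mappings translate directly into operations on dictionaries. The sketch below is a hypothetical illustration (the function names compatible, merge and restrict are not taken from the paper), with a mapping represented as a Python dict from variable names to terms.

```python
def compatible(mu1, mu2):
    """True if mu1 and mu2 agree on every variable they share."""
    shared = mu1.keys() & mu2.keys()
    return all(mu1[var] == mu2[var] for var in shared)


def merge(mu1, mu2):
    """The mapping mu1 ∪ mu2, defined only for compatible mappings."""
    if not compatible(mu1, mu2):
        raise ValueError("mappings are not compatible")
    return {**mu1, **mu2}


def restrict(mu, variables):
    """The restriction of mu to the given set of variables W."""
    return {var: term for var, term in mu.items() if var in variables}


EMPTY = {}  # the empty mapping, compatible with every other mapping

print(compatible(EMPTY, {"?X": ":a"}))
print(merge({"?X": ":a"}, {"?X": ":a", "?Y": ":b"}))
print(restrict({"?X": ":a", "?Y": ":b"}, {"?X"}))
```

Two mappings with disjoint domains share no variable, so the check in compatible holds vacuously, which mirrors the remark above.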

A selection formula is defined recursively as follows: (i) If $? X, ? Y \in V$ and $c \in I \cup L$ then $(? X = c)$ , $(? X = ? Y)$ and $bound (? X)$ are atomic selection formulas; (ii) If F and $F^{'}$ are selection formulas then $(F \land F^{'})$ , $(F \lor F^{'})$ and $\neg (F)$ are boolean selection formulas. The evaluation of a selection formula F under μ, denoted $μ (F)$ , is defined in a three-valued logic (i.e. with values $true$ , $false$ , and $error$ ) as follows:

If F is $? X = c$ and $? X \in dom (μ)$ , then $μ (F) = true$ when $μ (? X) = c$ and $μ (F) = false$ otherwise. If $? X \notin dom (μ)$ then $μ (F) = error$ .

If F is $? X = ? Y$ and $? X, ? Y \in dom (μ)$ , then $μ (F) = true$ when $μ (? X) = μ (? Y)$ and $μ (F) = false$ otherwise. If either $? X \notin dom (μ)$ or $? Y \notin dom (μ)$ then $μ (F) = error$ .

If F is $bound (? X)$ and $? X \in dom (μ)$ then $μ (F) = true$ else $μ (F) = false$ .

If F is a complex selection formula then it is evaluated following the three-valued logic presented in Table 1.

Table 1
Three-valued logic for evaluating selection formulas

p	q	$p \land q$	$p \lor q$
$true$	$true$	$true$	$true$
$true$	$false$	$false$	$true$
$true$	$error$	$error$	$true$
$false$	$true$	$false$	$true$
$false$	$false$	$false$	$false$
$false$	$error$	$false$	$error$
$error$	$true$	$error$	$true$
$error$	$false$	$false$	$error$
$error$	$error$	$error$	$error$

p	$\neg p$
$true$	$false$
$false$	$true$
$error$	$error$
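The three-valued logic of Table 1 can be encoded compactly: a conjunction is false as soon as one operand is false, a disjunction is true as soon as one operand is true, and otherwise an error propagates. The sketch below is an illustration only; the names ERROR, and3, or3, not3, eval_eq_const and eval_bound are hypothetical, and mappings are represented as Python dicts.

```python
ERROR = "error"  # third truth value, next to Python's True and False


def and3(p, q):
    """Three-valued conjunction following Table 1."""
    if p is False or q is False:
        return False
    if p == ERROR or q == ERROR:
        return ERROR
    return True


def or3(p, q):
    """Three-valued disjunction following Table 1."""
    if p is True or q is True:
        return True
    if p == ERROR or q == ERROR:
        return ERROR
    return False


def not3(p):
    """Three-valued negation: error stays error."""
    return ERROR if p == ERROR else (not p)


def eval_eq_const(mu, var, const):
    """Evaluate (?X = c) under mu; error when ?X is not in dom(mu)."""
    if var not in mu:
        return ERROR
    return mu[var] == const


def eval_bound(mu, var):
    """Evaluate bound(?X) under mu; never yields error."""
    return var in mu


mu = {"?Y": ":b"}
print(or3(eval_eq_const(mu, "?X", ":a"), eval_bound(mu, "?Y")))
```

In the usage line the equality test on the unbound variable ?X evaluates to error, yet the disjunction is still true because bound(?Y) holds.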

We use the symbol Ω to denote a multiset of mappings and $card (μ, Ω)$ to denote the cardinality of the mapping μ in the multiset Ω; in particular, $card (μ, Ω) = 0$ when $μ \notin Ω$ . We use $Ω_{0}$ to denote the multiset containing only the mapping $μ_{0}$ , that is $card (μ_{0}, Ω_{0}) > 0$ ( $Ω_{0}$ is called the join identity). The domain of a multiset of mappings Ω is defined as $dom (Ω) = ⋃_{μ \in Ω} dom (μ)$ . The SPARQL algebra for multisets of mappings is composed of the operations of projection, selection, join, difference, left-join, union and minus. Let $Ω_{1}$ , $Ω_{2}$ be multisets of mappings, W be a set of variables and F be a selection formula.

Definition 10 (Operations over multisets of mappings).

Let $Ω_{1}$ and $Ω_{2}$ be multisets of mappings. Then:

Projection:

$π_{W} (Ω_{1}) = \{μ^{'} ∣ μ \in Ω_{1}, μ^{'} = μ_{| W}\}$ where $card (μ^{'}, π_{W} (Ω_{1})) = \sum_{μ \in Ω_{1} \text{ s.t. } μ^{'} = μ_{| W}} card (μ, Ω_{1})$ .

Selection:

$σ_{F} (Ω_{1}) = \{μ \in Ω_{1} ∣ μ (F) = true\}$ where $card (μ, σ_{F} (Ω_{1})) = card (μ, Ω_{1})$ .

Union:

$Ω_{1} \cup Ω_{2} = \{μ ∣ μ \in Ω_{1} \lor μ \in Ω_{2}\}$ where $card (μ, Ω_{1} \cup Ω_{2}) = card (μ, Ω_{1}) + card (μ, Ω_{2})$ .

Join:

$Ω_{1} ⋈ Ω_{2} = \{μ = (μ_{1} \cup μ_{2}) ∣ μ_{1} \in Ω_{1}, μ_{2} \in Ω_{2}, μ_{1} \sim μ_{2}\}$ where $card (μ, Ω_{1} ⋈ Ω_{2}) = \sum_{μ_{1} \in Ω_{1} \text{ and } μ_{2} \in Ω_{2} \text{ s.t. } μ = (μ_{1} \cup μ_{2})} card (μ_{1}, Ω_{1}) \times card (μ_{2}, Ω_{2})$ .

Difference:

$Ω_{1} ∖_{F} Ω_{2} = \{μ_{1} \in Ω_{1} ∣ \forall μ_{2} \in Ω_{2}, (μ_{1} ≁ μ_{2}) \lor (μ_{1} \sim μ_{2} \land (μ_{1} \cup μ_{2}) (F) = false)\}$ where $card (μ_{1}, Ω_{1} ∖_{F} Ω_{2}) = card (μ_{1}, Ω_{1})$ .

Minus:

$Ω_{1} - Ω_{2} = \{μ_{1} \in Ω_{1} ∣ \forall μ_{2} \in Ω_{2}, μ_{1} ≁ μ_{2} \lor dom (μ_{1}) \cap dom (μ_{2}) = \emptyset\}$ where $card (μ_{1}, Ω_{1} - Ω_{2}) = card (μ_{1}, Ω_{1})$ .

Left Join:

$Ω_{1} ⟕_{F} Ω_{2} = σ_{F} (Ω_{1} ⋈ Ω_{2}) \cup (Ω_{1} ∖_{F} Ω_{2})$ where $card (μ, Ω_{1} ⟕_{F} Ω_{2}) = card (μ, σ_{F} (Ω_{1} ⋈ Ω_{2})) + card (μ, Ω_{1} ∖_{F} Ω_{2})$ .
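A few of the multiset operations of Definition 10 can be sketched with Python's collections.Counter, which stores each mapping together with its cardinality. This is a hypothetical illustration and not the authors' implementation: mappings are frozen into hashable sets of (variable, term) pairs through the assumed helper freeze, and only union, join, projection and minus are covered.

```python
from collections import Counter


def freeze(mapping):
    """Hashable form of a mapping: a frozenset of (variable, term) pairs."""
    return frozenset(mapping.items())


def compatible(m1, m2):
    """True if the two frozen mappings agree on their shared variables."""
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())


def union(omega1, omega2):
    """Cardinalities are summed."""
    return omega1 + omega2


def join(omega1, omega2):
    """Cardinalities of compatible pairs are multiplied and accumulated."""
    out = Counter()
    for m1, c1 in omega1.items():
        for m2, c2 in omega2.items():
            if compatible(m1, m2):
                out[m1 | m2] += c1 * c2
    return out


def project(omega, variables):
    """Restrict every mapping to the given variables, summing cardinalities."""
    out = Counter()
    for m, c in omega.items():
        out[frozenset((v, t) for v, t in m if v in variables)] += c
    return out


def minus(omega1, omega2):
    """Keep mu1 if every mu2 is incompatible with it or shares no variable."""
    out = Counter()
    for m1, c1 in omega1.items():
        if all(
            not compatible(m1, m2) or not (dict(m1).keys() & dict(m2).keys())
            for m2 in omega2
        ):
            out[m1] = c1
    return out


omega1 = Counter({freeze({"?x": ":a"}): 2})
omega2 = Counter({freeze({"?x": ":a", "?y": ":b"}): 3, freeze({"?x": ":c"}): 1})
print(join(omega1, omega2))
```

In the usage lines, the mapping binding ?x to :a (cardinality 2) joins with the mapping binding ?x to :a and ?y to :b (cardinality 3), so the joined mapping has cardinality 6.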

2.2. SPARQL patterns

We now introduce SPARQL graph patterns. A graph pattern is defined recursively as follows:

A tuple from $(I \cup L \cup V) \times (I \cup V) \times (I \cup L \cup V)$ is a graph pattern called a triple pattern.4

⁴
We assume that any triple pattern contains at least one variable.

If $P_{1}$ and $P_{2}$ are patterns then $(P_{1} AND P_{2})$ , $(P_{1} UNION P_{2})$ , $(P_{1} OPTIONAL P_{2})$ , $(P_{1} MINUS P_{2})$ and $(P_{1} NOT-EXISTS P_{2})$ are graph patterns.

If $P_{1}$ is a pattern and C is a filter constraint (as defined below) then $(P_{1} FILTER C)$ is a pattern.

A filter constraint is defined recursively as follows: (i) If $? X, ? Y \in V$ and $c \in I \cup L$ then $(? X = c)$ , $(? X = ? Y)$ and $bound (? X)$ are atomic filter constraints; (ii) If $C_{1}$ and $C_{2}$ are filter constraints then $(! C_{1})$ , $(C_{1} \mathbin{||} C_{2})$ and $(C_{1} \mathbin{\&\&} C_{2})$ are complex filter constraints. Given a filter constraint C, we denote by $f (C)$ the selection formula obtained from C. Note that there exists a simple and direct translation from filter constraints to selection formulas and vice-versa.

Given a triple pattern t and a mapping μ such that $var (t) \subseteq dom (μ)$ , we denote by $μ (t)$ the triple obtained by replacing the variables in t according to μ. Overloading the above definition, we denote by $μ (P)$ the graph pattern obtained by the recursive substitution of variables in every triple pattern and filter constraint occurring in the graph pattern P according to μ.

2.3. Semantics of SPARQL graph patterns

The evaluation of a SPARQL graph pattern $P$ over an RDF graph G is defined as a function ${⟦ P ⟧}_{G}$ which returns a multiset of solution mappings. Let $P_{1}$ , $P_{2}$ , $P_{3}$ be graph patterns and C be a filter constraint. The evaluation of a graph pattern $P$ over a graph G is defined recursively as follows:

If $P$ is a triple pattern $t_{p}$ then ${⟦ t_{p} ⟧}_{G} = \{μ ∣ dom (μ) = var (t_{p}) \text{ and } μ (t_{p}) \in G\}$ where $var (t_{p})$ is the set of variables in $t_{p}$ and the cardinality of each mapping is 1.

If $P = (P_{1} AND P_{2})$ , then ${⟦ P ⟧}_{G} = {⟦ P_{1} ⟧}_{G} ⋈ {⟦ P_{2} ⟧}_{G}$

If $P = (P_{1} UNION P_{2})$ , then ${⟦ P ⟧}_{G} = {⟦ P_{1} ⟧}_{G} \cup {⟦ P_{2} ⟧}_{G}$

If $P = (P_{1} OPTIONAL P_{2})$ , then: if $P_{2}$ is $(P_{3} FILTER C)$ then ${⟦ P ⟧}_{G} = {⟦ P_{1} ⟧}_{G} ⟕_{f (C)} {⟦ P_{3} ⟧}_{G}$ , else ${⟦ P ⟧}_{G} = {⟦ P_{1} ⟧}_{G} ⟕_{true} {⟦ P_{2} ⟧}_{G}$ , where ⟕ denotes the left join of Definition 10.

If $P = (P_{1} MINUS P_{2})$ , then ${⟦ P ⟧}_{G} = {⟦ P_{1} ⟧}_{G} - {⟦ P_{2} ⟧}_{G}$

If $P = (P_{1} NOT‐EXISTS P_{2})$ , then ${⟦ (P_{1} NOT‐EXISTS P_{2}) ⟧}_{G} = \{μ ∣ μ \in {⟦ P_{1} ⟧}_{G} \land {⟦ μ (P_{2}) ⟧}_{G} = \emptyset\}$

If $P = (P_{1} FILTER C)$ , then ${⟦ P_{1} FILTER C ⟧}_{G} = σ_{f (C)} ({⟦ P_{1} ⟧}_{G})$
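The base case (triple patterns) and the AND case of this recursive definition can be sketched as follows. The code is a hypothetical illustration, not a SPARQL engine: variables are strings starting with "?", solutions are kept in a list so that duplicates are preserved, and the three :leaderParty triples are those mentioned in Example 13.

```python
GRAPH = {
    (":Rome", ":leaderParty", ":Democratic_Party"),
    (":Florence", ":leaderParty", ":Democratic_Party"),
    (":Carrara", ":leaderParty", ":Socialist_Party"),
}


def is_var(term):
    """Variables are represented as strings starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def eval_triple_pattern(pattern, graph):
    """One mapping per triple of the graph that matches the pattern."""
    solutions = []
    for triple in graph:
        mu = {}
        for p_term, g_term in zip(pattern, triple):
            if is_var(p_term):
                if mu.setdefault(p_term, g_term) != g_term:
                    break  # a repeated variable would need two different values
            elif p_term != g_term:
                break      # a constant of the pattern does not match
        else:
            solutions.append(mu)
    return solutions


def eval_and(solutions1, solutions2):
    """Join: merge every pair of compatible mappings."""
    out = []
    for m1 in solutions1:
        for m2 in solutions2:
            if all(m1[v] == m2[v] for v in m1.keys() & m2.keys()):
                out.append({**m1, **m2})
    return out


left = eval_triple_pattern(("?a", ":leaderParty", "?p"), GRAPH)
right = eval_triple_pattern(("?b", ":leaderParty", "?p"), GRAPH)
pairs = sorted((m["?a"], m["?b"]) for m in eval_and(left, right))
print(pairs)
```

The join on the shared variable ?p pairs cities governed by the same party, which is the kind of SPARQL pattern that a non-recursive path such as a forward step followed by a backward step unfolds into.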

2.4. SPARQL property paths

Property paths (PPs) have been incorporated into the SPARQL standard with two main motivations; first, to provide explicit graph navigational capabilities (thus allowing the writing of SPARQL navigational queries in a more succinct way); second, to introduce the transitive closure operator ^∗ previously not available in SPARQL. The design of PPs was influenced by earlier proposals (e.g., PSPARQL [3], nSPARQL [42]).

Fig. 5.

Standard query semantics of SPARQL property paths, where $α, β \in (I \cup L \cup V)$ ; $u, u_{1}, \dots, u_{n} \in I$ ; $x_{L}, x_{R} \in (I \cup L)$ ; $? v_{L}, ? v_{R} \in V$ ; $? v \in V$ is a fresh variable.

Fig. 6.

Auxiliary functions used for defining the semantics of PP expressions of the form ${elt}^{*}$ .

Definition 11 (Property Path Pattern).

A property path pattern (or PP pattern for short) is a tuple $⟨ α, elt, β ⟩$ with $α \in (I \cup L \cup V)$ , $β \in (I \cup L \cup V)$ , where $elt$ is a property path expression (PP expression) that is defined by the following grammar (where $u, u_{1}, \dots, u_{n} \in I$ ):

Table 2
Syntax of EPPs

The SPARQL standard introduces additional types of PP expressions [44]; since these are merely syntactic sugar (they are defined in terms of expressions covered by the grammar given above), we ignore them in this paper. As another slight deviation from the standard, we do not permit blank nodes in PP patterns. PP patterns with blank nodes can be simulated using fresh variables. The SPARQL standard distinguishes between two types of property path expressions: connectivity patterns (or recursive PPs), which include closure (^∗), and syntactic short forms or non-recursive PPs (nPPs), which do not include it. As for the evaluation of PPs, the W3C specification informally mentions the fact that nPPs can be evaluated via a translation into equivalent SPARQL basic expressions (see [26], Section 9.3). Property path patterns can be combined with graph patterns inside SPARQL patterns (using PP expressions in the middle position of a pattern).

2.5. Property path semantics

The semantics of Property Paths (PPs) is shown in Fig. 5. The semantics uses the evaluation function ${⟦ \cdot ⟧}_{G}$ , which takes as input a PP pattern and a graph and returns a multiset of solution mappings. In Fig. 5 we do not report all the combinations of types of patterns as they can be derived in a similar way. For connectivity patterns the SPARQL standard introduces an auxiliary function called ALP, which stands for Arbitrary Length Paths (see Fig. 6); in this case the evaluation does not admit duplicates (thus solving a problem in an early version of the semantics that was based on counting [7,38]).

3. Extended property paths

We now introduce our navigational extension of SPARQL called Extended Property Paths (EPPs). We present the syntax in Section 3.1 and the SPARQL-based formal semantics in Section 3.2.

3.1. Extended property paths syntax

EPPs extend PPs and NRE-like languages with path conjunction/difference, repetitions and more types of tests. The importance of the new features considered by EPPs is witnessed by the fact that some of them (e.g., conjunction) are present in standards like XPath 2.0 [11]. Nevertheless, to the best of our knowledge no previous navigational extension of SPARQL has considered these features. As our goal is to extend the current SPARQL standard we refer the reader to Section 7 for a treatment of EPPs as a language independent from SPARQL.

Definition 12 (Extended property path pattern).

An extended property path pattern (or EPP pattern for short) is a tuple with $α \in (I \cup L \cup V)$ , $β \in (I \cup L \cup V)$ , and an extended property path expression (EPP expression) that is defined by the grammar reported in Table 2.

EPPs introduce the following features: path conjunction (&), path difference (~), and path repetitions between l and h times (with distinct notations for set and bag semantics). EPPs allow different types of tests within a path by specifying the starting/ending positions of a test; it is possible to test from each of the subject, predicate and object positions in triples, each of which is mapped in the EPPs syntax to a position symbol (e.g., _p for the predicate and _o for the object). Positions do not need to be always specified; by default a test starts from the subject and ends on the object of the triple being evaluated. A test can be a simple check for the existence of an IRI in forward/reverse direction. EPPs can also express negated property sets, with the difference that the set of negated IRIs uses a separator symbol different from the one used by PPs. A test can also be a nested EPP (TP), which corresponds to the evaluation of the nested expression starting from a position (of the last triple evaluated) and returns true if, and only if, there exists at least one node that can be reached via the nested expression. In a test of type T, EExp (not reported here for sake of space) extends the production [110] in the SPARQL grammar⁵ where BuiltInCall⁶ is substituted with a new production called Extended-BuiltInCall, which makes the tests available in SPARQL as built-in conditions usable in EPPs, also augmented with positions. Built-in conditions are constructed using elements of the set $I \cup L$ and constants, logical connectives (¬, ∧, ∨), (in)equality symbol(s) (=, <, >, ⩽, ⩾), unary (e.g., isURI), and binary (e.g., STRSTARTS) functions. Tests can also be combined by using the logical operators and (&&), or (||) and not. We refer to non-recursive EPPs (nEPPs) as those expressions that do not include closure operators (i.e., ^∗ and ⁺) and set-semantics repetitions ( $\{l, h\}$ ). The reader can refer to the Website of the EPPs project⁷ for further details about the implementation.

⁵ http://www.w3.org/TR/sparql11-query/#rExpression

⁶ http://www.w3.org/TR/sparql11-query/#rBuiltInCall

⁷ http://extendedpps.wordpress.com

3.1.1. Positions and tests

To clarify the intuition behind tests and positions, we introduce the function , which projects the element in position of a triple t. If we have $t = ⟨ u_{1}, p_{1}, u_{2} ⟩$ , the test is translated to that checks $p_{1} = p_{1}$ , and, in this case, returns true; however, it returns false for . Figure 8 shows the expression from Example 4 including default positions and positions to traverse backward edges. Note that the subexpression means that the edge :leaderParty is traversed from the object to the subject and, thus, backward.
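The projection of a triple on a position can be illustrated with a few lines of Python. This is a hypothetical sketch: the function names project and position_equals are invented for illustration, the position symbols _p and _o are those appearing in Example 9, and _s is assumed by analogy for the subject.

```python
# Position symbols mapped to indexes of a (subject, predicate, object) tuple.
POSITIONS = {"_s": 0, "_p": 1, "_o": 2}


def project(position, triple):
    """Return the element of the triple found at the given position."""
    return triple[POSITIONS[position]]


def position_equals(position, value, triple):
    """Test that the element at the given position equals value."""
    return project(position, triple) == value


t = ("u1", "p1", "u2")
print(position_equals("_p", "p1", t))  # the predicate of t is p1
print(position_equals("_o", "p1", t))  # the object of t is u2, not p1
```

For the triple t used in the text, testing the predicate position against p1 succeeds, while the same test on another position fails.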

Table 3
EPPs SPARQL-based semantics. The function $E_{T}$ handles tests. projects the element in position of a triple $t \in G$ . Moreover, $u \in I$ ; $? v_{L}, ? v_{R} \in V$ and $? v_{n} \in V$ is a fresh variable. evaluate is a function that checks if the triple t satisfies EExp

3.2. Extended property paths semantics

We now introduce the semantics of EPPs in terms of SPARQL. We use the same evaluation function as for PPs, where an EPP expression now appears in place of a PP expression. This semantics lays the foundations for the translation algorithm (see Section 4) that, given a (concise) nEPP expression, produces a semantically equivalent (more verbose) SPARQL query. In the semantics shown in Table 3 we only report the case $α, β \in V$ (and use the symbols $? v_{L}$ and $? v_{R}$ to denote the left and right variable in the pattern); the other cases (e.g., $α \in I$ , $β \in V$ ) are similar. We denote with t a triple $⟨ s, p, o ⟩ \in G$ ; $t . x$ with $x \in \{s, p, o\}$ is used to access an element of the triple. Finally, we use a shorthand for the concatenation (i.e., via the operator '/') of an expression i times. A peculiar construct of EPPs is the test, which is handled at a high level by rule R10. In particular, tests make use of the semantic function $E_{T}$ , which handles the different kinds of tests via rules R11–R16; the positions attached to a test (i.e., subject, predicate or object) identify the elements of a triple that have to be projected. We now provide some examples of R11–R13 by using the graph in Fig. 1.

Example 13.
Consider the following EPP expression: . This type of test is handled via rule R11 in Table 3 and considers all triples $t \in G$ where :leaderParty appears in the predicate position. In the set of mappings obtained by applying rule R11 on such triples, the left variable (i.e., $? v_{L}$ ) is bound to the object (since ) while the right variable (i.e., $? v_{R}$ ) is bound to the subject (since ). In particular, the set of mappings is: ${(? v_{L} \to :Democratic_Party, ? v_{R} \to :Rome), (? v_{L} \to :Democratic_Party, ? v_{R} \to :Florence), (? v_{L} \to :Socialist_Party, ? v_{R} \to :Carrara)}$ .
Example 14.
Consider the following EPP expression: , which is handled via rule R12 in Table 3. In this case, the triples $t \in G$ considered are those such that from their object, the EPP :leaderParty has a solution (). In more detail, these triples have one among :Rome, :Florence or :Carrara in the object position (in particular, the two triples $⟨ :Rome, :airbus, :Florence ⟩$ and $⟨ :Florence, :italo, :Carrara ⟩$ ). To obtain the set of mappings from these triples, the left variable in rule R12 (i.e., $? v_{L}$ ) will be bound to their subject (since ) and the right variable (i.e., $? v_{R}$ ) to their object (since ). Overall, the set of mappings is: ${(? v_{L} \to :Rome, ? v_{R} \to :Florence), (? v_{L} \to :Florence, ? v_{R} \to :Carrara)}$ .

Fig. 7.
Auxiliary functions used to define the semantics of EPP expressions.
Example 15.
Consider the following EPP expression: handled via rule R13 in Table 3. The set of triples $t \in G$ that are of interest in this case are those in which the object has a value greater than 400000 ( $Evaluate (EExp, t) = true$ ). These are: $⟨ :Rome, :population, 2874034 ⟩$ , $⟨ :Murcia, :population, 436870 ⟩$ and $⟨ :Miami, :population, 419777 ⟩$ . In the set of mappings obtained by applying rule R13 on these triples, the left variable (i.e., $? v_{L}$ ) is bound to the subject (since ) and the right variable (i.e., $? v_{R}$ ) to the predicate (since ). The set of mappings is: ${(? v_{L} \to :Rome, ? v_{R} \to :population), (? v_{L} \to :Murcia, ? v_{R} \to :population), (? v_{L} \to :Miami, ? v_{R} \to :population)}$ .

Closure and Repetitions. The closure operators and and set-semantics repetitions ( ${l, h}$ ) use the function EALP (Extended Arbitrary Length Paths) shown in Fig. 7, which extends the ALP function defined in the W3C specification (see Fig. 6). In particular, EALP handles the set-semantic repetitions of an EPP expression between a minimum l and a maximum h of times. The closure operators and are handled by setting $l = 0$ (respectively, $l = 1$ ) and $h = $ . EALP uses the global variable $Visited$ to keep track of the nodes already checked that belong to the results. The main task carried out by EALP is to skip the first $l - 1$ navigational steps so that the results are stored in $Visited$ starting from the step l via EALP. We now further clarify the behavior of EALP and EALP.
Example 16.
Consider the expression :Carrara (:twinned)^∗?e evaluated according to $EALP$ on the graph in Fig. 1. As the expression involves the closure operator, $EALP$ is called with the following parameters: $EALP (:Carrara, :twinned, G, 0,^{})$ . $EALP$ initializes the global variable Visited to the empty set and the variable Γ to the set ${:Carrara}$ (lines 1 and 3). The while cycle is never executed as $l = 0$ . Since $h =^{}$ the function EALP is called as: $EALP ({:Carrara}, :twinned, \emptyset, G,^{})$ . At this point, when the for cycle starts we have that $Γ = {:Carrara}$ and $Visited = \emptyset$ (line 1). Then, :Carrara is added to $Visited$ (line 2) and the set $\bar{Γ}$ is computed, which includes all nodes reachable from :Carrara by traversing a :twinned edge (line 4), that is, $\bar{Γ} = {:Grasse}$ ; EALP is called again with the parameters: $EALP ({:Grasse}, :twinned, {:Carrara}, G,^{})$ (line 6); Γ contains one IRI (i.e., :Grasse) and the for cycle is executed only once: :Grasse is added to $Visited$ (line 2) and $\bar{Γ} = {:Migliarino, :Murcia}$ (line 4). EALP is called again with the parameters: $EALP ({:Migliarino, :Murcia}, :twinned, {:Carrara, :Grasse}, G,^{})$ (line 6). This time Γ contains two IRIs (i.e., :Migliarino and :Murcia) and the for cycle is executed twice, once for each of these IRIs. With :Migliarino we have that $\bar{Γ} = \emptyset$ and EALP is not called anymore.

With :Murcia we have that $\bar{Γ} = {:Miami}$ and EALP is called as: $EALP ({:Miami}, :twinned, {:Carrara, :Grasse, :Migliarino, :Murcia}, G,^{})$ . Since Γ contains one IRI only (i.e., :Miami) the for cycle is executed only once: :Miami is added to $Visited$ , $\bar{Γ} = \emptyset$ and $EALP$ is not called anymore. Since $Visited$ is a global variable, the result of the execution is: ${:Carrara, :Grasse, :Migliarino, :Murcia, :Miami}$ .
Example 17.
Consider the EPP expression :Carrara (:twinned) {1,2} ?e evaluated on the graph in Fig. 1. This time $EALP$ is called with the parameters: $EALP (:Carrara, :twinned, G, 1, 2)$ . $EALP$ initializes the global variable Visited to the empty set and the variable Γ to the set ${:Carrara}$ (lines 1 and 3). The while cycle is executed for one iteration only since $l = 1$ . The set $\bar{Γ}$ is computed starting from :Carrara; in this case it is $\bar{Γ} = {:Grasse}$ . EALP will be called on this set. In particular, since $h = 2$ the function EALP is called with the following parameters: $EALP ({:Grasse}, :twinned, \emptyset, G, 1)$ . The for cycle is executed only once, since $Γ = {:Grasse}$ and $Visited = \emptyset$ (line 1). After the execution $\bar{Γ} = {:Migliarino, :Murcia}$ and EALP is called again as: $EALP ({:Migliarino, :Murcia}, :twinned, {:Grasse}, G, 0)$ . As $h = 0$ :Migliarino and :Murcia are added to Visited; however, the for cycle will not be executed. The result is {:Grasse, :Migliarino, :Murcia}.
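The behaviour traced in Examples 16 and 17 can be reproduced with a short Python sketch. It is not the pseudocode of Fig. 7: it assumes an iterative breadth-first expansion instead of recursion and a local set instead of the global variable Visited. The names `ealp`, `step`, `twinned` and `TWINNED` are illustrative assumptions, and the edge map only encodes the :twinned edges mentioned in the two examples.

```python
def ealp(start, step, low, high=None):
    """Nodes reachable from `start` in at least `low` and at most `high` steps.

    `step(node)` returns the set of nodes reached by one evaluation of the
    repeated expression; `high=None` stands for an unbounded closure.
    """
    if high is not None and high < low:
        raise ValueError("high must not be smaller than low")

    # Skip phase: advance `low` steps without recording any result.
    frontier = {start}
    for _ in range(low):
        frontier = {m for n in frontier for m in step(n)}

    # Collect phase: record nodes level by level until the budget is spent.
    visited = set()
    remaining = None if high is None else high - low
    while frontier:
        visited |= frontier
        if remaining is not None:
            if remaining == 0:
                break
            remaining -= 1
        # Expand only nodes that were not recorded before (set semantics).
        frontier = {m for n in frontier for m in step(n)} - visited
    return visited


# Only the :twinned edges used in Examples 16 and 17 (assumed excerpt of Fig. 1).
TWINNED = {
    ":Carrara": {":Grasse"},
    ":Grasse": {":Migliarino", ":Murcia"},
    ":Murcia": {":Miami"},
}


def twinned(node):
    return TWINNED.get(node, set())


closure = ealp(":Carrara", twinned, 0)      # :Carrara (:twinned)* ?e
bounded = ealp(":Carrara", twinned, 1, 2)   # :Carrara (:twinned){1,2} ?e
```

Under these assumptions, `closure` contains the five nodes listed at the end of Example 16 and `bounded` the three nodes listed at the end of Example 17. The level-by-level expansion is a design choice of the sketch: it guarantees that every node is first reached with the largest remaining step budget, so bounded repetitions do not lose results.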

Fig. 8.
Expression in Example 4 with positions.

Table 4
Fragments of SPARQL, using the SELECT query form, considered in this paper

Usage of EPPs in Practice. The overall goal of our proposal is to use EPP expressions in the predicate position of a property pattern (Definition 11) in lieu of PP expressions. This requires updating the SPARQL parser to support the nEPPs syntax. The aim of the Jena extension we implemented⁸ was to integrate nEPPs into an already existing (and popular) library. Clearly, while nEPP expressions can be evaluated on current SPARQL processors, the evaluation of full EPP expressions also requires updating query processors by replacing the ALP procedure with EALP.

⁸ http://extendedpps.wordpress.com

3.3. Fragments of SPARQL considered

Fragment	⋈ ( $AND$ )	∪ ( $UNION$ )	− ( $MINUS$ )	FILTER	PP	EPP	ALP	EALP
$S^{{⋈}}$	x
$S^{{⋈, \cup, FILTER}}$	x	x		x
$S^{{⋈, \cup, -, FILTER}}$	x	x	x	x
$S^{{⋈, \cup, FILTER, ALP}}$	x	x		x			x
$S^{{⋈, \cup, -, FILTER, EALP}}$	x	x	x	x				x
$S^{{⋈, \cup, PP}}$	x	x			x
$S^{{⋈, \cup, EPP}}$	x	x				x
$S^{{⋈, \cup, FILTER, PP, ALP}}$	x	x		x	x		x
$S^{{⋈, \cup, FILTER, EPP, EALP}}$	x	x		x		x		x

In the remainder of the paper we will focus on the SELECT query form and consider the SPARQL fragments shown in Table 4. These fragments are built using combinations of: (i) the operators ⋈ ( $AND$ ), ∪ ( $UNION$ ), − ( $MINUS$ ), FILTER; (ii) the functions $ALP$ and $EALP$ (introduced in Section 3.2); (iii) $PP$ and $EPP$ languages.

4. Translation of nEPPs into SPARQL

The goal of this section is to formalize and describe a translation algorithm that, given a non-recursive EPP (nEPP), translates it into a SPARQL query. Our approach follows the same line of thought as the SPARQL standard for the translation of non-recursive property paths (nPPs) into SPARQL queries. As a by-product, our study formalizes the informal procedure mentioned in the W3C specification for non-recursive PPs (see [26], Section 9.3) and does it for a more expressive language.

4.1. Translation algorithm: An overview

We now provide an overview of the translation algorithm $A^{t}$ . The algorithm takes as input a nEPP pattern and produces a semantically equivalent SPARQL query $Q_{e}$ . The algorithm involves three main steps: (i) building of the operational tree; (ii) propagation of variables and terms along the nodes of the operational tree; (iii) application of the translation rules. Each of the three steps is discussed in detail in the following three subsections.

4.1.1. Operational tree

Let $P$ be a nEPP pattern and $τ_{P}$ be the parse tree associated to its expression. Let $T$⁹ be the set of node types, and let $Ω = {b, e, m, s, p, o}$ and $Δ = {{pos}_{1}, {pos}_{2}, pos}$ be two sets of attributes. The operational tree $π_{P} = (V, E, type, i d, ω, δ)$ associated to the pattern $P$ is a binary, ordered, labeled, rooted tree, where V is the set of nodes, $E \subset V \times V$ the set of edges, $type : V \to T$ is a function that associates to each node a type, $i d$ a function that associates to each node a unique identifier, and $ω : V \times Ω \to U \cup L \cup V$ a function that associates to a pair $(v, a)$ , such that $v \in V$ and $a \in Ω$ , a URI, a literal or a variable identifier. Finally, $δ$ is a function that associates to a pair $(v, a)$ , such that $v \in V$ and $a \in Δ$ , a position symbol. The nodes of the operational tree can be subdivided in two categories: operational nodes that are labeled with the syntactic symbols , and test nodes that are labeled with . Figure 9 reports, for each type of node, its set of attributes (i.e., the domain of the functions ω and δ). The attributes b (start) and e (end) denote the starting and ending points of the operation represented by each operational node. Concatenation nodes (/) have the additional attribute m that maintains the join variable.

⁹ Note that the and syntactic operators are omitted since they are only syntactic sugar and can be rewritten by using and .

Fig. 9.

Node attributes in the operational tree.

Test nodes have attributes s, p, o denoting the subject, predicate and object of the triple on which the test is to be checked. Additionally, since a test node encodes a triple traversal, it also has the attributes start ( ${pos}_{1}$ ) and end ( ${pos}_{2}$ ), each valued with the subject, predicate or object position, which denote where the traversal begins and ends. Finally, test nodes have the additional attribute $pos$ (also valued with one of the three positions) that indicates the beginning of the existential test with respect to the last triple.

The root r of $π_{P}$ is a special node of type root having $i d (r) = 0$ and attributes b (start) and e (end) valued with the pattern endpoints, that is, $ω (r, b) = α$ and $ω (r, e) = β$ . To build the operational tree, the nodes of the parse tree $τ_{P}$ are visited according to a pre-order traversal, that is, the parent first, then left child and finally the right child, if one exists. In what follows, the function $parent$ indicates the parent of a node. Moreover, the function $corr$ applied to each node of $τ_{P}$ returns exactly one node of $π_{P}$ . For each node v of $τ_{P}$ visited, we have:

If v is the root of $τ_{P}$ , then a node c is added as the only child of r with $i d (c) = 0_0$ . If v is a left child of some node of $τ_{P}$ , a node c is added as the left child of $corr (parent (v))$ and $i d (c) = i d (parent (c)) + “_0 ”$ . If v is a right child of some node of $τ_{P}$ , then a node c is added as the right child of $corr (parent (v))$ with $i d (c) = i d (parent (c)) + “_1 ”$ . Furthermore:

If v is an operational node, then c has the same type as v and all its attributes are initialized with fresh variables. Moreover, $corr (v) = c$ .

If v is a test node and $corr (parent (v)) = c^{″}$ is an operational node, then c has type test, its attributes s,p,o are initialized with fresh variables and ${pos}_{1}$ and ${pos}_{2}$ are set to be equal to the position used in the test (or to the default positions if they are omitted). Moreover, a node $c^{'}$ is added as the only child of c with the same type of v and $i d (c^{'}) = i d (c) + “_0 ”$ . Moreover, its attributes s, p and o are initialized with fresh variables. If $type (c^{'}) = TP$ then the attribute pos is initialized with the value specified in the existential test. Note that $corr (v) = c^{'}$ .

If v is a test node, and $c^{'} = corr (parent (v))$ is a test node, then c has the same type as v and all its attributes are initialized with fresh variables. We have that $corr (v) = c$ .

The operational tree for the nEPP pattern of Example 2 is shown in Fig. 11(a). Fresh variables for the attributes of a node n are generated using the template: ?X+_+ $i d$ (n), where X $\in {b, e, m, s, p, o}$ , with + denoting string concatenation.
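The identifier scheme and the fresh-variable template can be illustrated with a small Python sketch. It only covers the pre-order assignment of identifiers; the extra child created for test nodes and the position attributes are deliberately omitted. The tuple encoding of the parse tree, the sample tree and the names `assign_ids` and `fresh_var` are assumptions made for illustration and do not come from the implementation described in the paper.

```python
def assign_ids(tree, node_id="0_0", out=None):
    """Pre-order visit: parent first, then left child, then right child.

    `tree` is a nested tuple (label, child, ...) with at most two children;
    a leaf is a tuple holding only a label. Returns a list of (id, label)
    pairs in visiting order. The child of the root r gets identifier 0_0.
    """
    if out is None:
        out = []
    label, *children = tree
    out.append((node_id, label))
    for index, child in enumerate(children):
        # The left child appends "_0" to the parent id, the right child "_1".
        assign_ids(child, f"{node_id}_{index}", out)
    return out


def fresh_var(attribute, node_id):
    """Fresh variable built with the template ?X + _ + id(n)."""
    return f"?{attribute}_{node_id}"


# Hypothetical parse tree: a concatenation of two predicate tests.
parse_tree = ("/", (":country",), (":region",))
ids = assign_ids(parse_tree)
join_var = fresh_var("m", ids[0][0])
```

Because each identifier is obtained by extending the identifier of the parent, the variables generated for different nodes cannot clash, which is what the translation relies on when it introduces join variables.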

Fig. 10.

Propagation of variables and terms.

4.1.2. Propagation of variables and terms

Given an operational tree for a pattern $P$ , each of its nodes has attributes valued with variables or terms. The translation algorithm takes care of propagating these variables and terms during the generation of the SPARQL query associated to $P$ via the Procedure Propagate (Fig. 10), which takes as input a node (the root at the beginning) and propagates values to its children. As an example, Fig. 11(b) shows the operational tree after the propagation on the tree in Fig. 11(a). As a further example, by looking at R2 in the EPPs semantics shown in Table 3, we notice that path concatenation (/) makes use of the join operator; specifically, it requires introducing a fresh join variable in the translation. The propagation algorithm guarantees that both children of the concatenation node use the same join variable by applying the propagation rules reported in Fig. 10 (lines 25–30). In Fig. 11, these rules have the following effect: the attribute b of node $0_0_0$ of Fig. 11(b) is propagated to the attribute s of node $0_0_0_0$ ; the attribute e of node $0_0_0$ is propagated to the attribute o of node $0_0_0_1$ ; and the value of the attribute m (that is a fresh variable) is propagated to the attribute o of node $0_0_0_0$ and to the attribute s of the node $0_0_0_1$ .

Furthermore, the propagation phase also ensures that the tests are executed on the correct position of the triple and that the endpoints are correctly selected by applying the rules reported in Fig. 10 (lines 8–19). In Fig. 11, the rule in lines 16–17 has the following effect: the attribute s of node $0_0_0_0$ of Fig. 11(b) is propagated to the attribute o of node $0_0_0_0_0$ ; the attribute p of node $0_0_0_0$ is propagated to the attribute p of node $0_0_0_0_0$ ; and the value of the attribute o of node $0_0_0_0$ is propagated to the attribute s of node $0_0_0_0_0$ .

4.1.3. Generating SPARQL code

The last step of the translation algorithm takes as input the result of the previous phases, that is, an operational tree where all attribute values are filled with the correct values (i.e., RDF terms, fresh variables and the variables or terms α and β derived from the nEPP pattern as input). At this point, to generate the SPARQL code for a given nEPP pattern, the translation algorithm leverages the translation rules shown in Table 5. The translation uses two functions: $Θ^{P} (\cdot)$ that handles general nEPP expressions and $Θ^{t} (\cdot)$ that handles tests.

The translation algorithm applies the rules starting from the root and proceeding via a pre-order depth-first traversal of the operational tree. In a nutshell, the translation proceeds as follows: rule $R_{m}$ generates the outermost part of the final SPARQL query; moreover, it projects the variables stored in the attributes root.b and root.e; for the sake of presentation we assume that $α, β \in V$ in the input pattern . Path concatenation is handled via rule R2 and is semantically dealt with via the join operator (R2 in Table 3). Each of the two operands of the join is one of the children of the node labeled with / in the operational tree. The join operator is also used to handle path conjunction (R4). The difference with path concatenation resides in the usage of the variables; indeed, by looking at Table 3 we note that concatenation makes use of a (fresh) join variable stored in the attribute m of the concatenation node of the operational tree, while path conjunction is evaluated from the same endpoints for both conjuncts. In the same spirit, we note that path difference (R5) is translated by using the − operator in the SPARQL algebra (see Table 3) that syntactically corresponds to the MINUS operator. Path union (R3), which uses the union operator from the SPARQL algebra, is translated using its SPARQL syntactic counterpart, that is, UNION. Reverse path (R1) is handled by switching, in the propagation phase, n’s variables when propagated to its child node $n . child (1)$ . Tests are handled by a combination of FILTER and FILTER EXISTS along with UNION to deal with disjunction of tests, join to deal with conjunction of tests and MINUS to deal with negated tests. To give a hint, a nEPP pattern containing a single triple pattern of the form $⟨ ?b, u, ?e ⟩$ where $u \in I$ is translated via rule R7 as SELECT ?b ?e WHERE {?b ?p_0_0 ?e. FILTER(?p_0_0=u)} where ?p_0_0 is an automatically generated variable.
A nEPP pattern containing an EBC (Extended-BuiltInCall) is translated via rule R8 by using a FILTER expression applied to the specified EBC. For example, the nEPP pattern is translated as SELECT ?b ?e WHERE {?b ?p_0_0 ?e. FILTER(isLiteral(?e))} where the parameter of the isLiteral BuiltInCall is substituted during the translation with the variable ?e. Nested nEPPs are handled via rule R9 and are basically existential tests, that is, they test whether the nested nEPP has a solution (see also rule R12 in Table 3).

Fig. 11.

Operational tree for Example 2 before (a) and after (b) the propagation phase.

Table 5

Translating nEPPs into SPARQL (code). EBC extends SPARQL BuiltInCall with EPPs tests also augmented with positions (). nEPPs with double-brace path repetitions () are first translated into equivalent nEPPs via unions of concatenations

Example 18 (Translating nEPPs into SPARQL).

Consider the nEPP pattern in Example 2. The corresponding operational tree is reported in Fig. 11(a). The operational tree obtained after the application of the procedure Propagate is shown in Fig. 11(b). As an example, by looking at the operational node with id=0_0_0 and labeled with in Fig. 11(a) and (b) we can see that Propagate updated the values of the attributes s and o of its children 0_0_0_0 and 0_0_0_1 with values in the attributes b and e of 0_0_0. Applying the translation rules to the operational tree in Fig. 11(b) means starting from root (node 0) and triggering rule $R_{m}$ (see Table 5), which generates the outermost part of the final SPARQL translation: . Then, the node with id=0_0 and labeled with is visited; this triggers R5: . The translation uses MINUS to reflect the semantics of EPPs dealing with path difference while (e.g., 0_0_0_0) is reflected via the FILTER operator. Visiting the node 0_0_0 triggers R2. The translation continues with:

The translation continues until no more nodes of the operational tree have to be visited and gives:

SELECT ?x ?y WHERE {
  { ?x ?p_0_0_0_0 ?m_0_0_0.
    ?y ?p_0_0_0_1 ?m_0_0_0.
    FILTER(?p_0_0_0_0=:country)
    FILTER(?p_0_0_0_1=:country) }
  MINUS
  { ?x ?p_0_0_1_0 ?m_0_0_1.
    ?y ?p_0_0_1_1 ?m_0_0_1.
    FILTER(?p_0_0_1_0=:region)
    FILTER(?p_0_0_1_1=:region) }
}
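To make the rule-by-rule generation more concrete, the following Python sketch translates a tiny nEPP-like syntax tree into a SPARQL string. It is a simplification under stated assumptions and not the algorithm $A^{t}$: it skips the operational tree and the propagation phase, covers only predicates, reverse, concatenation, conjunction, union and difference, and numbers fresh variables with a counter instead of the identifier scheme of Section 4.1.1. The tuple tags (`pred`, `inv`, `seq`, `and`, `alt`, `diff`) and the function name `translate` are hypothetical.

```python
from itertools import count


def translate(expr, start="?x", end="?y"):
    """Translate a nested-tuple path expression into a SPARQL SELECT query."""
    fresh = count()  # source of numbers for fresh variables

    def group(e, b, en):
        kind = e[0]
        if kind == "pred":
            # Single predicate: triple pattern plus FILTER on a fresh variable.
            var = f"?p_{next(fresh)}"
            return f"{{ {b} {var} {en} . FILTER({var} = {e[1]}) }}"
        if kind == "inv":
            # Reverse path: swap the endpoints handed to the child.
            return group(e[1], en, b)
        if kind == "seq":
            # Concatenation: both operands share a fresh join variable.
            mid = f"?m_{next(fresh)}"
            left = group(e[1], b, mid)
            right = group(e[2], mid, en)
            return f"{{ {left} {right} }}"
        if kind in ("and", "alt", "diff"):
            # Conjunction, union and difference reuse the same endpoints.
            left = group(e[1], b, en)
            right = group(e[2], b, en)
            keyword = {"and": "", "alt": " UNION", "diff": " MINUS"}[kind]
            return f"{{ {left}{keyword} {right} }}"
        raise ValueError(f"unknown node kind: {kind!r}")

    return f"SELECT {start} {end} WHERE {group(expr, start, end)}"


# A pattern with the same shape as the query shown above (assumed encoding).
same_country = ("seq", ("pred", ":country"), ("inv", ("pred", ":country")))
same_region = ("seq", ("pred", ":region"), ("inv", ("pred", ":region")))
query = translate(("diff", same_country, same_region))
```

The generated string has the same structure as the query above: two joined triple patterns per operand, one FILTER per predicate, and a MINUS between the two groups. Because every operand receives its own fresh variables, the two sides of MINUS share only the endpoint variables.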

Table 6
Languages and translations into SPARQL for plain RDF

Navigational core	Extended processor	Fragment	SPARQL fragment
$p \in I$	No	R1 in Fig. 5	$S^{{⋈}}$
nPP	No	R1–R5 in Fig. 5	$S^{{⋈, \cup, FILTER}}$
nEPP	No	R1–R2, R5–R9, R11–R16 in Table 3	$S^{{⋈, \cup, -, FILTER}}$
PP	No	Fig. 5	$S^{{⋈, \cup, FILTER, ALP}}$
EPP	Yes	Table 3	$S^{{⋈, \cup, -, FILTER, EALP}}$

Discussion about the translation

Conciseness. EPPs make it possible to write navigational queries more succinctly than SPARQL queries using triple patterns and/or union of graph patterns. Given a nEPPs expression containing a number of fragments (e.g., concatenation, union, predicates) it is interesting to note that its corresponding translation in SPARQL is always more verbose. Consider for instance the nEPPs pattern ?x (p1&p2)/p3 ?y; here, conjunction avoids the use of two triple patterns and concatenation avoids dealing explicitly with an intermediate variable besides the expression endpoints. Its translation in SPARQL, that is, ?x p1 ?a. ?x p2 ?a. ?a p3 ?y includes three triple patterns and three occurrences of the new variable ?a. Generally, the number of variables necessary to translate a nEPPs into an equivalent SPARQL query is a function of the size of its operational tree. The elimination of intermediate variables not only increases the succinctness of the expression, but also removes a source of errors when writing queries, since one no longer has to check the consistency of variable names.

Benefits. EPPs coupled with the translation procedure bring a significant practical advantage as compared to other navigational extensions of SPARQL (e.g., nSPARQL, cpSPARQL). On the one hand, nEPPs can be evaluated over existing SPARQL processors. On the other hand, the machinery presented in this paper could potentially extend the SPARQL standard in an elegant and non-intrusive way; one would need to use our translation algorithm instead of that currently used by the SPARQL standard.

4.2. SPARQL and navigational queries

By analyzing the translation algorithm presented in the previous section and the translation rules reported in Table 5, it is possible to identify the precise SPARQL fragment that can express nEPPs.

Lemma 19.
nEPPs can be expressed in the SPARQL fragment $S^{{⋈, \cup, -, FILTER}}$ , which uses AND, UNION, MINUS, FILTER and SELECT.

In the remainder of this section we analyze, for different navigational cores, the SPARQL fragment necessary for its rewriting. The results of the analysis are reported in Table 6. The table shows in the first column (Navigational Core) the navigational core, that is, the type of expression allowed in the predicate position of triple patterns; it can be a predicate p, a non-recursive property path (nPP), a property path (PP), a non-recursive EPP (nEPP), and an EPP. The second column (Extended Processor) indicates whether the evaluation requires changes to SPARQL processors. The third and fourth columns report the rules and the SPARQL fragment needed for the rewriting. The simplest case (row 1) does not use regular-expression-like extensions and thus no rewriting is needed. The second and fourth rows consider non-recursive and recursive property paths, respectively. These cases are handled, as per W3C specification, via a rewriting into SPARQL and the ALP procedure, respectively. The third and last rows concern nEPPs and full EPPs, respectively. While the former can be translated into SPARQL queries (as shown in the previous section) and evaluated on existing processors, the latter requires the usage of the EALP procedure shown in Fig. 7, currently not available in existing processors.

The most interesting result that emerges from the table is that the fragment $S^{{⋈, \cup, -, FILTER}}$ of the current SPARQL standard is already expressive enough to capture nEPPs (Lemma 19). Hence, the current W3C standard could readily benefit from the more expressive language of nEPPs without any impact on current SPARQL processors. In the following proposition we also mention an even stronger result that can be derived if we drop set-semantics path repetitions in EPPs (R9’ in Table 3).
Proposition 20.
The full EPPs language can be incorporated in SPARQL using the ALP procedure with the only difference that for the evaluation of the patterns (see Fig. 6 ) the translation discussed in Section 4 has to be used instead of the translation currently used by the standard.

We observe that the complexity of evaluating queries in most of the fragments reported in Table 6 has been studied in the literature. The fragments not including ALP have been extensively studied by Perez et al. [42] (under set semantics) and Schmidt et al. [49] (under bag semantics). The fragments including ALP have been studied by Arenas et al. [7] and Losemann et al. [38]. This work introduces and studies the fragment incorporating EALP.
5. Query-based reasoning on existing SPARQL processors

The aim of this section is to study the support that EPPs can give to querying under entailment regimes (Section 5.1) with particular emphasis on how to support the entailment regime on existing SPARQL processors (Section 5.2).

5.1. Capturing the entailment regime

In this paper we focus our attention on the $ρ df$ fragment [22,41]. This fragment considers a subset of the RDFS vocabulary consisting of the following elements: rdfs:domain, rdfs:range, rdf:type, rdfs:subClassOf, rdfs:subPropertyOf that we denote with dom, range, type, sc, and sp, respectively. The authors of [41] showed that the $ρ df$ semantics is equivalent to that of full RDFS when one focuses on this fragment. Note that $ρ df$ does not consider datatypes, which could lead to inconsistent graphs. When considering SPARQL under the ρdf entailment regime, not only the explicit triples in the RDF graph G have to be taken into account but also triples that can be derived from G by the inference rules shown in Table 7. The repeated application of the inference rules yields a sequence of graphs $G_{1}, G_{2}, G_{3}, \dots, G_{k}$ with $G_{i + 1} ∖ G_{i} \neq \emptyset \forall i \in [1, \dots, k - 1]$ . When $G_{k + 1} ∖ G_{k} = \emptyset$ , that is, when the graph is unchanged, the application of the rules stops. The graph $G_{k}$ is called the closure of G, indicated by $c l (G)$ .

Table 7
The $ρ df$ rule system. Capital letters $A$ , $B$ , $C$ , $X$ , and $Y$ , stand for meta-variables to be replaced by actual terms in $I L$

1. Subclass: (a) $\frac{(A, sc, B) (X, type, A)}{(X, type, B)}$ ; (b) $\frac{(A, sc, B) (B, sc, C)}{(A, sc, C)}$

2. Subproperty: (a) $\frac{(A, sp, B) (X, A, Y)}{(X, B, Y)}$ ; (b) $\frac{(A, sp, B) (B, sp, C)}{(A, sp, C)}$

3. Domain: $\frac{(A, dom, B) (X, A, Y)}{(X, type, B)}$

4. Range: $\frac{(A, range, B) (X, A, Y)}{(Y, type, B)}$
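The fixpoint computation of $c l (G)$ with the rules of Table 7 can be sketched in a few lines of Python. This is a naive illustration, quadratic in the size of the graph at each round, and not how SPARQL processors materialize the closure. Triples are plain tuples, the strings `sc`, `sp`, `dom`, `range` and `type` stand for the abbreviations introduced above, and the sample IRIs (`:mayor`, `:leader`, `:Person`, `:Agent`, `:anna`, `:rome`) are made up for the example.

```python
SC, SP, DOM, RANGE, TYPE = "sc", "sp", "dom", "range", "type"


def rhodf_closure(graph):
    """Apply the rules of Table 7 until no new triple can be derived."""
    closure = set(graph)
    while True:
        derived = set()
        for (a, rel, b) in closure:
            for (x, p, y) in closure:
                if rel == SC and p == TYPE and y == a:
                    derived.add((x, TYPE, b))      # rule 1(a)
                if rel == SC and p == SC and x == b:
                    derived.add((a, SC, y))        # rule 1(b)
                if rel == SP and p == a:
                    derived.add((x, b, y))         # rule 2(a)
                if rel == SP and p == SP and x == b:
                    derived.add((a, SP, y))        # rule 2(b)
                if rel == DOM and p == a:
                    derived.add((x, TYPE, b))      # rule 3
                if rel == RANGE and p == a:
                    derived.add((y, TYPE, b))      # rule 4
        if derived <= closure:
            return closure                         # fixpoint reached
        closure |= derived


# Hypothetical graph, not taken from Fig. 1.
G = {
    (":mayor", SP, ":leader"),
    (":leader", DOM, ":Person"),
    (":Person", SC, ":Agent"),
    (":anna", ":mayor", ":rome"),
}
cl_G = rhodf_closure(G)
```

On this sample graph three rounds are needed: the first derives the triple with predicate `:leader`, the second the type `:Person` for `:anna`, and the third the type `:Agent`. The loop stops as soon as a round adds nothing, which corresponds to the condition $G_{k + 1} ∖ G_{k} = \emptyset$.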

Definition 21 (SPARQL and query-based reasoning).

Given a SPARQL pattern $P$ and an RDF graph G, the evaluation of $P$ over G under the $ρ df$ semantics is denoted by $⟦ P ⟧_{G}^{ρ df}$ , while ${⟦ P ⟧}_{cl (G)}$ denotes the evaluation of $P$ over the closure of G.

The intended meaning of the two semantics differs with respect to the data graph on which the evaluation is performed. In particular, $⟦ P ⟧_{G}^{ρ df}$ means that $P$ is evaluated on the original data graph G, and the results provided should include those generated by considering the $ρ df$ rules. On the other hand, ${⟦ P ⟧}_{cl (G)}$ means that $P$ is evaluated on the materialization of the closure of G obtained by applying the $ρ df$ rules. Of course, we expect $⟦ P ⟧_{G}^{ρ df} = ⟦ P ⟧_{cl (G)}$ to hold.

Most existing SPARQL processors handle (variants of) ρdf reasoning in the following way: first, compute and materialize the finite polynomial closure of the graph G and then perform query answering on the closure via the RDF simple entailment regime [25]. It is interesting to point out that materializing all data by computing the closure $c l (G)$ may waste space when most of $c l (G)$ is never really used for query answering, apart from the cost of computation and maintenance after updates. Having a mechanism to support entailment regimes while avoiding the computation of $c l (G)$ beforehand can bring a major advantage. Our goal is to study query-based reasoning, that is, the possibility to rewrite a query into another query that captures $ρ df$ inferences.

Table 8
Encoding of ρdf inference rules via EPPs

Fig. 12.

RDFS inference rules. R5 in Table 8 captures rules (a)–(c) while R6 in Table 8 captures rule (d).

Similarly to nSPARQL [42], cpSPARQL [3] and other approaches (e.g., [12,32]), we identified for each inference rule in the fragment considered (ρdf in our case) an EPP expression encoding it. The translation rules are shown in Table 8. Whenever one wants to adopt the ρdf entailment regime, it is enough to rewrite the input pattern according to these translation rules. The result of the evaluation of the rewritten pattern on G is the same as the result that would be obtained by first computing the closure $c l (G)$ and then evaluating the pattern before the rewriting.

Lemma 22 (ρdf and SPARQL).

Given a triple pattern $(α, p, β)$ with $α, β \in I \cup V$ and $p \in I$ , then for every graph G we have that $⟦ (α, p, β) ⟧_{G}^{ρ df} = ⟦ (α, Φ (p), β) ⟧_{G} = ⟦ (α, p, β) ⟧_{cl (G)}$ .

Sketch.
The proof follows from the fact that the rules in Table 8 encode the reasoning rules shown in Table 7. This is immediate to see for rules R1–R4. R5 is composed of the union of three expressions, each capturing one of the three possible ways (shown in Fig. 12(a)–(c)) to derive a type in RDFS and corresponding to rules Subclass (a), Domain and Range in Table 7. The first sub-expression in R5 captures the rule in Fig. 12(a); a new type can be derived by finding triples of the form $(z, type, x)$ and possibly (via ) traveling up (via sc) the super-classes of x, which is the type of z. The second sub-expression captures the rule depicted in Fig. 12(b); a new type can be derived by navigating from the subject x to the predicate p and all its possible super-properties (via ∗) and then by finding the dom (i.e., a class) of such predicates, and all possible super-classes (via ∗). A similar reasoning applies for the third sub-expression in R5, which captures the inference rule shown in Fig. 12(c). As for rule R6 in Table 8, it captures the rule in Fig. 12(d) corresponding to the rule Subproperty (a) in Table 7. We can notice that the EPP encoding this inference rule includes the union (via ) of two tests. The second test just checks for triples where p is the predicate; the first performs an existential test (i.e., it uses the nested EPP construct ) composed of a conjunction (via ) of two tests: the first moves to the predicate position of a triple and travels up the property hierarchy (via ) while the second checks that the object reached is p. □
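The three ways of deriving a type described in the sketch can be mimicked at query time, without materializing the closure, with the following Python illustration. It is not the EPP expression R5 of Table 8 but a hand-written traversal that follows the same navigation: explicit types, domains and ranges of a predicate and of its super-properties, each followed by the super-class hierarchy. It assumes that the vocabulary terms themselves are not the subject of sp, dom or range statements; the helper names `star` and `inferred_types` and the sample IRIs are hypothetical.

```python
def star(graph, start, pred):
    """Nodes reachable from `start` via zero or more `pred` edges."""
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for (s, p, o) in graph:
            if s == node and p == pred and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def inferred_types(graph, node):
    """Types of `node` under the three derivations of Fig. 12(a)-(c)."""
    direct = set()
    for (s, p, o) in graph:
        if s == node and p == "type":
            direct.add(o)                    # (a) explicit type
        for q in star(graph, p, "sp"):       # p and all its super-properties
            for (a, r, c) in graph:
                if a != q:
                    continue
                if r == "dom" and s == node:
                    direct.add(c)            # (b) node is the subject of p
                if r == "range" and o == node:
                    direct.add(c)            # (c) node is the object of p
    # Travel up the class hierarchy from every class found so far.
    return {c for d in direct for c in star(graph, d, "sc")}


# Hypothetical graph, not taken from Fig. 1.
G = {
    (":mayor", "sp", ":leader"),
    (":leader", "dom", ":Person"),
    (":leader", "range", ":City"),
    (":Person", "sc", ":Agent"),
    (":anna", ":mayor", ":rome"),
}
anna_types = inferred_types(G, ":anna")
rome_types = inferred_types(G, ":rome")
```

On this graph the traversal assigns `:Person` and `:Agent` to `:anna` and `:City` to `:rome`, which are the type triples that the rules of Table 7 would add to the closure for these two nodes.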

We observe that our translation rules are indeed a translation into the language of EPPs of the NRE expressions that have been shown to capture all the RDFS inferences in Perez et al. [42] (Lemma 5.2). Lemma 22 shows that for an arbitrary pattern there exists a rewriting allowing to capture $ρ df$ inferences. Moreover, it is easy to see that the rewriting can be constructed by using the translation rules in Table 8 in linear time in the size of the pattern. However, in this case (and similarly to nSPARQL and PSPARQL) one would need to use an EPPs processor to capture the inferred triples. This clearly hinders the usage of this machinery in existing processors. Therefore, the research question that we face now is how to support query-based reasoning on existing processors.
5.2. Query-based reasoning on existing processors

The idea behind our approach follows from the observation that closure operators appearing in Table 8 only involve single predicates i.e., , , , . Such types of expressions are property paths that (taken alone) can be evaluated via the ALP procedure defined in the W3C standard, and implemented in existing processors. Therefore, we need to rewrite the EPPs in Table 8 into SPARQL queries where recursive property paths with single predicates are used. We apply a small variation to the translation algorithm presented in Section 4; the variation consists in leaving untouched (single) predicates involving the closure operator () used in Table 8. We refer to this variant of the translation algorithm as $A_{ρ}^{t} (\cdot)$ .

Lemma 23.
Given a triple pattern $P = ⟨ α, p, β ⟩$ with $α, β \in I \cup V$ and $p \in I$, ${⟦ ⟨ α, p, β ⟩ ⟧}_{cl(G)} = {⟦ A_{ρ}^{t}(⟨ α, Φ(p), β ⟩) ⟧}_{G}$.
Proof.
The result follows from Lemma 22, which shows that the EPPs rewriting of the $ρ df$ inference rules (via $Φ (p)$) allows inferring the triples in the fragment, and from the nEPPs to SPARQL translation (needed in the $A_{ρ}^{t} (\cdot)$ part). □

The above result tells us that an algorithm to perform query-based reasoning works in three steps: (i) apply the translation function $Φ (\cdot)$ ; (ii) apply the translation $A_{ρ}^{t} (\cdot)$ over the result of step (i); (iii) evaluate the SPARQL query resulting from (ii) on existing SPARQL processors.
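To make the effect of the rewriting more concrete, the following minimal Python sketch computes, directly over a small in-memory set of triples, the types that rule R5 is meant to capture: the asserted type plus its super-classes, and the types obtained via domain and range of the predicate and its super-properties. It is only an illustration under stated assumptions; it is not the output of $Φ (\cdot)$ or $A_{ρ}^{t} (\cdot)$. The toy graph, the short predicate labels (`type`, `sc`, `sp`, `dom`, `range`) and the helper names `up` and `inferred_types` are all hypothetical.

```python
from collections import deque

# Toy RDF graph as a set of (subject, predicate, object) triples.
# Every IRI below is made up for the example.
G = {
    ("ex:rex", "type", "ex:Dog"),
    ("ex:Dog", "sc", "ex:Mammal"),
    ("ex:Mammal", "sc", "ex:Animal"),
    ("ex:alice", "ex:hasPet", "ex:tom"),
    ("ex:hasPet", "sp", "ex:caresFor"),
    ("ex:caresFor", "dom", "ex:Person"),
    ("ex:caresFor", "range", "ex:Animal"),
    ("ex:Person", "sc", "ex:Agent"),
}


def up(graph, start, pred):
    """Nodes reachable from `start` via zero or more `pred` edges."""
    seen, todo = {start}, deque([start])
    while todo:
        node = todo.popleft()
        for s, p, o in graph:
            if s == node and p == pred and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def classes_via(graph, pred, schema_pred):
    """Classes attached (via dom or range) to `pred` or one of its super-properties."""
    found = set()
    for q in up(graph, pred, "sp"):
        for s, p, c in graph:
            if s == q and p == schema_pred:
                found |= up(graph, c, "sc")
    return found


def inferred_types(graph, z):
    """Types of `z` obtained through the three branches described for R5."""
    types = set()
    for s, p, o in graph:
        if s == z and p == "type":      # asserted type, then up the class hierarchy
            types |= up(graph, o, "sc")
        if s == z:                      # z used as subject: domain branch
            types |= classes_via(graph, p, "dom")
        if o == z:                      # z used as object: range branch
            types |= classes_via(graph, p, "range")
    return types


print(sorted(inferred_types(G, "ex:rex")))
print(sorted(inferred_types(G, "ex:alice")))
```

In the query-based approach the same navigation is expressed inside the rewritten query, so that the closure over `sc` and `sp` is delegated to the property path machinery of the underlying processor rather than computed by client code as done here.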
6. Expressiveness analysis

The aim of this section is to provide novel results about the expressive power of EPPs as compared to PPs (Section 6.1) and the expressiveness of the SPARQL standard in terms of $ρ df$ reasoning when considering different navigational cores (Section 6.2).

6.1. Expressive power of extended property paths vs. property paths

We now investigate the expressiveness of EPPs as compared to PPs. We use the evaluation function ${⟦ \cdot ⟧}_{G}$ to denote either the evaluation of a PP (Fig. 5) or EPP (Table 3). The semantics of the evaluation will be clear from the context. In the next theorem we prove that the language of EPPs is strictly more expressive than PPs.¹⁰

¹⁰
Even if such result could be obtained by adapting standard results about NREs, we provide, for the sake of completeness, a complete constructive proof.

By using the graph in Fig. 13, we will show that there exists an EPP pattern, which is able to distinguish between the node :b and the nodes :c and :d. The same does not hold for PPs; indeed, any PP pattern that provides :b as an answer will provide at least an additional answer (either :c or :d). The rationale behind this result is that PP patterns are not able to tell apart the conjunction of two predicates from the two predicates alone.
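The intuition can be seen on a small example. The sketch below uses a hypothetical edge-labelled graph (it is not the graph of Fig. 13 and has no self-loops) and compares the pairs of nodes linked by both `:p` and `:q` with the pairs linked by either of them. It illustrates only the difference between conjunction and alternative, not the full argument of the proof, which has to cover every PP construct.

```python
# Hypothetical graph, chosen only for illustration.
EDGES = {
    (":a", ":p", ":b"), (":a", ":q", ":b"),  # :b is reached by both :p and :q
    (":a", ":p", ":c"),                      # :c is reached by :p only
    (":a", ":q", ":d"),                      # :d is reached by :q only
}


def step(pred):
    """Pairs (subject, object) connected by an edge labelled `pred`."""
    return {(s, o) for s, p, o in EDGES if p == pred}


p, q = step(":p"), step(":q")

conjunction = p & q   # both predicates must link the same pair of nodes
alternative = p | q   # at least one predicate links the pair

print(sorted(conjunction))  # only (:a, :b)
print(sorted(alternative))  # (:a, :b), (:a, :c) and (:a, :d)
```

With the alternative operator, the closest construct available in PPs, the answer `:b` always comes together with `:c` and `:d`; the conjunction isolates `:b`.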

Theorem 24.

There exists an EPP pattern that cannot be expressed as a PP pattern.

Fig. 13.

Graph used to prove Theorem 24.

Proof.

Consider the EPP pattern $π_{1}$ and the graph G in Fig. 13. We have that ${⟦ π_{1} ⟧}_{G} = \{\{?b \to :a, ?e \to :b\}\}$. It is immediate to see that the mapping in the result derives from the evaluation of (step 1 of the + operator). Moreover, no other mappings can be obtained by evaluating further steps because of the self-loops. We claim that for every PP pattern $π_{2}$ the following property holds: either ${⟦ π_{2} ⟧}_{G} = \emptyset$ or ${⟦ π_{2} ⟧}_{G}$ contains at least one mapping not belonging to ${⟦ π_{1} ⟧}_{G}$. The proof of the theorem relies on the following claim.

Claim 25.

Consider the graph G in Fig. 13 and let $Π_{self} = \{\{?b \to :a, ?e \to :a\}, \{?b \to :b, ?e \to :b\}\}$. For every PP pattern $π_{2}$ we have that either ${⟦ π_{2} ⟧}_{G} = \emptyset$ or ${⟦ π_{2} ⟧}_{G} ⊒ Π_{self}$.

Proof.

We proceed by induction on the construction of the PP expression . We start with the base cases:

If is of the form then: (i) if u=:p or u=:q then ${⟦ ⟨ ?b, u, ?e ⟩ ⟧}_{G} ⊒ Π_{self}$ because of the self-loops at each node; (ii) otherwise ${⟦ ⟨ ?b, u, ?e ⟩ ⟧}_{G} = \emptyset$ .

If is or then: (i) if :p $\notin {u_{1}, \dots, u_{n}}$ or :q $\notin {u_{1}, \dots, u_{n}}$ then because of the self-loops present at each node; (ii) otherwise .

If is of the form then ${⟦ ⟨ ?b, !(u_{1} | \dots | u_{j} | {}^{\wedge}u_{j+1} | \dots | {}^{\wedge}u_{n}), ?e ⟩ ⟧}_{G} = {⟦ ⟨ ?b, !(u_{1} | \dots | u_{j}), ?e ⟩ ⟧}_{G} \cup {⟦ ⟨ ?b, !({}^{\wedge}u_{j+1} | \dots | {}^{\wedge}u_{n}), ?e ⟩ ⟧}_{G}$. Hence, the claim holds because of point c2 above.

Let two PP expressions, indexed by $i \in \{1, 2\}$, be given; assume that for each of them either (i) its evaluation over G is empty or (ii) its evaluation over G contains $Π_{self}$. We now proceed with the inductive step and consider the other types of PP expressions.

If is of the form then and the claim follows from the properties of the algebra.

If is of the form then as a consequence of the evaluation of the base step of the Kleene operator.

If is of the form then and the claim follows from the properties of the algebra.

□

By relying on Claim 25, the result follows since none of the mappings in $Π_{self}$ belongs to ${⟦ π_{1} ⟧}_{G}$. □

Table 9

Languages and their translation into SPARQL for reasoning

Navigational core	Extended processor	Reference in the semantics	SPARQL fragment
$p \in I$	No	R1 in Fig. 5	$S^{{⋈, \cup, FILTER, PP, ALP}}$
nPP	No	R1–R5 in Fig. 5	$S^{{⋈, \cup, FILTER, PP, ALP}}$
nEPP	No	R1–R2, R5–R9, R11–R16 in Table 3	$S^{{⋈, \cup, FILTER, PP, ALP}}$
PP	Yes	Fig. 5	$S^{{⋈, \cup, FILTER, EPP, EALP}}$
EPP	Yes	Table 3	$S^{{⋈, \cup, FILTER, EPP, EALP}}$

To continue our expressiveness analysis, we now show that using EPPs as navigational core in SPARQL increases the expressive power of the language.

Theorem 26.

There exists a $S^{{⋈, \cup, EPP}}$ query that cannot be expressed as a $S^{{⋈, \cup, PP}}$ query.

Proof.

Consider the following $S^{\{⋈, \cup, EPP\}}$ query: and the graph G in Fig. 13. Let us indicate by $π$ the EPP pattern in $Q_{e}$. By evaluating $Q_{e}$ over G we obtain the set of mappings $\{\{?b \to :a, ?e \to :b\}\}$. We will show that the query $Q_{e}$ cannot be expressed by any $S^{\{⋈, \cup, PP\}}$ query $\bar{Q}$ of the form $SELECT ?b ?e WHERE \{P\}$, where $P$ is a pattern as defined in Section 2. We claim that for every pattern $P$ (in the fragment $S^{\{⋈, \cup, PP\}}$) the following property holds: either ${⟦ P ⟧}_{G} = \emptyset$ or ${⟦ P ⟧}_{G}$ contains at least one mapping not belonging to ${⟦ π ⟧}_{G}$. The proof of the theorem relies on the following claim.

Claim 27.

Consider the graph G in Fig. 13 and let $Π_{self} = \{\{?b \to :a, ?e \to :a\}, \{?b \to :b, ?e \to :b\}\}$. For every $S^{\{⋈, \cup, PP\}}$ query $\bar{Q}$ of the form $SELECT ?b ?e WHERE \{P\}$, where $P$ is a pattern as defined in Section 2, we have that either ${⟦ P ⟧}_{G} = \emptyset$ or ${⟦ P ⟧}_{G} ⊒ Π_{self}$.

Proof.

We prove the claim by structural induction on the construction of the pattern $P$ built by using the constructs in the fragment $S^{\{⋈, \cup, PP\}}$.

Base case:

If $P$ is a single property path pattern then by virtue of Theorem 24 and Claim 25 we have that either ${⟦ P ⟧}_{G} = \emptyset$ or ${⟦ P ⟧}_{G} ⊒ Π_{self}$ and thus the property holds.

Inductive step:

Consider now the case of $P$ containing two patterns $P_{1}$ and $P_{2}$ such that either ${⟦ P_{i} ⟧}_{G} = \emptyset$ or ${⟦ P_{i} ⟧}_{G} ⊒ Π_{self}$ holds for $i \in \{1, 2\}$. If $P = P_{1} AND P_{2}$ then ${⟦ P_{1} AND P_{2} ⟧}_{G} = {⟦ P_{1} ⟧}_{G} ⋈ {⟦ P_{2} ⟧}_{G}$. If at least one of the two evaluations is empty then we can conclude ${⟦ P_{1} AND P_{2} ⟧}_{G} = \emptyset$. Otherwise, by combining the inductive hypothesis and the properties of the algebra we can conclude that ${⟦ P_{1} AND P_{2} ⟧}_{G} ⊒ Π_{self}$ holds. A similar reasoning also applies if $P = P_{1} UNION P_{2}$ by considering that ${⟦ P_{1} UNION P_{2} ⟧}_{G} = {⟦ P_{1} ⟧}_{G} \cup {⟦ P_{2} ⟧}_{G}$, and we can conclude that ${⟦ P_{1} UNION P_{2} ⟧}_{G} = \emptyset$ if, and only if, both evaluations are empty.

□

By relying on Claim 27, the result follows since none of the mappings in $Π_{self}$ belongs to ${⟦ π ⟧}_{G}$. □
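The inductive step of Claim 27 can be checked on concrete sets of mappings. The sketch below is a simplified model of the SPARQL algebra, assuming that mappings are Python dictionaries and that both operands bind the same variables `?b` and `?e`; the function names `compatible` and `join` and the two example operands are hypothetical. It shows that when both operands contain $Π_{self}$, so do their join and their union, and that a join with an empty operand is empty.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[var] == term for var, term in m1.items() if var in m2)


def join(omega1, omega2):
    """Join of two sets of mappings (kept as lists without duplicates)."""
    result = []
    for m1 in omega1:
        for m2 in omega2:
            if compatible(m1, m2):
                merged = {**m1, **m2}
                if merged not in result:
                    result.append(merged)
    return result


PI_SELF = [{"?b": ":a", "?e": ":a"}, {"?b": ":b", "?e": ":b"}]

# Two example evaluations, each containing PI_SELF plus one extra mapping.
omega1 = PI_SELF + [{"?b": ":a", "?e": ":b"}]
omega2 = PI_SELF + [{"?b": ":a", "?e": ":c"}]

joined = join(omega1, omega2)
union = omega1 + [m for m in omega2 if m not in omega1]

print(all(m in joined for m in PI_SELF))  # join still contains PI_SELF
print(all(m in union for m in PI_SELF))   # union still contains PI_SELF
print(join(omega1, []))                   # empty operand gives an empty join
```

In this example the extra mapping `{?b → :a, ?e → :b}` is lost by the join while the self-loop mappings survive, which matches the intuition that such a pattern cannot return that mapping alone.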

6.2. Expressiveness for query-based reasoning

We now study the expressiveness of SPARQL in terms of ρdf reasoning when considering various navigational cores. Table 9 mimics the expressiveness study in Table 6 where the second column describes the language produced to support query-based reasoning as described in Section 5. We can notice that, in general, supporting reasoning requires a more expressive language in the rewriting. For the basic case $p \in I$ , the query must be rewritten by applying rule R6 in Table 8; this requires the usage of EPP constructs such as nesting (), (conjunction of) tests (), and closure (). Note that the closure operator is only applied to a single predicate, i.e., sp. Therefore, the ρ-enhanced EPP can be rewritten into SPARQL by using only PPs and thus evaluated using the ALP procedure, which was not involved in the evaluation under simple entailment.

Interestingly, when considering more expressive forms of navigational patterns such as non-recursive property paths (nPP) and non-recursive EPPs (nEPP), the fragment needed to capture ρdf in the translation remains the same. The situation changes when moving to navigational patterns with recursion, that is, PPs and EPPs. In this case, the current SPARQL standard is not expressive enough to capture query-based ρdf reasoning. To give an intuition for such a limitation, consider the EPP expression , where $?s, ?e \in V$ . The evaluation of π on the graph in Fig. 14 gives, among others, the solution $μ = \{?s \to :c1, ?e \to :cf\}$. This solution is obtained since in the graph there exists a path (rectangle in Fig. 14) between $:c1$ and $:cf$ of length n where each edge is a subproperty of :tr. If one were to write a SPARQL query for an arbitrary n to capture the same solution μ, the only constructs that capture these kinds of queries are PPs, in particular an expression of the form $(:predicate)*$, since no other SPARQL 1.1 syntactic expression can be used to traverse an arbitrary number of edges. However, the current SPARQL standard does not allow checking that each edge belonging to the sought path is a subproperty of :tr. This result follows from Bischof et al. [12], where the authors used a similar argument to show the impossibility of SPARQL with PPs to capture owl:symmetricProperty. Note that this limitation can be dealt with by using EPPs (and EALP) and previous navigational extensions of SPARQL like NREs [43]. We leave the study of how EPPs can be coupled with the approach proposed by Bischof et al. [12] as future work.

Fig. 14.

A graph about transportations.

The interesting conclusion that can be drawn by observing Table 9 is that $S^{\{⋈, \cup, FILTER, EPP, EALP\}}$ is the only closed language with respect to $ρ df$ reasoning for the rewriting shown in Table 8. We point out that our rewritings into SPARQL for plain RDF and ρdf require the same expressiveness. Practically speaking, substituting the current ALP procedure with the EALP procedure for EPPs would allow full EPPs support for both the plain and ρdf entailment regimes.

Fig. 15.

Set-based semantics for EPPs.

7. iEPPs: A SPARQL-independent language

The aim of this section is to study EPPs as an independent language. The advantage of defining EPPs as a navigational language independent from SPARQL stems from the fact that the SPARQL-based semantics and translation discussed in Section 3.2 and Section 4 only apply to KGs based on RDF while the proposed language can be used to query arbitrary KGs. To this end, we give a set-based semantics in Section 7.1 and present an evaluation algorithm along with a complexity analysis in Section 7.2.

7.1. Formal semantics of EPPs based on sets

The semantics of EPPs based on sets for both recursive and non-recursive EPPs is shown in Fig. 15. It leverages two evaluation functions. The first, given an expression and a graph G returns the pairs of nodes that are linked by paths conforming to . The second , given a test , a graph G and a triple t $\in G$ , returns true if the triple satisfies the test and false otherwise. The semantics follows the same spirit of other navigational languages like NREs [43] although EPPs offer more features (e.g., path conjunction and path difference).
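To give a feel for a set-based semantics of this kind, the sketch below evaluates a small fragment of path expressions to sets of node pairs. It is not the semantics of Fig. 15: the tuple-based encoding of expressions, the operator tags (`pred`, `inv`, `seq`, `alt`, `and`, `diff`, `plus`) and the toy graph are assumptions made for the example, and tests and nesting are left out.

```python
def evaluate(expr, graph):
    """Return the set of (start, end) node pairs linked by paths matching `expr`."""
    op = expr[0]
    if op == "pred":                      # single predicate
        return {(s, o) for s, p, o in graph if p == expr[1]}
    if op == "inv":                       # inverse path
        return {(o, s) for s, o in evaluate(expr[1], graph)}
    if op == "plus":                      # one or more repetitions
        base = evaluate(expr[1], graph)
        result = set(base)
        while True:
            new = {(a, c) for a, b in result for b2, c in base if b == b2} - result
            if not new:
                return result
            result |= new
    left, right = evaluate(expr[1], graph), evaluate(expr[2], graph)
    if op == "seq":                       # concatenation
        return {(a, c) for a, b in left for b2, c in right if b == b2}
    if op == "alt":                       # alternative
        return left | right
    if op == "and":                       # path conjunction
        return left & right
    if op == "diff":                      # path difference
        return left - right
    raise ValueError(f"unknown operator: {op}")


# Hypothetical social graph.
G = {
    ("me", "knows", "ann"), ("me", "knows", "bob"),
    ("ann", "knows", "bob"), ("me", "knows", "carl"),
}

knows = ("pred", "knows")
# Friends that are not friends of any of my friends.
exclusive = ("diff", knows, ("seq", knows, knows))
print(sorted(o for s, o in evaluate(exclusive, G) if s == "me"))
```

Path conjunction and path difference reduce to plain set intersection and set difference over pairs of nodes, which is what makes them easy to add to a set-based semantics.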

7.2. Evaluation algorithm

The aim of this section is to study whether the semantics in Fig. 15 can be implemented in an efficient way. In what follows we show an efficient evaluation algorithm, that has been implemented in a custom query evaluator, and discuss its complexity. The presented evaluation algorithm for iEPPs expressions is similar to those of other navigational languages such as nested regular expressions [43] and NautiLOD [20]. The algorithm starts by invoking Evaluate, which receives as input a graph G, an expression and a node n. If is non recursive (i.e., it does not contain the closure operators and ) then it is given as input to the function base, which considers the various forms of syntactic expressions. For recursive expressions the algorithm uses the function closure. Finally, the boolean function EvalTest handles the different types of .

The result of the evaluation of an iEPP expression from a node n is a set of nodes $n_{r}$ where nodes $n_{r}$ are reachable from n via paths satisfying . To study the complexity of the evaluation algorithm we introduce the decision problem Eval EPPs , which takes as input an EPP expression , a pair of nodes $(s, r)$ and a graph G and asks whether .

Theorem 28.
The Eval EPPs problem can be solved in time , where $c_{EExp}$ is the cost of evaluating built-in conditions.
Proof.
We assume G to be stored by its adjacency list. In particular, for each $q \in$ terms $(G)$ , a Hashtable is maintained where the set of keys is the set of predicates p such that there exists a triple in G having as subject q and as predicate p, and the set of values are lists of objects o reachable by traversing p-predicates from q. We assume that given q and a predicate p the set of nodes reachable can be accessed in time $O (1)$ . An additional Hashtable is used for inverse navigation, that is, for navigation starting on the object and ending on the subject. Both structures use space $O (| G |)$ . Let be the size of the iEPP expression .

The function Evaluate is recursively called on each sub-expression of the expression given as input; if such sub-expressions are not recursive (i.e., do not contain , ), Evaluate is invoked at most times. The base cases (lines 17–21 of function Base) require considering at most all the edges for all the nodes; this can be done in time $O (| G |)$. If is recursive, the function Closure is executed at most $O (nodes (G))$ times; the procedure Evaluate is invoked for each node in the worst case. When evaluating a subexpression from a node we use memoization to store its result (i.e., the set of reachable nodes), thus avoiding recomputing the same expression from the same node multiple times. Memoization guarantees that the total time required by Closure is . As for nested expressions, memoization enables marking the nodes of the graph that satisfy a given subexpression. Path conjunction and difference, corresponding to intersection and difference of sets of nodes respectively (lines 12 and 14 of Base), can be computed in time $O (| G |)$ by using a (perfect) hash function as the graph is known beforehand. As for tests, their cost is constant for logical operators and simple URI checking. The complexity is parametric with respect to the cost of other SPARQL-based built-in conditions EExp ( $c_{EExp}$ ). Finally, observe that with memoization the space complexity is . □
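The data structures assumed in the proof can be sketched as follows. This is a simplified illustration, not the authors' implementation: the class `Graph`, the function `closure` and the memo table keyed by (predicate, start node) are hypothetical names, and the closure is restricted to a single predicate instead of an arbitrary sub-expression.

```python
from collections import defaultdict


class Graph:
    """Adjacency lists indexed by node and predicate, plus an inverse index."""

    def __init__(self, triples):
        self.out = defaultdict(lambda: defaultdict(list))  # subject -> predicate -> objects
        self.inc = defaultdict(lambda: defaultdict(list))  # object -> predicate -> subjects
        for s, p, o in triples:
            self.out[s][p].append(o)
            self.inc[o][p].append(s)

    def forward(self, node, pred):
        """Objects reachable from `node` through one `pred` edge (hash lookups)."""
        return self.out.get(node, {}).get(pred, [])

    def backward(self, node, pred):
        """Subjects that reach `node` through one `pred` edge (inverse navigation)."""
        return self.inc.get(node, {}).get(pred, [])


def closure(graph, pred, start, memo):
    """Nodes reachable from `start` via one or more `pred` edges, memoized per start node."""
    key = (pred, start)
    if key in memo:
        return memo[key]
    reached, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.forward(node, pred):
            if nxt not in reached:      # each node is expanded at most once
                reached.add(nxt)
                stack.append(nxt)
    memo[key] = reached
    return reached


g = Graph([("a", "knows", "b"), ("b", "knows", "c"), ("c", "knows", "a"), ("c", "likes", "d")])
memo = {}
print(sorted(closure(g, "knows", "a", memo)))
print(g.backward("a", "knows"))
```

Because every node is added to `reached` at most once, the traversal terminates on cyclic graphs, and a second request for the same predicate and start node is answered from `memo` without touching the graph.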

8. Experimental evaluation

This section reports on an experimental evaluation meant to investigate different aspects of the EPP language discussed in the previous sections. Section 8.1 investigates the performance of our translation algorithm (see Section 4) in terms of running time, and compares it with that of translations routinely performed by existing SPARQL processors. We focused on running time since it offers a reasonable summary of the overall performance of a query processing system, whether it is based on the iEPPs evaluation algorithm or on the SPARQL evaluation algorithm. In Section 8.2 we compare the running time of a custom processor implementing the evaluation algorithm for iEPPs (see Section 7) with the running time of the Jena ARQ SPARQL processor. Finally, in Section 8.3 we discuss the impact of using query-based reasoning (see Section 5) both in terms of running time and number of results. All the experiments have been performed on an Intel i5 machine with 8 GB of RAM. Results are the average of 30 runs (queries were run in a random order each time) after removing the top and bottom outliers.
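The averaging scheme can be sketched as a trimmed mean. The number of outliers removed at each end is not stated above; the sketch assumes one highest and one lowest measurement are dropped, and the function name `trimmed_mean` and the parameter `drop` are hypothetical.

```python
def trimmed_mean(times, drop=1):
    """Average of `times` after discarding the `drop` smallest and `drop` largest values."""
    if len(times) <= 2 * drop:
        raise ValueError("not enough measurements to trim")
    kept = sorted(times)[drop:len(times) - drop]
    return sum(kept) / len(kept)


# Example with made-up running times in milliseconds.
print(trimmed_mean([12.0, 11.0, 13.0, 95.0, 2.0]))
```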

8.1. Translation running time

Our primary objective is to make practical the immediate adoption of EPPs as a query language for KGs. This objective is fulfilled by using our translation from nEPPs to SPARQL as front end to any existing SPARQL processor. To investigate the performance of the translation algorithm presented in Section 4, we show that our nEPPs to SPARQL translation performs comparably to the existing translations routinely performed by SPARQL processors.

Fig. 16.

Time of the translation of Jena ARQ (SPARQL2Algebra) and nEPPs (nEPPS2SPARQL) vs query length (number of path steps).

We compared our translation algorithm with the SPARQL syntax to SPARQL algebra (referred to as SPARQLtoAlgebra) translation performed by ARQ.¹¹

¹¹

http://jena.apache.org/documentation/query/algebra.html

We used 28 queries generated in two steps. First, we started from four expressions: three base expressions (Q1–Q3) plus a fourth one combining them (Q4). Q4 includes all the nEPP constructs: concatenation, path conjunction, path difference, path test, and logical tests with all the logical operators. Second, we generated increasingly longer expressions $Q_{i}^{k}$ by concatenating $Q_{i}^{(k - 1)} / Q_{i}^{(k - 1)}$, up to $k = 6$. The resulting $Q_{i}^{6}$ fragments involve the concatenation of 64 path steps. The running times of the nEPPtoSPARQL and SPARQLtoAlgebra translations, for each query, are shown in Fig. 16. Our translation performs similarly to (slightly faster than) ARQ’s existing initial phase, and this behavior shows a consistent trend in two dimensions ($Q_{i}^{k}$ expressions use more EPP constructs for increasing i, and become exponentially longer for increasing k). To give a sense of the length of the expressions, we observe that $Q_{4}^{6}$ is a 19K character long nEPPs expression (with an operational tree containing over one thousand nodes), while the $Q_{4}^{6}$ SPARQL translation is 133K characters long after filter elimination (the original translation is ∼239K characters).
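The doubling construction can be reproduced with a few lines of Python. The actual base expressions Q1–Q4 are not reproduced here, so the sketch uses a single-predicate placeholder (`:p`) as the base; the function names `grow` and `path_steps` are hypothetical.

```python
def grow(base, k):
    """Build Q^k by concatenating Q^(k-1) with itself, starting from Q^0 = base."""
    expr = base
    for _ in range(k):
        expr = f"{expr}/{expr}"
    return expr


def path_steps(expr):
    """Number of path steps, assuming the base expression itself contains no '/'."""
    return expr.count("/") + 1


for k in range(7):
    print(k, path_steps(grow(":p", k)))
```

Each round doubles the number of copies of the base expression, so $k = 6$ yields $2^{6} = 64$ copies. Assuming k ranges over 0 to 6, the four starting expressions give 4 × 7 = 28 queries, which agrees with the number reported.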

While this suggests that the cost of our approach could be up to twice the cost of a direct nEPPs to algebra translation, keep in mind that we are comparing initial phases of query processing and these are typically much faster than subsequent phases. As an example, in Jena ARQ the SPARQLtoAlgebra translation is followed by an algebra to algebra optimization phase [23]. The remaining pre-processing phases (particularly those using dataset statistics) can be far more expensive than this initial phase. To give another example, if we consider Virtuoso, we observe that the initial SPARQL to SQL translation phase is followed by a more expensive cost based SQL optimization phase. Hence, the impact of our translation on the running time is negligible as compared to the total running time and other kinds of translations routinely performed by SPARQL processors.

8.2. Running time of iEPPs vs. Jena ARQ

We now compare the running time of the custom EPP processor implementing the iEPPs evaluation algorithm discussed in Section 7.2 against the translation-based approach described in Section 4 using Jena ARQ as underlying SPARQL query processor. This experiment gives insights about the pros and cons of evaluating EPPs using existing SPARQL processors as compared to the usage of the iEPPs custom query processor. In the experiments, we used a portion of the FOAF dataset extracted from the BTC2012 dataset¹² as follows: we started from the URI of T. Berners-Lee (TBL) and traversed foaf:knows links up to distance 4. Starting from a seed URI allowed us to obtain a connected graph. On one hand, this graph comprising ∼4M triples is suitable for loading into main memory as the iEPPs processor adopts an in-memory algorithm. On the other hand, having a graph with a few edge types (mainly foaf:knows) allows writing intuitive expressions having different levels of complexity and using all the features of our language (i.e., nesting, path conjunction and path difference).

¹²
http://km.aifb.kit.edu/projects/btc-2012

Fig. 17.

Query time for simple and $ρ df$ -entailment comparing iEPPs and Jena ARQ.

We created 4 groups $Q_{i}$, $i \in \{1, \dots, 4\}$, of nEPP expressions, each with 3 queries, for a total of 12 queries (Q1–Q12) that are reported in Appendix B. The first group makes use of concatenation () and path alternatives (); the second group also includes nesting (); the third group includes path difference () and concatenation (); finally, the fourth group leverages path conjunction () and concatenation (). These groups of queries allow us to investigate the trade-off between expressiveness and running time. Indeed, one expects that queries in $Q_{1}$ are less expensive than queries in $Q_{3}$. For each $epp \in Q$ we generated the corresponding SPARQL query $S_{epp}$ via the translation algorithm. To investigate the performance also when including the query-based reasoning capabilities discussed in Section 5, we translated each epp into another query ${epp}^{ρ}$ and each $S_{epp}$ into another query $S_{epp}^{ρ}$. At this point, the original query epp and its reasoning-aware variant ${epp}^{ρ}$ are evaluated via the iEPPs custom processor while the translated $S_{epp}$ query and its reasoning-aware variant $S_{epp}^{ρ}$ are evaluated via Jena ARQ. Figure 17 (left) shows the comparison when executing the queries without considering reasoning capabilities (i.e., under the simple entailment). Figure 17 (right) shows results using $ρ df$.

For $Q_{1}$, which contains queries asking for friends of TBL at distance 1, 2 and 3, the iEPPs processor performs better than Jena at distance 1 and 2; at distance 3, the times are comparable. $Q_{2}$ additionally considers a test based on nesting. Again, the custom processor performs better at distance 1 and 2; at distance 3 it shows a higher running time. In $Q_{3}$, which considers path difference (i.e., exclusive friends at various distances), the iEPPs processor performs consistently better. Finally, in $Q_{4}$, which includes conjunction (to ask for mutual friends at various distances), the iEPPs processor performs better at distance 1 and 2 and obtains a higher running time at distance 3. These experiments suggest that for real-world data and natural queries (e.g., mutual friends) working with SPARQL-translated nEPPs and using existing processors (Jena in this case) is a bit less efficient than using a custom processor. Note that the iEPPs processor works in memory, similarly to nSPARQL and other SPARQL navigational extensions. This clearly limits the applicability of these approaches on real-world graphs that typically do not fit into main memory; it also underlines the advantage of adopting our rewriting approach into SPARQL queries that can be evaluated on existing SPARQL processors capable of handling large graphs. The number of results ranges from ∼50 to ∼8000 for the simple entailment and from ∼150 to ∼14500 for the $ρ df$ entailment.

The huge advantage of using nEPPs is that navigational queries can be written in a succinct way. Anecdotally, while the nEPPs asking for mutual friends (simple entailment) at distance 3 contains ∼200 characters, the SPARQL query (obtained from the translation) contains ∼700 characters; moreover, writing navigational queries directly in SPARQL requires dealing with a large number of variables that need to be consistently joined. We want to point out that the translated SPARQL queries have been automatically generated. It may be the case that manually written equivalent queries can be shorter. Nevertheless, there are cases in which the EPPs syntax always introduces benefits (beyond those already introduced by PPs). As an example, path repetitions (used e.g., in Q5–Q12) available in EPPs (and not in PPs) always allow a significant reduction in the expression size. Indeed, the conversion of path repetitions into PPs requires using alternative paths having an increasing number of concatenation operators. As an example, $p1\{1, 3\}$ requires three path alternatives for a total of three concatenations.
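The expansion of a path repetition into plain PP syntax can be sketched as follows. The function name `expand_repetition` is hypothetical, and the sketch assumes a lower bound of at least 1 (a lower bound of 0 would need a zero-length path, which is not handled here).

```python
def expand_repetition(pred, low, high):
    """Rewrite pred{low,high} as alternatives of concatenations, for 1 <= low <= high."""
    if not 1 <= low <= high:
        raise ValueError("expected 1 <= low <= high")
    alternatives = ["/".join([pred] * n) for n in range(low, high + 1)]
    return "|".join(alternatives)


expanded = expand_repetition("p1", 1, 3)
print(expanded)             # p1|p1/p1|p1/p1/p1
print(expanded.count("/"))  # 3 concatenations
```

In general, `pred{1,h}` expands into h alternatives containing 0 + 1 + … + (h − 1) = h(h − 1)/2 concatenations, so the expanded form grows quadratically with the upper bound while the EPP form stays constant in size.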

8.3. Running time of query-based reasoning

We now move to a larger-scale evaluation of the query-based reasoning approach described in Section 5. The goal is to compare the running time of queries with and without reasoning support. Here too we considered running time since it offers a reasonable summary of the overall performance of a query processing system. We also investigate the number of results returned. Among the ρdf inference rules (see Table 8) we considered the two most interesting ones, that is, R5, which allows deriving new rdf:type information, and R6, which allows the derivation of generic (sub)properties. Deriving new rdf:type information is particularly useful in efficient query processing via type-aware graph transformations [31]. The other rules in Table 8 either derive schema information (e.g., R3–R4) or can be captured via PPs (e.g., R1). For simple RDF, each query was executed as it is. Under the $ρ df$ entailment, each query was first rewritten as described in Section 5.2. The prototypical EPP expression used in these experiments has the form $seed\_entity\ prop\ ?y$, where $prop \in \{rdf:type, dbo:genre, dbo:location, yago:hasLocation\}$. To give an example, the EPP dbp:Tracy_Mann rdf:type ?y retrieves the asserted RDF types for the entity Tracy Mann. When rewriting this query we could also get inferred RDF types. We tested the performance of the query-based reasoning approach featured by EPPs on a variety of datasets and SPARQL processors (both local and remote) as shown in Table 10.

Fig. 18.

Query time (y-axis) for simple and $ρ df$ entailment over different datasets. The x-axis shows query IDs. Average number of results reported in Table 11.

Table 10

Datasets used for the evaluation of query-based reasoning

Dataset	Triples	Availability
LinkedMDB a	6M	local SPARQL endpoint
Yago b	400M	local SPARQL endpoint
DBpedia	412M	remote SPARQL endpoint c
LDCache	22B	remote SPARQL endpoint d

a http://linkedmdb.org

b www.mpi-inf.mpg.de/yago

c http://dbpedia.org/snorql

d http://lod.openlinksw.com/sparql

DBpedia is a large dataset with limited RDFS usage, Yago/LDCache makes extensive usage of RDFS predicates while LinkedMDB does not use RDFS. LinkedMDB and Yago have been loaded into a BlazeGraph¹³ instance while DBpedia and LDCache have been accessed via their Virtuoso¹⁴ SPARQL endpoints.

¹³
https://www.blazegraph.com/download

¹⁴
http://virtuoso.openlinksw.com

Figure 18(a), (c), (f) report the running times on the RDFS rule R5 on 50 different queries that count the number of results by randomly picking 50 entities in Yago, DBpedia, and LinkedMDB, respectively. Detailed results are available in Appendix A. We observe that the additional time introduced by the query-based reasoning approach is reasonable and there are a few exceptions (in DBpedia) where plain RDF query execution takes more time. As expected, there is some variation in DBpedia while the additional time is much larger in Yago. Note that query answering under the entailment regime in some cases takes less time; this can be explained by the fact that it requires the usage of the ALP procedure that may perform better than the standard evaluation technique in some cases. To show that even without additional inference the additional cost of the query-based reasoning translation is minimal, we tested R5 also on LinkedMDB (that does not have a schema). The advantage of using the entailment regime is evident when looking at the average number of results, reported in Table 11. As an example, on DBpedia it increases from 13 to 27. The average ratio in terms of time for all queries is 1.7, 11.3, and 1.7 for DBpedia, Yago and LinkedMDB, respectively. As expected, the larger the ratio the larger the number of results.

Table 11

Average number of results for plain RDF and ρdf

Dataset	Plain RDF	ρdf
Fig. 18(a)	5.08	22.88
Fig. 18(b)	0	1.43
Fig. 18(c)	13.53	27.04
Fig. 18(d)	0	2.18
Fig. 18(e)	0	2.12
Fig. 18(f)	1.4	1.4

Figure 18(b), (d), and (e) further investigate the benefit of query-based reasoning. We created 150 additional queries for R6; 100 for DBpedia by considering two properties, that is, dbpo:genre and dbpo:location, and 50 for LDCache by picking the property yago:hasLocation (note that we used a property from Yago schema since it is contained in LDCache). By looking at R5 in Table 8 it can be noted that the translation of an EPP under the entailment regime requires the union of three queries; hence, the resulting EPP is translated into a SPARQL query using (three) UNION operators. On the other hand, the translation of an EPP to capture R6 requires a single query that will be translated in SPARQL using FILTER (to capture tests).

In other words, queries using R5 are more involved than those using R6. R6’s impact in terms of running time is lower than that of R5; this is also reflected in the average ratio in terms of time, which is now 1.18 for dbpo:genre, 1.46 for dbpo:location and 1.15 for yago:hasLocation. By looking at the average number of results (Table 11) it can be observed that plain RDF did not provide any result while our query-based reasoning approach did. To be more specific, in DBpedia results have been obtained not via the property dbpo:genre, but via the more general property dbpo:literaryGenre. This allowed us to discover, for instance, that Night Surf (one seed entity) is a post-apocalyptic short story. In LDCache, while the query returned zero results when using yago:hasLocation, it returned results via the more general property yago:placedIn (via R6).

Comparison with Closure Computation. An additional advantage of the query-based approach is that it can benefit from space optimization if one works with the transitive reduction¹⁵ of a graph [2], which removes edges derivable from $ρ df$-reasoning. In contrast, if one wants to precompute the closure (the currently used approach) one needs to materialize the full closure of the RDF graph under consideration, which requires cubic space in the worst case [25]. This becomes prohibitive for large KGs like DBpedia, Yago and many others. Indeed, we measured the space and the time of the closure on a local copy of (a subset of) Yago. Starting from 400M triples, the closure more than doubled the number of triples (giving 853M triples) and took 3.5 h of computation.

¹⁵
Tools like Slib (https://github.com/sharispe/slib) can compute the reduction of RDF graphs.

9. Related work

The idea of graph query languages is to use (variants of) regular expressions to express (in a compact way) navigational patterns (e.g., [10,14,15,40]). Angles and Gutierrez [5], and Wood [54] provide surveys on the topic, Barceló provides a detailed overview of research in the field [9] while Angles et al. [4] describe a recent proposal. Our goal with EPPs is to extend the navigational core of SPARQL (i.e., PPs) and make the extension readily available for existing SPARQL processors.

9.1. SPARQL navigational extensions

Proposals to extend SPARQL with navigational features have been around for some time. Notable examples are PSPARQL [3] and nSPARQL [43], which tackled this problem even before the standardization of property paths (PPs) as SPARQL navigational core. From the practical point of view, the need for RDF navigational languages is witnessed by projects like Apache Marmotta¹⁶, which incorporates a simple navigational language that borrows ideas from XPath. Since our main goal is to extend the navigational core of SPARQL we focus on the comparison between EPPs and other SPARQL navigational extensions. We compare EPPs with PPs, cpSPARQL [3], rec-SPARQL [46], RDFPath [45], nSPARQL-NREs [43], and star-free Nested Regular Expressions (sfNREs) that extend NREs with negation [57]. Table 12 summarizes the results of the comparison; we considered the following language features: path conjunction (), path difference (), negation of tests (), nesting (), tests over nodes (), usage of positions (), path repetitions ({l,h}), entailment regime, and closure operator (). Additionally, we consider how expressions in each of the languages are evaluated, the support for reasoning (we focus on RDFS and in particular the ρdf fragment [41]) and the support for query-based reasoning (QBR); finally, we also report whether the language is implemented.

¹⁶
http://marmotta.apache.org

Table 12

Comparison of EPPs with other navigational extensions of SPARQL

RDFPath is more focused on specific types of queries (i.e., shortest paths) and their efficient implementation in MapReduce, and it has fewer features than all the other languages considered. Path conjunction and difference are natively supported only by EPPs and sfNREs, while nSPARQL, cpSPARQL and rec-SPARQL require the usage of the SPARQL algebra (i.e., for conjunction). Nevertheless, resorting to the algebra does not allow path conjunction to be used inside the closure operator, where the number of path conjunction evaluations is not bounded a priori. As a side note, queries that resort to the SPARQL algebra for conjunction are also more verbose. Finally, nSPARQL, cpSPARQL and rec-SPARQL do not support path difference. Test negation is only supported by PPs (e.g., via negated property sets) and EPPs; nesting is supported by all languages except PPs and rec-SPARQL. However, only EPPs allow node values to be tested in a nested expression (see Example 6). Node tests are supported in limited form by cpSPARQL; EPPs allow logical combinations of tests representing nesting and tests representing (in)equalities of node values. As a matter of fact, none of these extensions can express the Italian exclusive friends query mentioned in the Introduction. EPPs support path repetitions; this feature (called curly brace form) is on the agenda of the SPARQL working group (http://www.w3.org/2009/sparql/wiki/Future_Work_Items). rec-SPARQL also supports repetitions, albeit through more verbose queries, since the motivation behind rec-SPARQL is not to provide a concise syntax. Nevertheless, rec-SPARQL requires an ad-hoc query processor.

A crucial difference between EPPs and related research is that we tackle the problem of extending the SPARQL language in the least intrusive way. We show that there exists a precise fragment of SPARQL that is expressive enough to capture non-recursive EPPs (nEPPs), that is, EPPs that do not use closure operators (i.e., * and +). Therefore, following the approach of the SPARQL standard, where non-recursive PPs are translated into SPARQL queries, we devised a translation from (concise) nEPPs into (more verbose) SPARQL queries. The advantage of this approach over previous navigational extensions of SPARQL (e.g., [3,43,56]), which require ad-hoc query processors, is that nEPPs can be evaluated on existing SPARQL processors.
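To make the idea of evaluating a non-recursive expression on an unmodified SPARQL processor concrete, the following Python sketch unrolls a bounded repetition such as `<p>{1,2}` into a UNION of chained triple patterns. It is an illustration under simplifying assumptions and not the translation algorithm described in Section 4: the function names `chain` and `unroll_repetition`, the variable names `?x`, `?y` and `?m…`, and the use of SELECT DISTINCT to obtain set semantics are all choices made for this example.

```python
def chain(predicate: str, length: int, start: str = "?x", end: str = "?y") -> str:
    """Return a group with `length` chained triple patterns from `start` to `end`."""
    # Intermediate variables are named after the chain length to keep branches apart.
    nodes = [start] + [f"?m{length}_{i}" for i in range(1, length)] + [end]
    triples = [f"{s} {predicate} {o} ." for s, o in zip(nodes, nodes[1:])]
    return "{ " + " ".join(triples) + " }"


def unroll_repetition(predicate: str, low: int, high: int) -> str:
    """Build a SELECT query for predicate{low,high}, assuming 1 <= low <= high."""
    if not 1 <= low <= high:
        raise ValueError("expected 1 <= low <= high")
    # One UNION branch per admissible path length.
    branches = [chain(predicate, k) for k in range(low, high + 1)]
    return "SELECT DISTINCT ?x ?y WHERE { " + " UNION ".join(branches) + " }"


if __name__ == "__main__":
    print(unroll_repetition("<http://xmlns.com/foaf/0.1/knows>", 1, 2))
```

The size of the generated query grows with the upper bound of the repetition, which matches the remark that the SPARQL form is more verbose than the path expression it stands for.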

Reasoning is not supported by PPs, sfNREs, RDFPath, and rec-SPARQL. In the same vein as NREs (and nSPARQL) and cpSPARQL, we focus on how EPPs can support SPARQL queries with embedded reasoning capabilities [23]. We focus on the ρdf fragment [41], which captures the main semantic functionalities of RDFS. We show that certain classes of SPARQL queries can be rewritten into queries that capture ρdf semantic functionalities and can thus be evaluated on existing SPARQL processors. This is again a significant advantage over previous attempts (e.g., nSPARQL [43]) that require ad-hoc processors.

Another difference with related proposals concerns the implementation of the language. To foster the adoption of EPPs and show its feasibility, we make EPPs available to users and developers in different forms: (i) as an implementation independent from SPARQL; (ii) as a front-end to SPARQL endpoints (for nEPPs); and (iii) as an extension to the Jena library. Further information, along with pointers to the source code, is available on the EPPs website (http://extendedpps.wordpress.com).

Finally, our study includes two novel expressiveness aspects. The first concerns the expressive power of the current SPARQL standard in terms of navigational features (see Section 6). We show that the language of EPPs is more expressive than SPARQL PPs; as a by-product, we show that using EPPs as the navigational core of SPARQL increases the expressive power of the whole SPARQL language. The second aspect concerns the expressiveness of SPARQL in terms of query-based reasoning capabilities under the ρdf entailment regime (see Section 5). We show that our translation allows queries enhanced with reasoning capabilities to be evaluated on existing SPARQL processors. We also show that EPPs is the only closed language in this respect and that, in general, rewriting a query to capture the entailment regime requires a more expressive language in the rewriting.

9.2. Other navigational languages

Besides SPARQL navigational extensions, there exist other graph languages such as GraphQL [28] (the Facebook query language) and GuLP [18]. However, these languages depart from the SPARQL standard, and it is not clear how they support reasoning. We also mention logic-based languages like TriAL [37], TriQ [8], GXPath [36], and NEMODEQ [47]. Even if some of these languages (e.g., GXPath) are expressive enough to encode EPP expressions, they depart from the SPARQL standard, meaning that query evaluation cannot be done on existing SPARQL processors. In contrast, our primary focus is on extending the current navigational core of the SPARQL standard while keeping compatibility and allowing query evaluation on existing SPARQL processors, also under the ρdf entailment regime. Indeed, none of the above proposals has focused on the expressiveness of the current SPARQL standard in terms of navigational features, nor on supporting the ρdf entailment regime on existing SPARQL processors. We also mention work on graphs with data (e.g., [35]). This line of research: (i) does not adopt the RDF standard data model; (ii) does not consider SPARQL, which is the focus of this paper; (iii) does not deal with entailment regimes. Our work is also related to: (i) Ontology Based Data Access [32], where a (conjunctive) query is rewritten into a (set of) queries that fully incorporate the schema information; in this case the schema is treated separately and is needed in the rewriting; (ii) approaches that rewrite queries to capture entailment regimes, like Bischof et al. [12]; (iii) approaches independent from SPARQL, such as Stefanoni et al. [50], that study conjunctive and navigational queries over OWL 2 EL. Another recent line of research studied the problem of introducing recursion into SPARQL [46]. Our approach has different objectives. We focus on EPPs, a more expressive language than PPs; we provide a precise account of those fragments that can be executed on existing SPARQL processors and those that cannot, with or without considering the ρdf entailment regime. Hence, our study is more focused on expressiveness with respect to SPARQL. Moreover, our approach is readily available and has been experimentally evaluated. The comparison with navigational languages for the Web of data (e.g., [1,20,27,29,48]) is orthogonal to our goal. We also want to mention recent research that studied problems related to SPARQL property paths, including containment and subsumption [33]. We performed a similar study for EPPs; results range from undecidability for the full EPPs to 2-EXPTIME-hardness for the positive queries [39]. Finally, we want to mention two recent papers that analyze the relative expressive power of different combinations of operators for navigational query languages on graphs, either with [52] or without [21] transitive closure.

9.3. CONSTRUCT query forms

Reutter et al. [46] proposed to enhance the expressive power of SPARQL by introducing recursion in a way similar to SQL. The idea is to alternate CONSTRUCT queries (which materialize in a graph the portion of data needed in each recursive call) and SELECT queries that project only the parts of interest. This approach, which is currently not available in standard SPARQL implementations, could be used to materialize the portion of the graph needed to capture RDFS inferences. Both data materialization and the changes required to SPARQL processors (to support recursion) go against the idea of EPPs, which provide expressive SPARQL navigational queries (also under the ρdf entailment regime) with no materialization and no changes to existing SPARQL processors.

10. Concluding remarks

We introduced EPPs, a significant extension of property paths, the current navigational core of SPARQL, the standard query language for querying KGs based on RDF. We underlined several practical advantages of adopting such an extension. Our study also offers interesting theoretical observations, among which: (i) we identified a precise fragment of SPARQL that can capture non-recursive EPPs, thus providing an indirect analysis of the navigational expressiveness of SPARQL; (ii) we studied the expressiveness of EPPs as compared to PPs; (iii) we also studied the expressiveness of SPARQL with respect to the ρdf entailment regime when considering different navigational cores, and identified those that can be supported on existing processors and those that require changes. Overall, we think that the practical and theoretical contributions of our work can help pave the way toward extending the navigational core of SPARQL and incorporating query-based reasoning capabilities. A promising direction for future work is to study how optimization techniques devised for SPARQL property paths [55] can be applied to extended property paths.

Footnotes

Detailed experiments in Section 8.3

Table 18

(Continued)

Seed entity	QId	Result count (no reasoning)	Result count (ρdf)	Time in ms (no reasoning)	Time in ms (ρdf)
yago:Acheron_Boys_Home	Q24	0	1	106.18	110.22
yago:Acheron,_Victoria	Q25	0	5	96.06	115.61
yago:Achimota_School	Q26	0	2	120.28	128.43
yago:Acme,_Washington	Q27	0	2	106.37	121.53
yago:Acquaviva_Picena	Q28	0	1	102.37	115.10
yago:AD_Torreforta	Q29	0	1	108.07	123.47
yago:Ada,_Croatia	Q30	0	3	91.96	113.06
yago:Adabay_River	Q31	0	1	112.62	120.61
yago:Adaganahalli	Q32	0	1	91.23	118.95
yago:Adair,_Idaho	Q33	0	3	111.64	114.51
yago:Adak_Airport	Q34	0	1	111.93	122.05
yago:Adak,_Alaska	Q35	0	3	91.31	116.07
yago:Adakanahalli	Q36	0	1	105.41	121.67
yago:Adakatahalli	Q37	0	1	108.14	99.44
yago:Adalin_River	Q38	0	2	102.34	107.79
yago:Adam_&_Steve	Q39	0	1	100.46	111.54
yago:Adam_Airport	Q40	0	1	97.71	102.69
yago:Adam_Orris_House	Q41	0	1	108.72	104.87
yago:Adam’s_Green	Q42	0	2	93.65	111.36
yago:AdOn_Network	Q43	0	1	114.73	128.12
yago:AFI_Conservatory	Q44	0	2	110.62	123.65
yago:ALZ_(steelworks)	Q45	0	1	98.44	110.07
yago:APSA_Colombia	Q46	0	1	92.99	121.10
yago:ASFA_Soccer_League	Q47	0	1	106.14	128.73
yago:ASTM_International	Q48	0	2	103.31	125.01
yago:ATP_Challenger_Guangzhou	Q49	0	1	108.21	111.38
yago:ATP_Challenger_La_Serena	Q50	0	1	122.51	113.77

Queries of experiments in Section 8.2

Table 19

Queries used in Section 8.2

Query ID	EPP expression
Q1	<http://xmlns.com/foaf/0.1/knows>
Q2	<http://xmlns.com/foaf/0.1/knows>{1,2}
Q3	<http://xmlns.com/foaf/0.1/knows>{1,3}
Q4	<http://xmlns.com/foaf/0.1/knows>&&TP(_o,<http://xmlns.com/foaf/0.1/homepage>)
Q5	(<http://xmlns.com/foaf/0.1/knows>&&TP(_o,<http://xmlns.com/foaf/0.1/homepage>)){1,2}
Q6	(<http://xmlns.com/foaf/0.1/knows>&&TP(_o,<http://xmlns.com/foaf/0.1/homepage>)){1,3}
Q7	<http://xmlns.com/foaf/0.1/knows>∼(<http://xmlns.com/foaf/0.1/knows>{2,2})
Q8	(<http://xmlns.com/foaf/0.1/knows>{2,2})∼(<http://xmlns.com/foaf/0.1/knows>{3,3})
Q9	<http://xmlns.com/foaf/0.1/knows>∼(<http://xmlns.com/foaf/0.1/knows>{4,4})
Q10	<http://xmlns.com/foaf/0.1/knows>&(<http://xmlns.com/foaf/0.1/knows>{2,2})
Q11	(<http://xmlns.com/foaf/0.1/knows>{2,2})&(<http://xmlns.com/foaf/0.1/knows>{3,3})
Q12	<http://xmlns.com/foaf/0.1/knows>&(<http://xmlns.com/foaf/0.1/knows>{4,4})
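As a rough intuition for what the queries in Table 19 compute, the Python sketch below evaluates a few of them over a tiny hand-made graph, treating each path expression as a set of (start, end) node pairs. It is a simplified reading, not the evaluation algorithm of Section 7: the toy data in `KNOWS` and `HAS_HOMEPAGE`, the helper names `compose` and `repeat`, and the interpretation of `TP(_o, homepage)` as "the object node has an outgoing homepage edge" are assumptions made for illustration.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]

# Hypothetical toy data: foaf:knows edges and the nodes that have a foaf:homepage.
KNOWS: Set[Edge] = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}
HAS_HOMEPAGE: Set[str] = {"b", "d"}


def compose(r: Set[Edge], s: Set[Edge]) -> Set[Edge]:
    """Path concatenation: pairs (x, z) linked through a shared middle node."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


def repeat(r: Set[Edge], low: int, high: int) -> Set[Edge]:
    """Bounded repetition r{low,high}: union of the k-fold compositions of r."""
    if not 1 <= low <= high:
        raise ValueError("expected 1 <= low <= high")
    result: Set[Edge] = set()
    power = r  # holds the k-fold composition of r
    for k in range(1, high + 1):
        if k >= low:
            result |= power
        if k < high:
            power = compose(power, r)
    return result


# Q2: knows{1,2}, i.e. reachable in one or two knows steps.
q2 = repeat(KNOWS, 1, 2)
# Q4: knows && TP(_o, homepage), i.e. knows edges whose object has a homepage.
q4 = {(x, y) for (x, y) in KNOWS if y in HAS_HOMEPAGE}
# Q7: knows ~ knows{2,2}, path difference as set difference.
q7 = KNOWS - repeat(KNOWS, 2, 2)
# Q10: knows & knows{2,2}, path conjunction as set intersection.
q10 = KNOWS & repeat(KNOWS, 2, 2)
```

On this toy graph, `q7` keeps the direct acquaintances that are not also reachable in exactly two steps, while `q10` keeps exactly those that are.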

References

1. Acosta and M.-E. Vidal, Networks of linked data eddies: An adaptive web query processing engine for RDF data, in: Proceedings, Part I, The Semantic Web – ISWC 2015 – 14th International Semantic Web Conference, Bethlehem, PA, USA, October 11–15, 2015, Arenas, Ó. Corcho, Simperl, Strohmaier, d’Aquin, Srinivas, P.T. Groth, Dumontier, Heflin, Thirunarayan and Staab, eds, Springer, 2015, pp. 111–127. doi:10.1007/978-3-319-25007-6_7.

2. A.V. Aho, M.R. Garey and J.D. Ullman, The transitive reduction of a directed graph, SIAM Journal on Computing 1(2) (1972), 131–137. doi:10.1137/0201008.

3. Alkhateeb and Euzenat, Constrained regular expressions for answering RDF-path queries modulo RDFS, International Journal of Web Information Systems 10(1) (2014), 24–50. doi:10.1108/IJWIS-05-2013-0013.

4. Angles, Arenas, G.H. Fletcher, Gutierrez, Lindaaker, Paradies, Plantikow, Sequeda, van Rest, Voigt et al., G-CORE: A core for future graph query languages, in: Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10–15, 2018, Das, C.M. Jermaine and P.A. Bernstein, eds, ACM, 2018, pp. 1421–1432. doi:10.1145/3183713.3190654.

5. Angles and Gutierrez, Survey of Graph Database Models, ACM Computing Surveys 40(1) (2008), 1. doi:10.1145/1322432.1322433.

6. Angles and Gutierrez, The multiset semantics of SPARQL patterns, in: Proceedings, Part I, The Semantic Web – ISWC 2016 – 15th International Semantic Web Conference, Kobe, Japan, October 17–21, 2016, P.T. Groth, Simperl, A.J.G. Gray, Sabou, Krötzsch, Lécué, Flöck and Gil, eds, Springer, 2016, pp. 20–36. doi:10.1007/978-3-319-46523-4_2.

7. Arenas, Conca and Pérez, Counting Beyond a Yottabyte, or how SPARQL 1.1 Property Paths will Prevent Adoption of the Standard, in: Proceedings of the 21st World Wide Web Conference 2012, WWW, 2012, Lyon, France, April 16–20, 2012, Mille, F.L. Gandon, Misselis, Rabinovich and Staab, eds, ACM, 2012, pp. 629–638. doi:10.1145/2187836.2187922.

8. Arenas, Gottlob and Pieris, Expressive languages for querying the semantic web, in: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS’14, Snowbird, UT, USA, June 22–27, 2014, Hull and Grohe, eds, ACM, 2014, pp. 14–26. doi:10.1145/2594538.2594555.

9. Barceló, Querying graph databases, in: Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, New York, NY, USA, June 22–27, 2013, Hull and Fan, eds, ACM, 2013, pp. 175–188. doi:10.1145/2463664.2465216.

10. Barceló, Libkin, A.W. Lin and P.T. Wood, Expressive Languages for Path Queries over Graph-Structured Data, ACM Transactions on Database Systems 37(4) (2012), 31. doi:10.1145/2389241.2389250.

11. Berglund, Boag, Chamberlin, M.F. Fernández, Kay, Robie and Siméon, XML Path Language (XPath) 2.0 (Second Edition), W3C, 2010. http://www.w3.org/TR/2007/REC-xpath20-20070123/.

12. Bischof, Krötzsch, Polleres and Rudolph, Schema-agnostic query rewriting in SPARQL 1.1, in: Proceedings, Part I, The Semantic Web – ISWC 2014 – 13th International Semantic Web Conference, Riva del Garda, Italy, October 19–23, Mika, Tudorache, Bernstein, Welty, C.A. Knoblock, Vrandecic, P.T. Groth, N.F. Noy, Janowicz and C.A. Goble, eds, Springer, 2014, pp. 584–600. doi:10.1007/978-3-319-11964-9_37.

13. Bizer, Lehmann, Kobilarov, Auer, Becker, Cyganiak and Hellmann, DBpedia – a crystallization point for the web of data, Web Semantics: science, services and agents on the world wide web 7(3) (2009), 154–165. doi:10.1016/j.websem.2009.07.002.

14. Calvanese, De Giacomo and Lenzerini, Conjunctive Query Containment and Answering under Description Logic Constraints, ACM Transactions on Computational Logic 9(3) (2008), 22. doi:10.1145/1352582.1352590.

15. M.P. Consens and A.O. Mendelzon, GraphLog: A Visual Formalism for Real Life Recursion, in: Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Nashville, Tennessee, USA, April 2–4, 1990, D.J. Rosenkrantz and Sagiv, eds, ACM Press, 1990, pp. 404–416. doi:10.1145/298514.298591.

16. Cyganiak, Wood and Lanthaler, RDF 1.1 Concepts and Abstract Syntax, W3C, 2014, http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/.

17. Facebook Graph, https://www.facebook.com/about/graphsearch.

18. Fionda and Pirrò, Querying Graphs with Preferences, in: 22nd ACM International Conference on Information and Knowledge Management, CIKM’13, San Francisco, CA, USA, October 27–November 1, 2013, He, Iyengar, Nejdl, Pei and Rastogi, eds, ACM, 2013, pp. 929–938.

19. Fionda, Pirrò and M.P. Consens, Extended property paths: Writing more SPARQL queries in a succinct way, in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, Texas, USA, January 25–30, 2015, Bonet and Koenig, eds, AAAI Press, 2015, pp. 102–108.

20. Fionda, Pirrò and Gutierrez, NautiLOD: A formal language for the web of data graph, ACM Transactions on the Web 9(1) (2015), 5–43. doi:10.1145/2697393.

21. G.H.L. Fletcher, Gyssens, Leinders, Surinx, J.V. den Bussche, D.V. Gucht, Vansummeren and Wu, Relative expressive power of navigational querying on graphs, Information Sciences 298 (2015), 390–406. doi:10.1016/j.ins.2014.11.031.

22. Franconi, Gutierrez, Mosca, Pirrò and Rosati, The logic of extensional RDFS, in: Proceedings, Part I, The Semantic Web – ISWC 2013 – 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21–25, 2013, Alani, Kagal, Fokoue, P.T. Groth, Biemann, J.X. Parreira, Aroyo, N.F. Noy, Welty and Janowicz, eds, Springer, 2013, pp. 101–116. doi:10.1007/978-3-642-41335-3_7.

23. Glimm, Using SPARQL with RDFS and OWL entailment, in: Reasoning Web. Semantic Technologies for the Web of Data, 7th International Summer School 2011, Tutorial Lectures, Galway, Ireland, August 23–27, 2011, Polleres, d’Amato, Arenas, Handschuh, Kroner, Ossowski and P.F. Patel-Schneider, eds, Springer, 2011, pp. 137–201. doi:10.1007/978-3-642-23032-5_3.

24. Google Knowledge Graph, http://www.google.com/insidesearch/features/search/knowledge.html.

25. Gutierrez, Hurtado and A.O. Mendelzon, Foundations of semantic web databases, in: Proceedings of the Twenty-Third ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Paris, France, June 14–16, 2004, Beeri and Deutsch, eds, ACM, 2004, pp. 95–106. doi:10.1145/1055558.1055573.

26. Harris and Seaborne, SPARQL 1.1 Query Language, W3C, 2013, https://www.w3.org/TR/2013/REC-sparql11-query-20130321/.

27. Hartig and Pérez, LDQL: A query language for the web of linked data, Journal of Web Semantics 41 (2016), 9–29. doi:10.1016/j.websem.2016.10.001.

28. Hartig and Pérez, Semantics and complexity of GraphQL, in: Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW, 2018, Lyon, France, April 23–27, 2018, Champin, F.L. Gandon, Lalmas and P.G. Ipeirotis, eds, ACM, 2018, pp. 1155–1164. doi:10.1145/3178876.3186014.

29. Hartig and Pirrò, A context-based semantics for SPARQL property paths over the web, in: The Semantic Web. Latest Advances and New Domains – 12th European Semantic Web Conference, ESWC 2015, Proceedings, Portoroz, Slovenia, May 31–June 4, 2015, Gandon, Sabou, Sack, d’Amato, Cudré-Mauroux and Zimmermann, eds, Springer, 2015, pp. 71–87. doi:10.1007/978-3-319-18818-8_5.

30. Heath and Bizer, Linked Data: Evolving the Web into a Global Data Space, Morgan & Claypool, 2011. doi:10.2200/S00334ED1V01Y201102WBE001.

31. Kim, Shin, W.-S. Han, Hong and Chafi, Taming subgraph isomorphism for RDF query processing, in: Proceedings of VLDB Endowment, Vol. 8, 2015, pp. 1238–1249. doi:10.14778/2809974.2809985.

32. Kontchakov, Rezk, Rodríguez-Muro, Xiao and Zakharyaschev, Answering SPARQL queries over databases under OWL 2 QL entailment regime, in: Proceedings, Part I, The Semantic Web – ISWC 2014 – 13th International Semantic Web Conference, Riva del Garda, Italy, October 19–23, Mika, Tudorache, Bernstein, Welty, C.A. Knoblock, Vrandecic, P.T. Groth, N.F. Noy, Janowicz and C.A. Goble, eds, Springer, 2014, pp. 552–567. doi:10.1007/978-3-319-11964-9_35.

33. E.V. Kostylev, J.L. Reutter, Romero and Vrgoc, SPARQL with property paths, in: Proceedings, Part I, The Semantic Web – ISWC 2015 – 14th International Semantic Web Conference, Bethlehem, PA, USA, October 11–15, 2015, Arenas, Ó. Corcho, Simperl, Strohmaier, d’Aquin, Srinivas, P.T. Groth, Dumontier, Heflin, Thirunarayan and Staab, eds, Springer, 2015, pp. 3–18. doi:10.1007/978-3-319-25007-6_1.

34. Krötzsch and Weikum, Special issue on knowledge graphs, Journal of Web Semantics 37–38 (2016), 53–54. doi:10.1016/j.websem.2016.04.002.

35. Libkin, Martens and Vrgoč, Querying graph databases with XPath, in: Joint 2013 EDBT/ICDT Conferences, ICDT ’13 Proceedings, Genoa, Italy, March 18–22, 2013, Tan, Guerrini, Catania and Gounaris, eds, ACM, 2013, pp. 129–140.

36. Libkin, Martens and Vrgoc, Querying graphs with data, Journal of the ACM 63(2) (2016), 14:1–14:53. doi:10.1145/2850413.

37. Libkin, Reutter and Vrgoč, TriAL for RDF: Adapting graph query languages for RDF data, in: Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, New York, NY, USA, June 22–27, 2013, Hull and Fan, eds, ACM, 2013, pp. 201–212. doi:10.1145/2463664.2465226.

38. Losemann and Martens, The complexity of evaluating path expressions in SPARQL, in: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20–24, 2012, Benedikt, Krötzsch and Lenzerini, eds, ACM, 2012, pp. 101–112. doi:10.1145/2213556.2213573.

39. W.C. Melisachew and Pirrò, Containment of expressive SPARQL navigational queries, in: Proceedings, Part I, The Semantic Web – ISWC 2016 – 15th International Semantic Web Conference, Kobe, Japan, October 17–21, 2016, P.T. Groth, Simperl, A.J.G. Gray, Sabou, Krötzsch, Lécué, Flöck and Gil, eds, Springer, 2016, pp. 86–101. doi:10.1007/978-3-319-46523-4_6.

40. Mendelzon and P.T. Wood, Finding Regular Simple Paths in Graph Databases, SIAM Journal on Computing 24(6) (1995). doi:10.1137/S009753979122370X.

41. Muñoz, Pérez and Gutierrez, Simple and efficient minimal RDFS, Journal of Web Semantics 7(3) (2009), 220–234. doi:10.1016/j.websem.2009.07.003.

42. Pérez, Arenas and Gutierrez, Semantics and Complexity of SPARQL, ACM Transactions on Database Systems 34(3) (2009). doi:10.1145/1567274.1567278.

43. Pérez, Arenas and Gutierrez, nSPARQL: A Navigational Language for RDF, Journal of Web Semantics 8(4) (2010). doi:10.1016/j.websem.2010.01.002.

44. Prud’hommeaux, Harris and Seaborne, SPARQL 1.1 Query Language, 2013, http://www.w3.org/TR/sparql11-query.

45. Przyjaciel-Zablocki, Schätzle, Hornung and Lausen, RDFPath: Path query processing on large RDF graphs with MapReduce, in: The Semantic Web: ESWC 2011 Workshops – ESWC 2011 Workshops, Heraklion, Greece, May 29–30, 2011, Revised Selected Papers, Garcia-Castro, Fensel and Antoniou, eds, Springer, 2011, pp. 50–64. doi:10.1007/978-3-642-25953-1_5.

46. J.L. Reutter, Soto and Vrgoc, Recursion in SPARQL, in: Proceedings, Part I, The Semantic Web – ISWC 2015 – 14th International Semantic Web Conference, Bethlehem, PA, USA, October 11–15, 2015, Arenas, Ó. Corcho, Simperl, Strohmaier, d’Aquin, Srinivas, P.T. Groth, Dumontier, Heflin, Thirunarayan and Staab, eds, Springer, 2015, pp. 19–35. doi:10.1007/978-3-319-25007-6_2.

47. Rudolph and Krötzsch, Flag & check: Data access with monadically defined queries, in: Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, New York, NY, USA, June 22–27, 2013, Hull and Fan, eds, ACM, 2013, pp. 151–162.

48. Schaffert, Bauer, Kurz, Dorschel, Glachs and Fernandez, The linked media framework: Integrating and interlinking enterprise media content and data, in: I-SEMANTICS 2012 – 8th International Conference on Semantic Systems, I-SEMANTICS ’12, Graz, Austria, September 5–7, 2012, Presutti and H.S. Pinto, eds, ACM, 2012, pp. 25–32. doi:10.1145/2362499.2362504.

49. Schmidt, Meier and Lausen, Foundations of SPARQL query optimization, in: Proceedings, Database Theory – ICDT 2010, 13th International Conference, Lausanne, Switzerland, March 23–25, 2010, Segoufin, ed., ACM, 2010, pp. 4–33. doi:10.1145/1804669.1804675.

50. Stefanoni, Motik, Krötzsch and Rudolph, The complexity of answering conjunctive and navigational queries over OWL 2 EL knowledge bases, Journal of Artificial Intelligence Research (2014), 645–705. doi:10.1613/jair.4457.

51. F.M. Suchanek, Kasneci and Weikum, Yago: A core of semantic knowledge, in: Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8–12, 2007, C.L. Williamson, M.E. Zurko, P.F. Patel-Schneider and P.J. Shenoy, eds, ACM, 2007, pp. 697–706. doi:10.1145/1242572.1242667.

52. Surinx, G.H.L. Fletcher, Gyssens, Leinders, J.V. den Bussche, D.V. Gucht, Vansummeren and Wu, Relative expressive power of navigational querying on graphs using transitive closure, Logic Journal of the IGPL 23(5) (2015), 759–788. doi:10.1093/jigpal/jzv028.

53. Vrandečić and Krötzsch, Wikidata: A free collaborative knowledgebase, Communications of the ACM 57(10) (2014), 78–85. doi:10.1145/2629489.

54. P.T. Wood, Query Languages for Graph Databases, SIGMOD Record 41(1) (2012), 50–60. doi:10.1145/2206869.2206879.

55. Yakovets, Godfrey and Gryz, Query planning for evaluating SPARQL property paths, in: Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26–July 01, 2016, Özcan, Koutrika and Madden, eds, ACM, 2016, pp. 1875–1889. doi:10.1145/2882903.2882944.

56. Zauner, Linse, Furche and Bry, A RPL through RDF: Expressive Navigation in RDF Graphs, in: Web Reasoning and Rule Systems – Fourth International Conference, Proceedings, RR 2010, Bressanone/Brixen, Italy, September 22–24, 2010, Hitzler and Lukasiewicz, eds, Springer, 2010, pp. 251–257. doi:10.1007/978-3-642-15918-3_25.

57. Zhang and Van den Bussche, On the power of SPARQL in expressing navigational queries, The Computer Journal 58(11) (2015), 2841–2851. doi:10.1093/comjnl/bxu128.

1.	Subclass:	(a) $\frac{(A, sc, B) \quad (X, type, A)}{(X, type, B)}$; (b) $\frac{(A, sc, B) \quad (B, sc, C)}{(A, sc, C)}$
2.	Subproperty:	(a) $\frac{(A, sp, B) \quad (X, A, Y)}{(X, B, Y)}$; (b) $\frac{(A, sp, B) \quad (B, sp, C)}{(A, sp, C)}$
3.	Domain:	$\frac{(A, dom, B) \quad (X, A, Y)}{(X, type, B)}$
4.	Range:	$\frac{(A, range, B) \quad (X, A, Y)}{(Y, type, B)}$
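The inference rules above can be read operationally as a fixpoint computation, which is what materializing the closure amounts to. The Python sketch below applies only the six rules listed here until no new triple is produced; it is a naive illustration, not the procedure used in the experiments, and the function name `rho_df_closure`, the string encoding of the keywords (`sc`, `sp`, `dom`, `range`, `type`) and the sample graph are assumptions made for this example.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def rho_df_closure(graph: Set[Triple]) -> Set[Triple]:
    """Naively saturate `graph` under the subclass, subproperty, domain and range rules."""
    closure = set(graph)
    while True:
        new: Set[Triple] = set()
        for (a, p, b) in closure:
            if p == "sc":
                # 1(a): (A, sc, B), (X, type, A) => (X, type, B)
                new |= {(x, "type", b) for (x, q, c) in closure if q == "type" and c == a}
                # 1(b): (A, sc, B), (B, sc, C) => (A, sc, C)
                new |= {(a, "sc", c) for (b2, q, c) in closure if q == "sc" and b2 == b}
            elif p == "sp":
                # 2(a): (A, sp, B), (X, A, Y) => (X, B, Y)
                new |= {(x, b, y) for (x, q, y) in closure if q == a}
                # 2(b): (A, sp, B), (B, sp, C) => (A, sp, C)
                new |= {(a, "sp", c) for (b2, q, c) in closure if q == "sp" and b2 == b}
            elif p == "dom":
                # 3: (A, dom, B), (X, A, Y) => (X, type, B)
                new |= {(x, "type", b) for (x, q, y) in closure if q == a}
            elif p == "range":
                # 4: (A, range, B), (X, A, Y) => (Y, type, B)
                new |= {(y, "type", b) for (x, q, y) in closure if q == a}
        if new <= closure:
            return closure  # fixpoint reached: no rule adds anything new
        closure |= new


# Hypothetical sample graph used only to exercise the rules.
SAMPLE: Set[Triple] = {
    ("Student", "sc", "Person"),
    ("Person", "sc", "Agent"),
    ("advisor", "dom", "Student"),
    ("advisor", "range", "Professor"),
    ("alice", "advisor", "bob"),
}
```

Every derived triple has to be stored, which is the space cost that query-based reasoning avoids by evaluating the rules at query time.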

