
Abstract

Large volumes of geospatial data are being published on the Semantic Web (SW), yielding a need for advanced analysis of such data. However, existing SW technologies only support advanced analytical concepts such as multidimensional (MD) data warehouses and Online Analytical Processing (OLAP) over non-spatial SW data. To remedy this need, this paper presents the QB4SOLAP vocabulary, which supports spatially enhanced MD data cubes over RDF data. The paper also defines a number of Spatial OLAP (SOLAP) operators over QB4SOLAP cubes and provides algorithms for generating spatially extended SPARQL queries from the SOLAP operators. The proposals are validated by applying them to a realistic use case.

Keywords

Spatial OLAP, spatial data, multidimensional data, data modelling, RDF, SPARQL

1. Introduction

The Semantic Web (SW) has evolved from focusing mostly on data publishing to also supporting increasingly complex queries, such as interactive analytical queries. Simultaneously, the data available on the SW has evolved from simple, mostly alphanumeric data to also include complex data such as spatial data. Indeed, geospatial data is now common on the SW, but it remains difficult to analyze.

In a non-SW context, the main tools for interactive data analysis have been Data Warehouses (DWs) and Online Analytical Processing (OLAP) tools and queries. DWs store large volumes of data and are designed with a multidimensional (MD) modeling approach, which has proven intuitive for interactive data analytics. Concretely, DWs consist of MD data cubes. The cells of the cube represent the topic of analysis and associate observation facts with numerical measures that can be aggregated. For example, a sales fact cube has measures such as QuantitySold and SalesPrice. Facts are linked to dimensions, which provide contextual information, e.g., sales date, product, and location. Dimensions are perspectives used to analyze data and are organized into hierarchies with levels, e.g., Store, City, and Region, that allow users to analyze and aggregate measures at different levels of detail. Levels have a set of attributes that describe the characteristics of the level members. In traditional DWs, the location dimension is widely used, but as a conventional dimension with alphanumeric data and thus only nominal references to spatial concepts such as areas and places. This allows neither manipulating spatial location data nor deriving topological relations among the hierarchy levels of the location dimension, which creates a demand for truly spatial DWs. Including the geometric information of the location data significantly improves the analysis process (e.g., proximity analysis of the locations), adding perspectives by revealing dynamic spatial hierarchy levels and new spatial members.

Fig. 1. QB4SOLAP approach to SOLAP on the SW.

Similarly, providing deep spatial analytics support for spatial SW data is very valuable. Spatial data requires specific treatment techniques, in particular encoding, special functions and different manipulation methods, which should be considered in the modeling process and querying. The current state of the art for the geospatial Semantic Web focuses on techniques for publishing, linking and querying spatial data, but supports only “plain” spatial SW data (without support for spatial DW concepts such as spatial hierarchies, levels, and measures) and does not consider analytical queries over spatial RDF data (see Section 2 for details).

Problem Definition. The proliferation of open geospatial data on the SW creates possibilities for advanced analysis of such data. Many examples exist of spatial Linked Open Data (LOD) published on the SW as RDF, including EuroStat,¹ UK Environmental Data,² Danish Agricultural Data,³ and Australian Climate Observations.⁴

¹ EuroStat: http://ec.europa.eu/eurostat.
² UK Environmental Data: http://environment.data.gov.uk.
³ Danish Agricultural Data: https://datahub.io/dataset/govagribus-denmark.
⁴ Australian Climate Observations: https://datahub.io/dataset/acorn-sat.

These datasets have observations and measures that are well suited for analytical queries (e.g., water/air quality measurements, immigration rates, EU subsidies in agriculture, crop revenue, etc.). However, such datasets are typically not modeled with spatial dimension levels and hierarchies. Thus, they cannot be queried with interactive spatial analytical queries (a.k.a. SOLAP) on the SW. In the current state of the SW, if a (spatial) DW user would like to query the existing spatial RDF data from the SW with SOLAP operations, the user needs to download the RDF data, map it to a relational data model (i.e., with a snowflake schema), and then import it into a traditional spatial data warehouse in order to query with SOLAP, which is slow, labor-intensive, and stores the data in a non-open format.

Our Approach. In contrast, annotating spatial RDF datasets with QB4SOLAP [16,17] allows users to define spatial multidimensional concepts on top of existing RDF data. Hence, the user can create and publish spatial data warehouses on the Semantic Web, which can then be queried with SOLAP operations. Figure 1 depicts the general workflow, in which the spatial RDF datasets from endpoints are annotated with QB4SOLAP. This makes it possible for end users to use SOLAP queries. However, writing a SOLAP query in SPARQL can be very complicated for users inexperienced with SPARQL (e.g., traditional DW users). Due to the lack of MD semantics of spatial RDF data and the lack of translation techniques from high-level SOLAP expressions to SPARQL, there is a considerable entry barrier to advanced spatial data analysis on the SW for data warehouse users.

Contributions. In order to address these issues, this paper makes a number of contributions. First, we propose QB4SOLAP, a generic and extensible vocabulary (metamodel) for spatial DWs on the SW. QB4SOLAP extends the most recent stable version of the QB4OLAP vocabulary with spatial concepts. We provide a full formalization of QB4SOLAP. We define the key concepts of spatial cube members, spatial hierarchies and levels, spatial measures, spatial aggregate functions (e.g., union, buffer, and convex hull), and topological relations among spatial dimension and hierarchy level members (e.g., within, intersects, and overlaps). Second, we define a number of analytical Spatial OLAP (SOLAP) operators over the model, including their formal semantics. The operators support advanced analytical queries over MD geospatial SW data. Third, we provide algorithms for generating spatially extended SPARQL queries for individual and nested SOLAP operators, which allows writing SOLAP queries without knowledge of RDF/SPARQL. Fourth, we validate the vocabulary, operators, and query generation algorithms by applying them to a realistic use case.

Paper structure. The remainder of the paper is structured as follows. Section 2 discusses related work. Section 3 defines preliminary spatial and OLAP concepts. Section 4 defines the QB4SOLAP vocabulary, while Section 5 defines the SOLAP operators. Section 6 provides the SPARQL query generation algorithms. Finally, Section 7 concludes the paper and points to future research.

2. Related work

DW and OLAP technologies have been successful for analyzing large volumes of data [1]. Combining DW/OLAP technologies with RDF data makes RDF data sources more easily available for interactive analysis. The following work concerns the integration of DW/OLAP with the SW.

DW/OLAP and Semantic Web. Using OLAP to analyze SW data is considered in several approaches. Kämpgen et al. propose an extended model [23] on top of the RDF Data Cube Vocabulary (QB) [5] for interacting with statistical linked data via OLAP operations directly in SPARQL. However, it has the inherent limitations of QB and thus cannot support OLAP dimensions with hierarchies and levels, or built-in aggregate functions. Etcheverry et al. introduce QB4OLAP [11] as an extended vocabulary based on QB, with a full MD metamodel, supporting OLAP operations directly over RDF data with SPARQL queries. Nath et al. consider creating an Extract–Transform–Load (ETL) framework for semantic data warehouses [8]. Varga et al. present a comprehensive methodology for dimensional enrichment of statistical LOD by using QB4OLAP and provide a SW-based OLAP engine for traditional DW users [37]. However, these approaches and vocabularies neither support spatial DWs nor provide SOLAP operators for the SW.

Spatial DW and OLAP. The constraint representation of spatial data has been a focus in many fields, from databases to AI [32]. Extending OLAP with spatial features has also attracted the attention of the data warehousing community. Bédard et al. first introduced the term SOLAP [4] in 1997. SOLAP systems [9,33] have improved significantly since then, and various papers have improved spatial aggregation functions and techniques [6,15,29,38].

Several conceptual models have been proposed for representing spatial data in data warehouses. Stefanovic et al. [19] consider constructing and materializing spatial cubes in their proposed model. The MultiDim conceptual model, introduced by Malinowski and Zimányi [27], copes with spatial features and is extended in [36] to include complex geometric features (continuous fields), with a set of operations and an MD calculus supporting spatial data types. Gómez et al. [14] propose an algebra and a general framework for OLAP cube analysis on discrete and continuous spatial data. Even though spatial data warehousing is thus widely studied, those studies are limited to traditional non-semantic spatial data warehouses and SOLAP techniques. The work above considers neither Semantic Web data nor spatial analytical querying in SPARQL.

Geospatial Semantic Web. The Open Geospatial Consortium (OGC) has proposed GeoSPARQL [3] as a vocabulary to represent and query spatial data in RDF using an extension to SPARQL. Kyzirakos et al. present a comprehensive survey of data models and query languages for linked geospatial data in [25], and propose a semantic geospatial data store called Strabon in [24]. Strabon has an extensive query language called stSPARQL, which is, however, limited to the specific environment. LinkedGeoData is a significant contribution on interactive transformation of OpenStreetMap⁵ data to RDF data [35]. GeoKnow [26] is a more recent project with a focus on linking geospatial data from heterogeneous sources. Andersen et al. consider publishing/converting open spatial data as Linked Open Data [2]. However, none of these works consider the MD aspects of geospatial data or allow querying with SOLAP on the SW, unlike QB4SOLAP. The QB4SOLAP vocabulary is validated with both the running example use case, the GeoNorthwind data cube, and a substantial real-world use case, the GeoFarmHerdState data cube [17]. GeoFarmHerdState is a spatial data cube about livestock holdings in Denmark, which integrates environmental and geographical open data from several sources, thus enabling a range of interesting SOLAP queries.

⁵ http://www.openstreetmap.org

In summary, none of the related work, which is surveyed in the fields of “DW/OLAP and the SW”, “Spatial DW and OLAP”, and “Geospatial Semantic Web” provides a substantial foundation for modeling and querying spatial data warehouses on the Semantic Web, unlike the QB4SOLAP vocabulary, SOLAP operators, and SPARQL generation algorithms presented in this paper.

3. Preliminary concepts

In this section, we describe the spatial objects and the spatial operations that manipulate them. Then, we introduce the data cubes and spatial enhancement on them as spatial data cubes. Finally, we show the traditional OLAP operations, which manipulate data cubes, and explain the Spatial OLAP (SOLAP) operators, which manipulate spatial data cubes.

3.1. Spatial objects

A spatial object represents a real-world object whose geographic features are important for an application. These geographic features are encoded using the geometry data type. Point, Line, and Polygon are the basic instantiable types of the geometry data type. Coordinates for the geometry data type are generally given in two dimensions with X, Y values. Geometries are associated with a spatial reference system (SRS), which describes the coordinate space in which the geometry is defined. There are several SRSs, and each of them is identified with a spatial reference system identifier (SRID). The World Geodetic System (WGS) is the most well-known SRS, and the latest version is called WGS84, which is also used in our use case.

3.2. Spatial operations

There is a set of spatial operations that can be applied on spatial data. We grouped these operations into classes, based on the common functionality of the operators. These classes are defined next.

Definition 1 (Spatial aggregation).

The operators in the spatial aggregation class $S_{agg}$ aggregate two or more spatial objects and return a new spatial object. Union, Intersection, ConvexHull, and MinimumBoundingRectangle (MBR) are example operators of this class. Some spatial functions such as ConvexHull or MBR can also be interpreted as unary spatial functions with a single parameter, but here we only consider the aggregate versions of the functions. In order to make this clear, the aggregate versions of those functions are given with a prefix “Aggr” in the QB4SOLAP vocabulary (Fig. 6). For our purpose, it is enough to group all spatial aggregate functions into a single group, although more fine-grained classification proposals for spatial aggregate functions exist [6].
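To make the aggregate reading of MBR concrete, the following Python sketch folds a collection of points into a single bounding rectangle. The function name `aggr_mbr` and the tuple-based geometry types are illustrative assumptions and are not part of the QB4SOLAP vocabulary.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]              # (x, y)
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def aggr_mbr(points: Iterable[Point]) -> Box:
    """Aggregate a collection of points into their minimum bounding rectangle."""
    pts = list(points)
    if not pts:
        raise ValueError("aggr_mbr needs at least one point")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


# Three hypothetical point geometries aggregated into one new spatial object.
print(aggr_mbr([(1.0, 2.0), (4.0, 0.5), (3.0, 5.0)]))
```

The result is itself a spatial object, which is the defining property of the $S_{agg}$ class.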

Definition 2 (Topological relations).

The operators in the topological relation class $T_{rel}$ are commonly expressed in the RCC8⁶ and DE-9IM⁷ models [10,31]. Topological relations are Boolean predicates that specify how two spatial objects are related to each other. Examples of topological relations are Intersects, Disjoint, Equals, Overlaps, Contains, Within, Touches, Covers, CoveredBy, and Crosses.

⁶ RCC8 (Region Connection Calculus) describes regions in Euclidean space or in a topological space by their possible relations to each other.
⁷ DE-9IM (Dimensionally Extended Nine-Intersection Model) is a topological model that describes spatial relations of two geometries in two dimensions.
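As a simplified illustration of topological relations as Boolean predicates, the sketch below evaluates Disjoint, Intersects, and Within on axis-aligned rectangles with positive area. Restricting geometries to rectangles is an assumption made for brevity; general geometries require a full DE-9IM evaluation.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def disjoint(a: Box, b: Box) -> bool:
    """True if the two boxes share no point at all."""
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def intersects(a: Box, b: Box) -> bool:
    """True if the boxes share at least one point (the negation of disjoint)."""
    return not disjoint(a, b)


def within(a: Box, b: Box) -> bool:
    """True if box a lies inside box b (boundaries may coincide)."""
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]


# Hypothetical geometries: a store footprint inside a city, and a remote region.
city = (0.0, 0.0, 10.0, 10.0)
store = (2.0, 2.0, 3.0, 3.0)
remote = (20.0, 20.0, 30.0, 30.0)
print(within(store, city), intersects(store, city), disjoint(store, remote))
```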

Definition 3 (Numeric operations).

The operators in the numeric operation class $N_{op}$ take one or more spatial objects and return a numeric value. Perimeter, Area, NoOfInteriorRings, Distance, HaversineDistance, and NoOfGeometries are example operators of this class.
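For instance, HaversineDistance maps two point geometries to a single number. The sketch below uses the standard haversine formula; the mean Earth radius of 6371 km and the function name are assumptions of this example, not values prescribed by the vocabulary.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator is roughly 111 km.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))
```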

3.3. Data cubes

Data warehouses store large volumes of data for decision support. They are based on the multidimensional model, which views data in an n-dimensional space, usually called a data cube. The cells of the cube represent the observation facts for analysis with a set of attributes called measures (e.g., a sales fact cube with measures product quantity and price). Facts are linked to dimensions, which provide perspectives to analyze data (e.g., sales date, product, and customer location). Dimensions are organized into hierarchies, which allow users to aggregate measures at various levels of detail. Hierarchies are composed of levels and there is always a unique top level All with just one member all. Levels have a set of attributes that describe the characteristics of the level members.

An example of a data cube with three dimensions (Customer, Time, and Product) and one measure (Quantity) is given in Fig. 2. Each cell in the cube is an observation fact, which is characterized by dimension and measure values. The hierarchies of this cube are given in Fig. 4(a)–(c). Thus, in the cube shown in Fig. 2, the Product dimension is given at the Category level, the Time dimension at the Quarter level, and the Customer dimension at the City level. Measure values represent the measure Quantity of the sold products.

Fig. 2. A three-dimensional cube for Sales data.

Fig. 3. Roll-up to the Country level.

Fig. 4. Dimension hierarchies.

3.4. Spatial data cubes

A spatial data cube contains both conventional and spatial dimensions. A spatial dimension is a dimension that includes at least one spatial level, in which the application should store the spatial characteristics of the members. Similarly, a hierarchy is a spatial hierarchy if it has at least one spatial level. Spatial characteristics of the levels are captured by their geometries and can be recorded in the spatial attributes of the level. A spatial fact is a fact that relates several dimensions, of which two or more are spatial.

For example, consider a Sales spatial fact, which has spatial dimensions Customer and Supplier, each with a spatial hierarchy Geography composed of spatial levels City → State → Country → All (Fig. 4(c)). These spatial levels record the spatial characteristics of their members with spatial attributes: Customer, Supplier, and City use a point spatial data type, whereas State and Country use a multi-polygon spatial data type.

Following the philosophy of the spatially extended MultiDim conceptual model [36], MD concepts such as levels are considered spatial only if they record the spatial characteristics of the concepts as geometries. For instance, “continent” might be considered a spatial object in theory or in other vocabularies. However, if there is no information about the geometry of the continents in the schema and in the instance data, continent does not become a spatial level (Extension 7), although continent might still be a traditional level (Definition 7) of the spatial hierarchy Geography with alphanumeric attributes (e.g., continent name and code).

Spatial data cubes typically have spatial measures, which are also represented by a spatial data type. An example is a SalesPoint measure that stores the location of sales. Figure 7 shows the multidimensional schema of the GeoNorthwind data warehouse, which is used as the running example in the paper.

3.5. OLAP operators

OLAP operators are used for expressing queries over data cubes. The traditional OLAP operators are given next.

The slice operator removes a dimension from a cube by selecting one instance in a dimension level. An example is “slice on City is equal to Odense”.

The dice operator selects the cells in a cube that satisfy a Boolean condition. An example is “dice on the first and last quarter of the year”.

The roll-up operator aggregates measures along a hierarchy to obtain data at a coarser granularity. An example is “roll-up to the Country level” (Fig. 3).

Finally, the drill-down operator disaggregates measures along a hierarchy to obtain data at a finer granularity. It is the inverse operation of roll-up. Starting from the cube in Fig. 3, an example is “drill-down to the City level”.
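The operators above can be sketched over a tiny in-memory cube. The facts, the city-to-country mapping, and the function names below are hypothetical and serve only to illustrate slice, dice, and roll-up; they are not taken from Fig. 2 or Fig. 3.

```python
from collections import defaultdict

# Hypothetical facts at (City, Quarter) granularity: (city, quarter, quantity).
FACTS = [
    ("Odense", "Q1", 5),
    ("Odense", "Q4", 7),
    ("Aalborg", "Q1", 3),
    ("Berlin", "Q1", 4),
]
# Hypothetical City -> Country step of a Geography hierarchy.
CITY_TO_COUNTRY = {"Odense": "Denmark", "Aalborg": "Denmark", "Berlin": "Germany"}


def slice_city(facts, city):
    """Fix one City member and drop the dimension from the result."""
    return [(quarter, qty) for c, quarter, qty in facts if c == city]


def dice(facts, predicate):
    """Keep only the cells that satisfy a Boolean condition."""
    return [fact for fact in facts if predicate(fact)]


def roll_up_to_country(facts):
    """Aggregate quantities from the City level to the Country level."""
    totals = defaultdict(int)
    for city, quarter, qty in facts:
        totals[(CITY_TO_COUNTRY[city], quarter)] += qty
    return dict(totals)


print(slice_city(FACTS, "Odense"))
print(dice(FACTS, lambda fact: fact[1] in {"Q1", "Q4"}))
print(roll_up_to_country(FACTS))
```

Drill-down is the inverse navigation: it returns from the Country-level totals to the finer City-level facts, which is only possible if those facts are retained.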

3.6. Spatial OLAP operators

Spatial OLAP (SOLAP) operates on spatial data cubes. SOLAP increases the analytical capabilities of OLAP by taking into account the spatial information in the cube. SOLAP operators involve spatial conditions or spatial functions by using the spatial operations defined in Section 3.2. Spatial conditions specify constraints on the geometries associated with cube members or measures, while spatial functions derive new data from the cube, which can be used, e.g., to derive dynamic spatial hierarchies or levels, as explained in the following example. Spatial extensions of the common OLAP operators are formally defined in Section 5.

Table 1

Sample (instance) data for the Sales cube

Customer city	Customer	Supplier s1	Supplier s2	Supplier s3	Total sales
Düsseldorf	c1	8	–	3	11
Düsseldorf	c2	10	–	–	10
Dortmund	c3	7	4	–	11
Dortmund	c4	–	20	3	23
Münster	c5	–	–	30	30

Fig. 5. Example map of Sales (instance) data.

Example 1.

Consider the summarized data for the Sales cube given in Table 1, where a ‘–’ is used if there are no sales to customers from the corresponding suppliers. The data in Table 1 is shown on the map in Fig. 5, where the arrows on the map between the supplier and customer locations represent the distance. The quantities of sold products are shown along these arrows.

The hierarchies in Fig. 4(a)–(c) can be used to perform classical roll-up operations, where measures are aggregated from a child to a parent level. An example of such a roll-up operator is expressed by the query “total sales to customers by city”, whose result is given in Table 2.

On the other hand, as shown in Table 4 and Fig. 5, some customers may be closer to suppliers from other cities. For example, customer c3 is related to its city Dortmund through the traditional Geography hierarchy, but the customer is closer to the city Düsseldorf of supplier s1. Similarly, customer c4 in city Dortmund is closer to the city Münster of supplier s3. Figure 4(d) shows a new dynamic spatial hierarchy that can be obtained with a spatial roll-up (s-roll-up) operator that expresses the query “total sales to customers by city of the closest supplier”. Such queries cannot be expressed on conventional hierarchies with traditional OLAP.

The hierarchy in Fig. 4(d) is created on the fly with the help of a spatial function computing the distance between customer and supplier locations. Therefore, using the s-roll-up operator, sales to customers are aggregated by city of the closest suppliers, where Dortmund has a significant drop off in the quantity of the sales from 34 (Table 2) to 20 (Table 3).

Table 2

Roll-up of the Sales cube

Customer city	Sales
Düsseldorf	21
Dortmund	34
Münster	30

Table 3

S-Roll-up of the Sales cube

City closest supplier	Sales
Düsseldorf	25
Dortmund	20
Münster	33

Table 4

Customer to Supplier distance

Customer city	Customer	Supplier s1 (Düsseldorf)	Supplier s2 (Dortmund)	Supplier s3 (Münster)
Düsseldorf	c1	15 km	45 km	30 km
Düsseldorf	c2	15 km	60 km	60 km
Dortmund	c3	15 km	30 km	45 km
Dortmund	c4	45 km	15 km	15 km
Münster	c5	60 km	45 km	15 km
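The figures in Tables 2 and 3 can be reproduced from Tables 1 and 4 with the short Python sketch below. It rests on one reading of the s-roll-up in Example 1, which is an assumption rather than the formal definition of Section 5: for each customer, only the sales from the nearest supplier(s) are kept, ties are retained, and the quantities are grouped by supplier city. Customer c4 is equidistant from s2 and s3 in Table 4, and keeping both is what yields the values in Table 3.

```python
from collections import defaultdict

CUSTOMER_CITY = {"c1": "Düsseldorf", "c2": "Düsseldorf",
                 "c3": "Dortmund", "c4": "Dortmund", "c5": "Münster"}
SUPPLIER_CITY = {"s1": "Düsseldorf", "s2": "Dortmund", "s3": "Münster"}

# Table 1: quantity sold per (customer, supplier); absent pairs mean no sales.
SALES = {("c1", "s1"): 8, ("c1", "s3"): 3, ("c2", "s1"): 10,
         ("c3", "s1"): 7, ("c3", "s2"): 4, ("c4", "s2"): 20,
         ("c4", "s3"): 3, ("c5", "s3"): 30}

# Table 4: customer-to-supplier distance in km.
DISTANCE_KM = {"c1": {"s1": 15, "s2": 45, "s3": 30},
               "c2": {"s1": 15, "s2": 60, "s3": 60},
               "c3": {"s1": 15, "s2": 30, "s3": 45},
               "c4": {"s1": 45, "s2": 15, "s3": 15},
               "c5": {"s1": 60, "s2": 45, "s3": 15}}


def roll_up():
    """Classical roll-up: total sales grouped by the customer's own city."""
    totals = defaultdict(int)
    for (customer, _supplier), qty in SALES.items():
        totals[CUSTOMER_CITY[customer]] += qty
    return dict(totals)


def s_roll_up():
    """Assumed s-roll-up: sales from nearest supplier(s), by supplier city."""
    totals = defaultdict(int)
    for (customer, supplier), qty in SALES.items():
        nearest = min(DISTANCE_KM[customer].values())
        if DISTANCE_KM[customer][supplier] == nearest:
            totals[SUPPLIER_CITY[supplier]] += qty
    return dict(totals)


print(roll_up())    # compare with Table 2
print(s_roll_up())  # compare with Table 3
```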

4. The QB4SOLAP vocabulary

In this section, we formally define how to represent (spatial) data cubes in RDF. As a running example, we use the GeoNorthwind data warehouse, whose conceptual schema is given in Fig. 7.

Fig. 6. The QB4SOLAP vocabulary.

The QB4OLAP [11] vocabulary allows defining cube schemas and cube instances as RDF triples. QB4OLAP is an extension of the RDF Data Cube Vocabulary (QB) [5] with multidimensional concepts in order to support OLAP operations directly over RDF data with SPARQL queries. We extended QB4OLAP (v1.2)⁸ with spatial concepts to give QB4SOLAP [16]. We based our extension on GeoSPARQL [30], a standard from the Open Geospatial Consortium (OGC) for representing and querying geospatial linked data for the Semantic Web. Since our base vocabulary QB4OLAP uses the MultiDim conceptual model to describe the multidimensional concepts, we base our definitions on a spatially extended version of the MultiDim conceptual model [36] for the spatial extension of the MD concepts. Fig. 6 shows the QB4SOLAP vocabulary for representing a spatial cube schema and spatial cube members as RDF triples. A cube schema defines the structure of the cube in terms of dimension levels, measures, aggregation functions (e.g., SUM, AVG, COUNT) on measures, spatial aggregation functions ($S_{agg}$ in Definition 1) on spatial measures, dimension hierarchies, and parent–child relationships between levels (including their cardinality and topological relationships for spatial levels). These schema level metadata are used to define multidimensional datasets in RDF. Cube members are the instances of a cube schema that represent level members, facts, and measure values. As we will show in Section 6, we use the schema level metadata to produce SPARQL queries that implement SOLAP operators on cube members.

⁸ QB4OLAP v1.2: https://github.com/lorenae/qb4olap/blob/master/rdf/qb4olap.1.2.ttl.

Terms with capitalized initials and non-italic font in Fig. 6 represent RDF classes, terms with capitalized initials and italic font represent RDF instances, and terms with non-capitalized initials represent RDF properties. Classes in external vocabularies are depicted with a light gray background and font. RDF Cube (QB), QB4OLAP, and QB4SOLAP classes are shown, respectively, with white, light gray, and dark gray backgrounds. Original QB terms are prefixed with qb:.⁹ QB4OLAP and QB4SOLAP terms are prefixed, respectively, with qb4o:¹⁰ and qb4so:.¹¹ Spatial classes and properties are prefixed with geo:.¹²

⁹ RDF cube: http://purl.org/linked-data/cube#.
¹⁰ QB4OLAP: http://purl.org/qb4olap/cubes#.
¹¹ QB4SOLAP: http://w3id.org/qb4solap#.
¹² GeoSPARQL: http://www.opengis.net/ont/geosparql#.

In what follows, we first formally define RDF triples, and then discuss how to describe (spatial) multidimensional data using QB4OLAP and QB4SOLAP.

Definition 4 (RDF triple).

An RDF triple $t = (s, p, o)$ consists of three components: s is the subject, p is the predicate, and o is the object. RDF triples are defined over $T = (I \cup B) \times I \times (I \cup B \cup L)$, where $I$ is the set of IRIs, $B$ is the set of blank nodes, and $L$ is the set of literals.
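The membership condition of Definition 4 can be checked mechanically. In the sketch below, the classes `IRI`, `BlankNode`, and `Literal` and the function `is_rdf_triple` are hypothetical names introduced for illustration; they stand in for the sets $I$, $B$, and $L$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


def is_rdf_triple(s, p, o) -> bool:
    """Check (s, p, o) against T = (I ∪ B) × I × (I ∪ B ∪ L)."""
    return (isinstance(s, (IRI, BlankNode))
            and isinstance(p, IRI)
            and isinstance(o, (IRI, BlankNode, Literal)))


# A level declaration is a valid triple; a literal in subject position is not.
print(is_rdf_triple(IRI("gnw:customer"), IRI("rdf:type"), IRI("qb4o:LevelProperty")))
print(is_rdf_triple(Literal("Odense"), IRI("rdf:type"), IRI("qb4o:LevelProperty")))
```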

A set of RDF triples is referred to as a graph. We denote a QB4SOLAP graph by $G$, where $G \subset T$. The cube schema and cube instances are subsets of this graph and are denoted, respectively, by $G^{S}$ and $G^{I}$, where $G^{S} \subset G$ and $G^{I} \subset G$.

Given an MD element $x \in (I \cup B)$ in a schema graph or instance graph $G$, we denote by $G_{x}$ the subgraph of $G$ for x, where $G_{x} \subset G$. We define the function $id(x) : G \to I$ that, given an MD element x, returns its identifier in $I$ from the graph $G$. We use superscript notation to indicate the type of the identifier from the cube schema graph ($G^{S}$) and cube instance graph ($G^{I}$), e.g., ${id}^{S}(x)$ for a cube schema identifier and ${id}^{I}(x)$ for a cube instance identifier.

4.1. Defining spatial data cube schemas with QB4SOLAP

An n-dimensional cube schema $CS$ is a tuple $CS = (D, M, F)$, with a set of dimensions D, a set of measures M, and a fact F. A dimension $d \in D$ has a set of hierarchies $H(d)$. Each hierarchy $h \in H(d)$ is organized into a set of levels $L(h)$. Each level $l \in L(h)$ has a set of attributes $A(l)$. Each attribute $a \in A(l)$ is defined over a domain. Each measure $m \in M$ is also defined over a domain.

We define next how to represent a cube schema $CS$ in RDF using QB4SOLAP. We denote the RDF graph of the cube schema by $G^{S}$. In the examples, we prefix the elements of $G^{S}$ with gnw:. We follow a naming convention for schema elements similar to that of QB4OLAP. If different MD concepts with the same schema name could be confused, e.g., the customer dimension and the customer level, we suffix the dimensions with Dim (e.g., gnw:customerDim for the dimension and gnw:customer for the level). The subgraph of $G^{S}$ that refers to a specific schema element x is denoted by $G_{x}^{S}$, and the unique identifier of x is denoted by ${id}^{S}(x)$.

Definition 5 (Dimensions).

An n-dimensional cube schema $CS$ has a set of dimensions $D = \{d_{1}, \dots, d_{n}\}$, and each dimension $d_{i}$ has a set of hierarchies $H(d_{i})$ (Definition 6). Each dimension $d_{i} \in D$ is defined in the cube schema graph $G^{S}$ with qb:DimensionProperty. Each hierarchy $h \in H(d_{i})$ is linked to its dimension $d_{i}$ with the qb4o:hasHierarchy property. The RDF graph formulation of the dimensions D is represented as $G_{D}^{S} = \bigcup_{i = 1}^{n} G_{d_{i}}^{S}$, where $G_{d_{i}}^{S} = \{({id}^{S}(d_{i})\ \text{rdf:type}\ \text{qb:DimensionProperty})\} \cup \bigcup_{h \in H(d_{i})} \{({id}^{S}(d_{i})\ \text{qb4o:hasHierarchy}\ {id}^{S}(h))\}$.

Extension 5 (Spatial dimensions).

A dimension is spatial if it has at least one spatial level. A spatial dimension $d_{i_{s}}$ belongs to the set of spatial dimensions $D_{s}$ , which is a subset of the set of dimensions D, such that $d_{i_{s}} \in D_{s} \subseteq D$ .

Fig. 7. Conceptual multidimensional schema of the GeoNorthwind data warehouse.

Example 2.

The triples below show how some of the dimensions of the GeoNorthwind DW (Fig. 7) are represented in RDF using Definition 5 and Extension 5. As we will see below, the Customer and Supplier dimensions are spatial as they both have a spatial hierarchy Geography.
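As an illustration of Definition 5, the following Python sketch builds the triple set $G_{d_{i}}^{S}$ for one dimension and prints it in a Turtle-like form. The identifier gnw:customerDim follows the naming convention stated above, whereas gnw:geography is an assumed hierarchy name and the function `dimension_triples` is hypothetical.

```python
def dimension_triples(dimension, hierarchies):
    """Build the schema triples of one dimension following Definition 5."""
    # Type declaration of the dimension.
    triples = {(dimension, "rdf:type", "qb:DimensionProperty")}
    # One qb4o:hasHierarchy link per hierarchy of the dimension.
    triples |= {(dimension, "qb4o:hasHierarchy", h) for h in hierarchies}
    return triples


# gnw:geography is an assumed identifier for the Geography hierarchy.
graph = dimension_triples("gnw:customerDim", ["gnw:geography"])
for s, p, o in sorted(graph):
    print(f"{s} {p} {o} .")
```

Whether the dimension is spatial (Extension 5) cannot be read from these triples alone; it follows from the levels of its hierarchies.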

Definition 6 (Hierarchies).

A dimension d has a set of hierarchies $H(d) = \{h_{1}, \dots, h_{m}\}$, where each hierarchy $h_{i}$ has a set of levels $L(h_{i})$ (Definition 7). Each hierarchy $h_{i} \in H(d)$ is defined in the cube schema graph $G^{S}$ with the qb4o:Hierarchy predicate and is linked with its dimension d by the qb4o:inDimension property. Each level $l \in L(h_{i})$ that belongs to a hierarchy $h_{i}$ is defined with the qb4o:hasLevel property. The RDF graph formulation of the hierarchies $H(d)$ is represented as $G_{H(d)}^{S} = \bigcup_{i = 1}^{m} G_{h_{i}}^{S}$, where $G_{h_{i}}^{S} = \{({id}^{S}(h_{i})\ \text{rdf:type}\ \text{qb4o:Hierarchy})\} \cup \{({id}^{S}(h_{i})\ \text{qb4o:inDimension}\ {id}^{S}(d))\} \cup \bigcup_{l \in L(h_{i})} \{({id}^{S}(h_{i})\ \text{qb4o:hasLevel}\ {id}^{S}(l))\}$.

Extension 6 (Spatial hierarchies).

A hierarchy is spatial if it contains at least one spatial level. A spatial hierarchy $h_{i_{s}}$ belongs to the set of spatial hierarchies $H_{s} (d)$ , which is a subset of the set of hierarchies $H (d)$ , such that $h_{i_{s}} \in H_{s} (d) \subseteq H (d)$ .

Example 3.
The triples below show how some of the hierarchies of the GeoNorthwind DW (Fig. 7) are represented in RDF using Definition 6 and Extension 6. As we will see below, the Geography hierarchies in the Customer and Supplier dimensions are spatial since they have spatial levels (City, State, etc.).

Definition 7 (Levels).

A hierarchy h has a set of levels $L(h) = \{l_{1}, \dots, l_{k}\}$, and each level $l_{i}$ has a set of attributes $A(l_{i})$ (Definition 8). Each level $l_{i} \in L(h)$ is defined in the cube schema graph $G^{S}$ with the qb4o:LevelProperty predicate. Each attribute $a \in A(l_{i})$ is linked to its level $l_{i}$ with the qb4o:hasAttribute property. The RDF graph formulation of the levels $L(h)$ is represented as $G_{L(h)}^{S} = \bigcup_{i = 1}^{k} G_{l_{i}}^{S}$, where $G_{l_{i}}^{S} = \{({id}^{S}(l_{i})\ \text{rdf:type}\ \text{qb4o:LevelProperty})\} \cup \bigcup_{a \in A(l_{i})} \{({id}^{S}(l_{i})\ \text{qb4o:hasAttribute}\ {id}^{S}(a))\}$.

Extension 7 (Spatial levels).

A level is spatial if it has an associated geometry. A spatial level $l_{i_{s}}$ belongs to the set of spatial levels $L_{s}(h)$, which is a subset of the set of levels $L(h)$, such that $l_{i_{s}} \in L_{s}(h) \subseteq L(h)$. The geometry of a spatial level is defined in the cube schema graph $G^{S}$ with the geo:hasGeometry property. The RDF graph formulation of the spatial levels $L_{s}(h)$ is represented as $G_{L_{s}(h)}^{S} = \bigcup_{i = 1}^{k} G_{l_{i_{s}}}^{S}$, where $G_{l_{i_{s}}}^{S} = \{({id}^{S}(l_{i_{s}})\ \text{rdf:type}\ \text{qb4o:LevelProperty})\} \cup \{({id}^{S}(l_{i_{s}})\ \text{geo:hasGeometry}\ \text{geo:Geometry})\} \cup \bigcup_{a \in A(l_{i_{s}})} \{({id}^{S}(l_{i_{s}})\ \text{qb4o:hasAttribute}\ {id}^{S}(a))\}$.

Example 4.
The triples below show how some of the levels of the GeoNorthwind DW (Fig. 7) are represented in RDF using Definition 7 and Extension 7. Note that the Customer and City levels are spatial as they have a geometry that is specified at the level definition.
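A minimal Python sketch of Definition 7 and Extension 7 is given next. It builds the level triples, adds the geo:hasGeometry triple for spatial levels, and then derives whether a hierarchy is spatial in the sense of Extension 6. Apart from gnw:customer, all gnw: identifiers and all function names are assumptions made for this example.

```python
def level_triples(level, attributes, spatial=False):
    """Build the schema triples of one level (Definition 7 / Extension 7)."""
    triples = {(level, "rdf:type", "qb4o:LevelProperty")}
    if spatial:
        # Extension 7: a spatial level declares an associated geometry.
        triples.add((level, "geo:hasGeometry", "geo:Geometry"))
    triples |= {(level, "qb4o:hasAttribute", a) for a in attributes}
    return triples


def is_spatial_level(graph, level):
    """A level is spatial if the graph records a geometry for it."""
    return (level, "geo:hasGeometry", "geo:Geometry") in graph


def is_spatial_hierarchy(graph, levels):
    """Extension 6: a hierarchy is spatial if at least one level is spatial."""
    return any(is_spatial_level(graph, level) for level in levels)


# Assumed attribute and level identifiers, for illustration only.
graph = (level_triples("gnw:customer", ["gnw:customerName"], spatial=True)
         | level_triples("gnw:city", ["gnw:cityName"], spatial=True)
         | level_triples("gnw:category", ["gnw:categoryName"]))

print(is_spatial_hierarchy(graph, ["gnw:customer", "gnw:city"]))
print(is_spatial_hierarchy(graph, ["gnw:category"]))
```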

Definition 8 (Attributes).

A level l has a set of attributes $A(l) = \{a_{1}, \dots, a_{p}\}$, which define the characteristics of the level members. One of these attributes, denoted as $a_{ID}$, specifies a surrogate key for the level, i.e., the value of $a_{ID}$ uniquely identifies the members of the level. For simplicity, we assume that it is the first attribute in the set of attributes $A(l)$, i.e., $a_{1} = a_{ID}$. Each attribute $a_{i} \in A(l)$ is defined in the cube schema graph $G^{S}$ with the qb4o:LevelAttribute predicate and is linked to its level l with the qb4o:inLevel property. Each attribute $a_{i}$ is defined as ranging over XSD literals $L$ using the rdfs:range property. The RDF graph formulation of the attributes $A(l)$ is represented as $G_{A(l)}^{S} = \bigcup_{i = 1}^{p} G_{a_{i}}^{S}$, where $G_{a_{i}}^{S} = \{({id}^{S}(a_{i})\ \text{rdf:type}\ \text{qb4o:LevelAttribute})\} \cup \{({id}^{S}(a_{i})\ \text{qb4o:inLevel}\ {id}^{S}(l))\} \cup \{({id}^{S}(a_{i})\ \text{rdfs:range}\ L)\}$.

Extension 8 (Spatial attributes).

An attribute is spatial if it is defined over a spatial domain. A spatial attribute $a_{i_{s}}$ belongs to the set of spatial attributes $A_{s} (l)$ , which is a subset of the set of attributes $A (l)$ , such that $a_{i_{s}} \in A_{s} (l) \subseteq A (l)$ . The RDF graph formulation of the spatial attributes is similar to that in Definition 8. However, the attribute must range over spatial literals $L_{s}$ , i.e., well-known text (WKT) literals from OGC schemas. Further, the domain of the attribute should be specified with the rdfs:domain property, which must be a geometry. Finally, the attribute must be specified as a spatial object with the rdfs:subClassOf property. The RDF graph formulation of the spatial attributes $A_{s} (l)$ is represented as $\begin{matrix} G_{A_{s} (l)}^{S} = ⋃_{i = 1}^{p} G_{a_{i_{s}}}^{S} \end{matrix}$ where $\begin{array}{l} G_{a_{i_{s}}}^{S} \\ = {({id}^{S} (a_{i_{s}}) rdf:type qb4o:LevelAttribute)} \\ \cup {({id}^{S} (a_{i_{s}}) qb4o:inLevel {id}^{S} (l))} \\ \cup {({id}^{S} (a_{i_{s}}) rdfs:range L_{s})} \\ \cup {({id}^{S} (a_{i_{s}}) rdfs:subPropertyOf geo:Geometry)} \\ \cup {({id}^{S} (a_{i_{s}}) rdfs:subClassOf geo:SpatialObject)} \end{array}$

Example 5.
The triples below show how some of the attributes of the GeoNorthwind DW (Fig. 7) are represented in RDF using Definition 8 and Extension 8. Note that the Customer level has a spatial attribute (Customer geometry). It is represented as a WKT literal that defines a Point type from the Geometry class, which is a subclass of Spatial Object.

Definition 9 (Hierarchy steps).

A hierarchy h has a set of hierarchy steps $HS (h) = {h s_{1}, \dots, h s_{q}}$ , which define the structure of the hierarchy in relation to its corresponding levels. A hierarchy step $h s_{i} = (l_{c}, l_{p}, card) \in HS (h)$ entails a roll-up relation from a lower (child) level $l_{c}$ to an upper (parent) level $l_{p}$ with a cardinality $card$ . The cardinality $card \in {1 - 1, 1 - n, n - 1, n - n}$ describes the number of members in one level that can be related to a member in the other level for both the child and the parent levels.

Each hierarchy step $h s_{i}$ is defined in the cube schema graph $G^{S}$ as a blank node _:hs $_{i}$ $\in B$ with the qb4o:HierarchyStep predicate. Each hierarchy step is linked to its hierarchy with the qb4o:inHierarchy property. The child and parent levels are linked in a hierarchy step with the qb4o:childLevel and qb4o:parentLevel properties, respectively. The cardinality $card$ of a hierarchy step is defined by the qb4o:pcCardinality property. The RDF graph formulation of the hierarchy steps $HS (h)$ is represented as $\begin{matrix} G_{HS (h)}^{S} = ⋃_{i = 1}^{q} G_{h s_{i}}^{S} \end{matrix}$ where $\begin{array}{l} G_{h s_{i}}^{S} \\ = {({_:hs}_{i} rdf:type qb4o:HierarchyStep)} \\ \cup {({_:hs}_{i} qb4o:inHierarchy {id}^{S} (h))} \\ \cup {({_:hs}_{i} qb4o:parentLevel {id}^{S} (l_{p}))} \\ \cup {({_:hs}_{i} qb4o:childLevel {id}^{S} (l_{c}))} \\ \cup {({_:hs}_{i} qb4o:pcCardinality {id}^{S} (card))} \end{array}$

Extension 9 (Spatial hierarchy steps).

A hierarchy step is spatial if it relates a spatial child level $l_{c_{s}}$ and a spatial parent level $l_{p_{s}}$ , in which case it entails a topological relationship between these spatial levels. A spatial hierarchy step is then a tuple $h s_{i_{s}} = (l_{c_{s}}, l_{p_{s}}, card, topoRel)$ where the topological relation $topoRel$ belongs to the $T_{rel}$ class (Definition 2). The topological relation between parent-child levels of a spatial hierarchy step is defined by the qb4so:pcTopoRel property. The RDF graph formulation of the spatial hierarchy steps ${HS}_{s} (h)$ (w.r.t. Definition 9) is represented as $\begin{matrix} G_{{HS}_{s} (h)}^{S} = ⋃_{i = 1}^{q} G_{h s_{i_{s}}}^{S} \end{matrix}$ where $\begin{array}{l} G_{h s_{i_{s}}}^{S} = & G_{h s_{i}}^{S} \\ \cup {({_:hs}_{i} qb4so:pcTopoRel {id}^{S} (topoRel))} \end{array}$
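The relation between Definition 9 and Extension 9 can be sketched in a few lines of Python: a spatial hierarchy step is the graph of an ordinary hierarchy step plus one qb4so:pcTopoRel triple. This is only an illustrative sketch; the helper name, the blank-node counter, and the ex: identifiers for levels, cardinality, and topological relation are assumptions and not taken from the vocabulary.

```python
from itertools import count

_step_ids = count(1)  # source of fresh blank-node labels _:hs1, _:hs2, ...


def hierarchy_step_graph(hierarchy, child, parent, card, topo_rel=None):
    """Return the triples of one hierarchy step; topo_rel makes it spatial."""
    node = f"_:hs{next(_step_ids)}"
    graph = {
        (node, "rdf:type", "qb4o:HierarchyStep"),
        (node, "qb4o:inHierarchy", hierarchy),
        (node, "qb4o:childLevel", child),
        (node, "qb4o:parentLevel", parent),
        (node, "qb4o:pcCardinality", card),
    }
    if topo_rel is not None:
        # Extension 9 adds exactly one triple to the graph of Definition 9.
        graph.add((node, "qb4so:pcTopoRel", topo_rel))
    return graph


# Hypothetical identifiers for levels, cardinality, and topological relation.
plain = hierarchy_step_graph("ex:geography", "ex:customer", "ex:city", "ex:manyToOne")
spatial = hierarchy_step_graph("ex:geography", "ex:customer", "ex:city",
                               "ex:manyToOne", topo_rel="ex:within")
```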

Example 6.
The triples below show how the hierarchy steps of the Geography spatial hierarchy in the Customer dimension of the GeoNorthwind DW (Fig. 7) are represented in RDF using Definition 9 and Extension 9. Note that all hierarchy steps are spatial and have an associated topological relation.

Definition 10 (Partial order on levels).

The hierarchy steps $HS (h)$ of a hierarchy h define a partial order on the levels $l \in L (h)$ . The reflexive and transitive closure of the partial order is denoted as ⊑, with a unique base level ( $l_{b}$ ) and a unique top level ( $All$ ), where all levels l are such that $l_{b} ⊑ l$ , and $l ⊑ All$ .
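To make the closure concrete, the sketch below computes the reflexive and transitive closure ⊑ from a list of (child, parent) hierarchy steps. The level names form a hypothetical chain and are not meant to reproduce the GeoNorthwind hierarchy exactly.

```python
def rollup_closure(levels, steps):
    """Reflexive-transitive closure of the (child, parent) pairs in steps."""
    order = {(level, level) for level in levels} | set(steps)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(order):
            for (c, d) in list(order):
                # a ⊑ b and b ⊑ d imply a ⊑ d.
                if b == c and (a, d) not in order:
                    order.add((a, d))
                    changed = True
    return order


# Hypothetical chain of levels with base level Customer and top level All.
levels = ["Customer", "City", "Country", "All"]
steps = [("Customer", "City"), ("City", "Country"), ("Country", "All")]
order = rollup_closure(levels, steps)
print(("Customer", "All") in order)
```

In this chain every level l satisfies both (Customer, l) and (l, All), matching the requirement of a unique base level and a unique top level.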

Definition 11 (Measures).

An n-dimensional cube schema has a set of measures $M = {m_{1}, \dots, m_{r}}$ , which record the values of a phenomenon being observed. Each measure $m_{i} \in M$ is defined in the cube schema graph $G^{S}$ with the qb:MeasureProperty predicate. Similarly to attributes, each measure $m_{i}$ is defined as ranging over XSD literals $L$ with the rdfs:range property. The RDF graph formulation of the measures M is represented as $\begin{matrix} G_{M}^{S} = ⋃_{i = 1}^{r} G_{m_{i}}^{S} \end{matrix}$ where $\begin{array}{l} G_{m_{i}}^{S} \\ = {({id}^{S} (m_{i}) rdf:type qb:MeasureProperty)} \\ \cup {({id}^{S} (m_{i}) rdfs:range L)} \end{array}$

Extension 11 (Spatial measures).

A measure is spatial if it is defined over a spatial domain as in spatial attributes (Extension 8). A spatial measure $m_{i_{s}}$ belongs to the set of spatial measures $M_{s}$ , which is a subset of the set of measures M, such that $m_{i_{s}} \in M_{s} \subseteq M$ . The RDF formulation of the spatial measures is similar to that in Definition 11. However, the measure should range over spatial literals $L_{s}$ . The RDF graph formulation of the spatial measures $M_{s}$ (w.r.t. Definition 11) is represented as $\begin{matrix} G_{M_{s}}^{S} = ⋃_{i = 1}^{r} G_{m_{i_{s}}}^{S} \end{matrix}$ where $\begin{array}{l} G_{m_{i_{s}}}^{S} \\ = {({id}^{S} (m_{i_{s}}) rdf:type qb:MeasureProperty)} \\ \cup {({id}^{S} (m_{i_{s}}) rdfs:range L_{s})} \\ \cup {({id}^{S} (m_{i_{s}}) rdfs:subClassOf geo:SpatialObject)} \end{array}$

Example 7.
The triples below show how the measures of the GeoNorthwind DW (Fig. 7) are represented in RDF using Definition 11 and Extension 11. Note that SalesPoint is a spatial measure, which records the location of the stores in which the sales occurred. It is defined over the Geometry domain as a Point type with a WKT literal.

Definition 12 (Fact).

In an n-dimensional cube schema $CS = (D, M, F)$ , the fact F defines the structure of a cube with the qb:DataStructureDefinition property. The dimensions are given as components of the fact and are defined with the qb4o:level property. We assume that the fact F links the dimensions at the lowest granularity level, therefore qb4o:level links the lowest (base) level $l_{b}$ of each dimension $d_{i}$ , which is denoted as $l_{b} (d_{i})$ . The cardinality $card$ of the relationship between a dimension level and a fact is represented with the qb4o:cardinality property. Similarly, the measures are given as components of the fact and are defined with the qb:measure property. The aggregate function $aggr$ associated to each measure is represented with the qb4o:aggregateFunction property. The RDF graph formulation of the fact F is given in the following equation. $\begin{array}{l} G_{F}^{S} = & {({id}^{S} (F) \\ rdf:type qb: DataStructureDefinition)} \\ \cup ⋃_{d_{i} \in D} {({id}^{S} (F) qb:component \\ [qb4o:level {id}^{S} (l_{b} (d_{i})); \\ qb4o:cardinality {id}^{S} (card)])} \\ \cup ⋃_{m_{i} \in M} {({id}^{S} (F) qb:component \\ [qb:measure {id}^{S} (m_{i}); \\ qb4o:aggregateFunction {id}^{S} (aggr)])} \end{array}$

Extension 12 (Spatial fact).

A fact is spatial if it relates several levels, where two or more are spatial. A spatial fact may also have a topological relation $topoRel$ that must be satisfied by the related spatial levels, which is represented with qb4so:topologicalRelation. This object property allows specifying a topological relation in the fact-level relationship of spatial facts. The RDF graph formulation of such a fact is obtained simply by adding the fact-level topological relation property after the cardinality property, as given in the following equation. $\begin{array}{l} G_{F_{s}}^{S} = & {({id}^{S} (F_{s}) \\ rdf:type qb:DataStructureDefinition)} \\ \cup ⋃_{d_{i} \in D} {({id}^{S} (F_{s}) qb:component \\ [qb4o:level {id}^{S} (l_{b} (d_{i})); \\ qb4o:cardinality {id}^{S} (card); \\ qb4so:topologicalRelation {id}^{S} (topoRel)])} \\ \cup ⋃_{m_{i} \in M} {({id}^{S} (F_{s}) qb:component \\ [qb:measure {id}^{S} (m_{i}); \\ qb4o:aggregateFunction {id}^{S} (aggr)])} \end{array}$

Example 8.
The triples below show how the fact of the GeoNorthwind DW (Fig. 7) is represented in RDF using Definition 12. The Sales fact does not impose any topological relation between its spatial dimensions Supplier and Customer. SalesPoint is a spatial measure, which has a spatial aggregate function (AggrConvexHull).

4.2. Defining spatial data cube members with QB4SOLAP

We have explained in Section 4.1 how a data cube schema can be represented in RDF with QB4SOLAP. We show next how to use this schema to represent the instances of the GeoNorthwind DW (Fig. 7) in RDF. We denote by $G^{I}$ the RDF graph of the data cube instances. In the examples, we prefix the elements of $G^{I}$ with gnwi:. The subgraph of $G^{I}$ that refers to a specific cube instance x is denoted by $G_{x}^{I}$ and the unique identifier of x is denoted by ${id}^{I} (x)$ .

Definition 13 (Level members).

A level l has a set of level members $LM (l) = {{lm}_{1}, \dots, {lm}_{y}}$ . Each level member ${lm}_{i}$ has a unique IRI ${id}^{I} ({lm}_{i}) \in I$ , which is linked in the cube instance graph $G^{I}$ with the qb4o:LevelMember predicate. A level member is related to its level by the qb4o:memberOf property. The RDF graph formulation of the level members $LM (l)$ is represented as $\begin{matrix} G_{LM (l)}^{I} = ⋃_{i = 1}^{y} G_{{lm}_{i}}^{I} \end{matrix}$ where $\begin{array}{l} G_{{lm}_{i}}^{I} \\ = {({id}^{I} ({lm}_{i}) rdf:type qb4o: LevelMember)} \\ \cup {({id}^{I} ({lm}_{i}) qb4o:memberOf {id}^{S} (l))} \end{array}$

Definition 14 (Attributes of level members).

A level member $lm$ has a set of attributes $A (lm) = {a_{1}, \dots, a_{p}}$ , which are used to describe the characteristics of the level member (Definition 8). Each attribute $a_{i}$ is linked to the level member with the identifier ${id}^{S} (a_{i})$ . We denote by $lm ⇝ v_{a_{i}}$ the value $v_{a_{i}}$ that a level member $lm$ associates to attribute $a_{i}$ . This value is given as a literal $L$ such that $v_{a_{i}} \in L$ . The RDF graph formulation of the attributes $A (lm)$ is represented as $\begin{matrix} G_{A (lm)}^{I} = ⋃_{i = 1}^{p} G_{a_{i}}^{I} \end{matrix}$ where $\begin{array}{l} G_{a_{i}}^{I} = {({id}^{I} (lm) {id}^{S} (a_{i}) v_{a_{i}}) ∣ lm ⇝ v_{a_{i}}} \end{array}$

Definition 15 (Partial order on level members).

A hierarchy step $h s = (l_{c}, l_{p}, card)$ between a child level $l_{c}$ and a parent level $l_{p}$ defines a set of roll-up relations $RU (h s) = {r_{1}, \dots, r_{k}}$ where each $r_{i} = {lm}_{c_{i}} ⊑ {lm}_{p_{i}}$ relates a child level member ${lm}_{c_{i}} \in LM (l_{c})$ to a parent level member ${lm}_{p_{i}} \in LM (l_{p})$ . These roll-up relations define a partial order between level members with regard to Definition 10 and are expressed using the property skos:broader. The RDF graph formulation of the roll-up relations $RU (h s)$ is represented as $\begin{matrix} G_{RU (h s)}^{I} = ⋃_{i = 1}^{k} G_{r_{i}}^{I} \end{matrix}$ where $\begin{array}{l} G_{r_{i}}^{I} = & {({id}^{I} ({lm}_{c_{i}}) skos:broader {id}^{I} ({lm}_{p_{i}})) ∣ \\ r_{i} = {lm}_{c_{i}} ⊑ {lm}_{p_{i}}} \end{array}$

Example 9.
The triples below show how some level members of the GeoNorthwind DW (Fig. 7) are represented in RDF using Definitions 13–14.

Definition 16 (Fact members).

A fact F has a set of fact members $FM (F) = {f_{1}, \dots, f_{t}}$ , which are the instances of the data cube. Each fact $f_{i} \in FM$ has a unique IRI ${id}^{I} (f_{i}) \in I$ , which is linked in the cube instance graph $G^{I}$ with the qb:Observation predicate.

A fact member $f_{i}$ is related to a set of dimension levels $L (f_{i}) = {l_{1}, \dots, l_{r}}$ and has a set of measures $M (f_{i}) = {m_{1}, \dots, m_{s}}$ . Each dimension level $l_{j}$ is linked to the fact member with the identifier ${id}^{S} (l_{j})$ and each measure $m_{k}$ is linked to the fact member with the identifier ${id}^{S} (m_{k})$ . We denote by $f ⇝ v_{l_{j}}$ and $f ⇝ v_{m_{k}}$ , respectively, the dimension values and measure values associated with a fact f. The value $v_{l_{j}} \in I$ is the identifier of a level member in $LM (l_{j})$ . Further, the value $v_{m_{k}}$ for every measure $m_{k}$ is a literal such that $v_{m_{k}} \in L$ . The RDF graph formulation of the fact members $FM (F)$ is represented as $\begin{matrix} G_{FM (F)}^{I} = ⋃_{i = 1}^{t} G_{f_{i}}^{I} \end{matrix}$ where $\begin{array}{l} G_{f_{i}}^{I} = & {({id}^{I} (f_{i}) rdf:type qb:Observation)} \\ \cup ⋃_{l_{j} \in L (f_{i})} {({id}^{I} (f_{i}) {id}^{S} (l_{j}) {id}^{I} (v_{l_{j}})) ∣ f_{i} ⇝ v_{l_{j}}} \\ \cup ⋃_{m_{k} \in M (f_{i})} {({id}^{I} (f_{i}) {id}^{S} (m_{k}) {id}^{I} (v_{m_{k}})) ∣ f_{i} ⇝ v_{m_{k}}} \end{array}$
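The structure of a fact member graph can be sketched as follows. The snippet is illustrative only: the function name, the ex: schema identifiers, and the exi: instance identifiers are hypothetical, and measure values are passed directly as Python literals.

```python
def fact_member_graph(fact_iri, level_values, measure_values):
    """Return the instance triples of one fact member (Definition 16).

    level_values maps a level IRI to the IRI of a level member;
    measure_values maps a measure IRI to a literal value.
    """
    graph = {(fact_iri, "rdf:type", "qb:Observation")}
    graph |= {(fact_iri, level, member) for level, member in level_values.items()}
    graph |= {(fact_iri, measure, value) for measure, value in measure_values.items()}
    return graph


# Hypothetical fact member; the WKT point stands in for a spatial measure value.
sale = fact_member_graph(
    "exi:sale_1",
    {"ex:customer": "exi:customer_1", "ex:supplier": "exi:supplier_7"},
    {"ex:quantity": 12, "ex:salesPoint": "POINT(10.43951 55.47006)"},
)
```

The level and measure identifiers act as predicates, so a fact with two levels and two measures yields five triples, including the rdf:type triple.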

Example 10.
The triples below show how a fact member of the GeoNorthwind DW (Fig. 7) is represented in RDF using Definitions 12–16. Note that the fact member and corresponding level members relating to dimensions are given with the prefix gnwi:. ${id}^{S} (a_{ID})$ is the surrogate key (Definition 8) that links the fact member to the corresponding dimensions’ base level members.

5. Semantics of SOLAP operators

This section defines a formal algebra for SOLAP operators. Examples of the operators are provided after their definitions. The complete SPARQL query examples are given at our website (http://extbi.cs.aau.dk/SOLAP4SW/queries) and can be tested at our public endpoint (http://extbi.lab.aau.dk/sparql). The query runtimes for each SOLAP operator are given in Appendix A.1, Table 5 for the use case dataset GeoNorthwind (Section 4.1, Fig. 7). These operators can be applied on spatially enhanced multidimensional data cubes (Section 3.3). The presentation defines the semantics of a SOLAP operator by logically specifying the typical OLAP operators with spatial functions and conditions. Spatial functions and conditions can be selected from a range of operation classes, which can be applied on spatial data types (Section 3.1). Let $S$ be the set of any spatial operators where $S = (S_{agg} \cup T_{rel} \cup N_{op})$ , used to represent a spatial predicate $ϕ^{S} \in S$ or a spatial function $f^{S} \in S$ , which is in a SOLAP operator. The following SOLAP operators are defined with a spatial extension to the well-known OLAP operators, which are given in the remarks.

Remark 17 (Slice).

The slice operator removes a dimension from a cube $C$ by selecting one instance in a dimension level. For example, the query “slice on customers in the city of Odense” is a slice operation. (Cube is the sales, dimension is the customer, level in dimension is the city and the value is Odense, which is sliced out from the cube.)

Definition 17 (S-Slice).

The s-slice operator removes a dimension from a cube $C$ by choosing a single spatial attribute value $v_{s} \in L_{s}$ (Extension 8) in a spatial level $l_{s}$ (Extension 7).

As for the semantics, s-slice takes an n-dimensional cube $C$ as an argument. We assume that the cube has the cube schema $CS = (D, M, F)$ , with the fact members $f \in FM$ as given in Definition 16. As parameters, s-slice takes a spatial literal value $v_{s}$ , the base level $l_{b}$ and the target (spatial) level $l_{s}$ of a dimension $d_{i}$ . The base level $l_{b}$ specifies the dimension $d_{i}$ (Definition 16). The target spatial level $l_{s}$ is the level to which the spatial literal value $v_{s}$ is related.

The operator is defined as: $SS (C) [l_{b}, l_{s}, v_{s}] = C^{'}$ , which returns a cube $C^{'}$ with $n - 1$ dimensions and the schema $CS = (D^{'}, M^{'}, F^{'})$ , where $D^{'} = D ∖ {d_{i}}$ , $M^{'} = M$ , and $F^{'} = F$ . The measures M and the fact type F remain the same, though the new cube $C^{'}$ has one dimension less.

The s-slice operator selects a subset ${FM}^{'}$ from the set of fact members $FM$ ( ${FM}^{'} \subseteq FM$ ), with respect to the given parameter $v_{s}$ . Assuming that the granularity of the fact members is at the (lowest) base level of the dimension $l_{b} \in L (d_{i})$ in the given cube, a partial order exists among the levels, from the base level to the target spatial level $l_{s}$ such that $l_{b} ⊑ l_{s}$ . The given parameter $v_{s}$ is related to a level member of the level $l_{s}$ . We say that the fact members are characterized by dimension values, which is written as $f ⇝ v_{d_{i}}$ where $v_{d_{i}} \equiv v_{l_{b (d_{i})}}$ (Definition 16). In other words, dimensions are associated to the fact members by the values of the dimensions’ base level members $v_{l_{b (d_{i})}}$ . When the dimension $d_{i}$ is clear from the context, we will write $v_{l_{b}}$ for simplicity.

To sum up, the subset ${FM}^{'}$ of facts is selected with regards to the partial order on levels from base level $l_{b}$ to the target level $l_{s}$ . The value $v_{l_{s}}$ in the target level $l_{s}$ is specified with respect to the given spatial literal value $v_{s}$ . The value of $v_{s}$ might be equal to a spatial attribute value in the target level $l_{s}$ , thus $v_{l_{s}}$ is characterized by the attribute value $v_{s}$ and written as $v_{l_{s}} ⇝ v_{s}$ (Example 11). Or, $v_{s}$ is an arbitrary spatial literal that entails a topological relation $T_{rel}$ (i.e., within) in a value of the target spatial level $v_{l_{s}}$ , which is written as $\exists v_{l_{s}} : ϕ^{S} (v_{s})$ where $ϕ^{S}$ is a spatial Boolean predicate that represents a topological relation (Example 12). After applying the s-slice operator on cube $C$ , the new (sub)set of fact members is defined for both cases respectively as follows; ${FM}^{'} = {f \in FM ∣ \exists v_{l_{b}} \in LM (l_{b}), v_{l_{s}} \in LM (l_{s}) : f ⇝ v_{l_{b}} \land v_{l_{b}} ⊑ v_{l_{s}} \land v_{l_{s}} ⇝ v_{s}}$ , ${FM}^{'} = {f \in FM ∣ \exists v_{l_{b}} \in LM (l_{b}), v_{l_{s}} \in LM (l_{s}) : f ⇝ v_{l_{b}} \land v_{l_{b}} ⊑ v_{l_{s}} : ϕ^{S} (v_{s})}$ .
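The two cases of the s-slice semantics can be sketched over a small in-memory cube. The sketch makes several simplifying assumptions that are not part of the formal definition: facts are Python dictionaries, the roll-up relation is a dictionary from customers to cities, city geometries are axis-aligned boxes rather than WKT polygons, and all member names and coordinates are illustrative.

```python
# Hypothetical roll-up relation v_lb ⊑ v_ls from customers to cities.
ROLLUP = {"c1": "odense", "c2": "odense", "c3": "aalborg"}
# City geometries simplified to boxes (min_x, min_y, max_x, max_y);
# real data would carry WKT polygons instead. Coordinates are illustrative.
CITY_GEOM = {"odense": (10.2, 55.3, 10.6, 55.5), "aalborg": (9.8, 57.0, 10.1, 57.1)}
FACTS = [
    {"id": "f1", "customer": "c1", "sales": 100.0},
    {"id": "f2", "customer": "c2", "sales": 50.0},
    {"id": "f3", "customer": "c3", "sales": 75.0},
]


def within(point, box):
    """Topological predicate: does the point lie inside the box?"""
    (x, y), (min_x, min_y, max_x, max_y) = point, box
    return min_x <= x <= max_x and min_y <= y <= max_y


def s_slice(facts, rollup, geoms, predicate):
    """Select FM' and drop the sliced dimension from every kept fact."""
    kept = []
    for fact in facts:
        target_member = rollup[fact["customer"]]  # v_lb ⊑ v_ls
        if predicate(geoms[target_member]):       # v_ls ⇝ v_s, or ϕ^S(v_s)
            kept.append({k: v for k, v in fact.items() if k != "customer"})
    return kept


# Case 1: v_s equals the spatial attribute value of a target-level member.
by_value = s_slice(FACTS, ROLLUP, CITY_GEOM, lambda g: g == (10.2, 55.3, 10.6, 55.5))
# Case 2: v_s is an arbitrary point tested with the within relation.
by_point = s_slice(FACTS, ROLLUP, CITY_GEOM, lambda g: within((10.43951, 55.47006), g))
```

Both calls select the same facts here because the given point falls inside the box of the city whose geometry is passed in the first case.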

Example 11.
With regard to the traditional slice query “slice on customers in the city of Odense”, in s-slice the user could specify a geometry extent (e.g., polygon coordinates of the city of Odense) as a spatial literal for slicing instead of giving a text literal (e.g., “Odense”). The s-slice query would then be: “slice on customers of the city, which has the geometry "POLYGON((10.43951 55.47006, 10.439472 55.470036, 10.439240 (...))"”. More intuitively, instead of the specified spatial literal $v_{s} \in L_{s}$ , the user can pass a function call as a parameter to s-slice, e.g., by querying “slice on customers in the largest city of (southern) Denmark by land area”. The function call should calculate the area of the cities from their geometries, and the largest city is then selected as required by the s-slice operator. Both cases are given in the following.

Example 11.1.
The following SPARQL query shows an s-slice operator, which filters with the spatial literal given by the user.

Example 11.2.
The following SPARQL query shows the s-slice operator, which filters with the result of the function call (largest city) returned from an inner select. Given the current limitations of SPARQL, no function is available for calculating the area from the geometries of spatial objects at query run time; we therefore give the query with a notional built-in bif:st_area function.

Example 12.
With regard to the traditional slice query “slice on customers in the city of Odense”, in this example of s-slice, the user gives a point geometry (i.e., $X, Y$ coordinates of a point as a spatial literal) and filters at the given level (i.e., City level) that the given point is within. The s-slice query would then be: “slice on customers of the city, in which the given "POINT(10.43951 55.47006)" is within”.

The following SPARQL query shows an s-slice operator, which filters at the specified level with the given spatial literal by the user.

Note that the s-slice can be operated in different ways based on the geometry given to the query. In both Examples 11 and 12, the slice level is given as City; however, in Example 12 an arbitrary $X, Y$ point is given that falls within the target city. Therefore, we need to use within from the topological relations ( $T_{rel}$ ) class in order to verify and filter that city.

Remark 18 (Dice).

The traditional dice operator takes a cube and a Boolean condition ϕ, and returns a new cube containing only the cells that satisfy ϕ. The dice operation is analogous to the selection $σ_{ϕ} (R)$ on a relation R in relational algebra, but its argument is a cube, not a relation. For example, the query “sales to customers of type LLC (Limited Liability Company)” is a dice operation. (The cube is the sales, the dimension is the customer, and the Boolean condition is whether the customer type is LLC.)

Definition 18 (S-Dice).

Similarly, the s-dice operator takes an n-dimensional cube $C$ as an argument, which has the cube schema $CS = (D, M, F)$ with the fact members $f \in FM$ as given in Definition 16. As a parameter, s-dice takes a spatial Boolean predicate, which is denoted by $ϕ^{S}$ . The s-dice operator keeps the cells of the cube $C$ that satisfy the spatial predicate over spatial dimension levels $l_{s}$ , attributes $a_{s}$ , and measures m.

The semantics of the operator is defined as: $SD (C) [ϕ^{S}] = C^{'}$ where spatial predicate $ϕ^{S}$ can be applied on spatial level member values $ϕ^{S} (v_{l_{s}})$ , spatial attribute values $ϕ^{S} (v_{a_{s}})$ , measure values $ϕ^{S} (v_{m})$ and/or a combination of these.

The $SD$ operator returns a sub cube $C^{'} \subseteq C$ , which has the schema $CS = (D^{'}, M^{'}, F^{'})$ where $D^{'} = D$ , $M^{'} = M$ , and $F^{'} = F$ . Unlike the s-slice operator, s-dice keeps all the dimensions D in the output cube $C^{'}$ . The set of measures M and the fact type F also remain the same, though the new cube $C^{'}$ is a subset of the original cube $C$ with filtered fact members $f \in {FM}^{'}$ , which is explained in the following.

The s-dice operator selects a subset ${FM}^{'}$ of the fact members’ set ${FM}^{'} \subseteq FM$ with respect to the spatial predicate $ϕ^{S}$ on level members as follows;

Spatial predicate on level values: ${FM}^{'} = {f \in FM ∣ \exists v_{l_{b}} \in LM (l_{b}), v_{l_{s}} \in LM (l_{s}) : f ⇝ v_{l_{b}} \land v_{l_{b}} ⊑ v_{l_{s}} \land ϕ^{S} (v_{l_{s}})}$ .

Spatial predicate on level attribute values: ${FM}^{'} = {f \in FM ∣ \exists v_{l_{b}} \in LM (l_{b}), v_{l_{s}} \in LM (l_{s}) \land v_{l_{s}} ⇝ v_{a_{s}} : f ⇝ v_{l_{b}} \land v_{l_{b}} ⊑ v_{l_{s}} \land v_{l_{s}} ⇝ v_{a_{s}} \land ϕ^{S} (v_{a_{s}})}$ .

Note that filtering the facts through level members can be done on level values $v_{l_{s}}$ or on attribute values $v_{a_{s}}$ by applying the spatial predicate $ϕ^{S}$ . Finally, filtering of the facts on the associated measure values is defined as follows:

Spatial predicate on measure values of $m_{s}$ : ${FM}^{'} = {f \in FM ∣ \exists v_{m_{s}} \in Codomain (m_{s}) : f ⇝ v_{m_{s}} \land ϕ^{S} (v_{m_{s}})}$ .

For complex cases, i.e., predicates that combine these three types, the result set is obtained by combining the corresponding basic result sets.

Example 13.
The s-dice operator can be implemented on level and attribute values by filtering level members in the cube or on measures by filtering the facts in the cube. In both cases the spatial predicate $ϕ^{S}$ is used.

The query for the s-dice operator could be “sales to customers, which are located within 5 km distance from their city center” where the s-dice is on level members by filtering the customer level. The spatial predicate $ϕ^{S}$ can be interpreted in two different ways (see Appendix A.1 for a comparison of their query run times).

Example 13.1.
The first method assumes a buffer area of 5 km around the coordinates of the city center and checks with the within operator from the topological relations, $ϕ^{S} \in T_{rel}$ , whether the customers’ locations meet the condition. The following SPARQL query shows the implementation of this method on level members.

Example 13.2.
The second method checks whether the distance from a customer location to the corresponding city center is less than 5 km by using the distance function from the numeric operations, $f^{S} \in N_{op}$ . In this case the spatial predicate $ϕ^{S}$ is a combination of a spatial function $f^{S}$ and a regular Boolean predicate ϕ: the spatial function is distance from the numeric operations, and the predicate is less than (<). The following SPARQL query shows the implementation of this method for s-dice on level members.
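Independently of the SPARQL formulation, the two interpretations of the spatial predicate can be compared in a small Python sketch. It assumes planar coordinates in kilometres, a circular buffer, and hypothetical customers and facts, none of which come from the GeoNorthwind data; it illustrates the semantics only and says nothing about query run times.

```python
from math import hypot

# Hypothetical planar coordinates in kilometres, not real GeoNorthwind data.
CITY_CENTER = {"odense": (0.0, 0.0)}
CUSTOMERS = {
    "c1": {"city": "odense", "geom": (3.0, 3.0)},
    "c2": {"city": "odense", "geom": (4.0, 4.0)},
    "c3": {"city": "odense", "geom": (0.0, 1.0)},
}
FACTS = [("f1", "c1", 100.0), ("f2", "c2", 50.0), ("f3", "c3", 25.0)]


def distance(p, q):
    """Numeric operation f^S: Euclidean distance between two points."""
    return hypot(p[0] - q[0], p[1] - q[1])


def within_buffer(point, center, radius):
    """Topological predicate: is the point within the disc around center?"""
    return distance(point, center) <= radius


def s_dice(facts, predicate):
    """Keep every dimension; retain the facts whose customer satisfies the predicate."""
    return [fact for fact in facts if predicate(CUSTOMERS[fact[1]])]


# First method: buffer of 5 km around the city center, then within.
by_buffer = s_dice(FACTS, lambda c: within_buffer(c["geom"], CITY_CENTER[c["city"]], 5.0))
# Second method: distance function combined with the Boolean predicate <.
by_distance = s_dice(FACTS, lambda c: distance(c["geom"], CITY_CENTER[c["city"]]) < 5.0)
```

With a circular buffer the two predicates differ only for points exactly on the 5 km boundary, so both calls keep the same facts for this data.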

Remark 19 (Roll-up).

The traditional roll-up operator aggregates measures according to a dimension hierarchy (by using an aggregate function), in order to obtain measures at a coarser granularity for a given dimension. For example, the query “total amount of sales to customers by city” is a classical roll-up operation. (Cube is the sales, dimension is the customer, level in dimension to roll-up is the city such that $customer ⊑ city$ , measure is the sales amount and aggregate function is the sum in order to calculate the total sales.)

Definition 19 (S-Roll-up).

Similarly to roll-up operator, s-roll-up aggregates measures $m \in M$ of a given cube $C$ , by using an aggregate function and a spatial function $f^{S} \in S$ (Section 3.1) along a spatial dimension’s hierarchy $h_{s}$ (Extension 6), which should have spatial levels $l_{s}$ (Extension 7). However, in s-roll-up the dimension hierarchy is created dynamically on levels by the spatial function $f^{S}$ . We call this hierarchy a dynamic spatial hierarchy, conceptually from a base level $l_{b}$ to the dynamically created target level $l_{s}^{'}$ such that $l_{b} ⊑_{d} l_{s}^{'}$ . The instances of the target level $l_{s}^{'}$ are obtained by the spatial function $f^{S} (l_{s})$ that is applied on spatial dimension levels.

As for the semantics, s-roll-up takes an n-dimensional cube $C$ as an argument, which has the cube schema $CS = (D, M, F)$ with the fact members $f \in FM$ as given in Definition 16. As a parameter s-roll-up takes a spatial function $f^{S} \in S$ to operate on levels $L (d_{i})$ and an aggregate function $agg$ to calculate a measure m at the higher target level. For simplicity of explanation and without loss of generality, we initially assume that there is only one measure m. The extension of the operator on several measures $m \in M$ is explained in the last paragraph. S-Roll-up operator is formulated as; $SRU (C) [f^{S} (L (d_{i})), agg (m)] = C^{'}$ , which returns a cube $C^{'}$ with n-dimensions and has the schema $CS = (D^{'}, M^{'}, F^{'})$ where $F^{'} = F$ , $M^{'} = M$ , and $D^{'} = {d_{i} \in D ∣ {d_{1}, \dots, d_{i}^{'}, \dots, d_{n}} \land L^{'} (d_{i}^{'}) = L (d_{i}) ∖ (l_{b} ⊑_{d} \dots ⊏_{d} l_{s}) \cup {l_{s}^{'}}}$ . After the s-roll-up operation, number of dimensions in D remains the same, although the base levels and levels below the target level $(l_{b} ⊑_{d} \dots ⊏_{d} l_{s})$ of the corresponding dimension $d_{i}$ are left out and a new target level $l_{s}^{'}$ is added to the set of dimension levels $L^{'} (d_{i}^{'})$ of $d_{i}^{'}$ .

The set of level members of the level $l_{s}^{'}$ is selected with respect to the spatial function on base level members of a spatial dimension such that $LM (l_{s}^{'}) = {f^{S} (v_{l_{b}}) ∣ v_{l_{b}} \in LM (l_{b})}$ where $l_{b} ⊑_{d} l_{s}^{'} ⟺ f^{S} (v_{l_{b}}) = v_{l_{s}^{'}}$ , which means that the base level $l_{b}$ rolls up along the spatial dynamic hierarchy ( $⊑_{d}$ ) to the target new spatial level $l_{s}^{'}$ if and only if spatial function on base level $f^{S} (v_{l_{b}}) = v_{l_{s}^{'}}$ produces the new spatial level members $v_{l_{s}^{'}}$ . Even though the set of measures M remains the same, the s-roll-up operator obtains the measure values associated with fact members $f^{'}$ at a coarser granularity $l_{s}^{'}$ , which alters the set of facts ${FM}^{'} ⊈ FM$ . In order to create the new set of facts ${FM}^{'}$ at the new granularity level $l_{s}^{'}$ , the $Group$ operator [28] is used to group the facts characterized by the same level members $v_{l_{s}^{'}} \in LM (l_{s}^{'})$ such that $Group (v_{l_{s}^{'}}) = {f \in FM ∣ \exists v_{l_{b}} \in LM (l_{b}) : f ⇝ v_{l_{b}} \land v_{l_{b}} ⊑_{d} v_{l_{s}^{'}}}$ . The output of the $Group$ operator on level members is a new fact instance $f^{'}$ . In order to aggregate the measure values $v_{m}$ , which are associated with the fact members f we use an aggregate function $agg$ such that $agg ({f_{1}, \dots, f_{k}}) = agg (v_{m_{1}}, \dots, v_{m_{k}})$ where $f_{i} ⇝ v_{m_{i}}, i = 1, \dots, k$ . Finally, the set of the new facts $f^{'} \in {FM}^{'}$ is constructed, that is given with the associated new level members and aggregated measure values as; ${FM}^{'} = {f^{'} = Group (v_{l_{s}^{'}}) ∣ \exists v_{l_{s}^{'}} \in LM (l_{s}^{'}) : f^{'} ⇝ v_{l_{s}^{'}} \land f^{'} ⇝ agg (Group (v_{l_{s}^{'}}))}$ .

The extension to multiple measures is similar, which is done by providing and using a separate aggregate function for each measure $m \in M$ .
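The interplay of the spatial function, the $Group$ operator, and the aggregate function can be sketched in Python for a single measure. The data, the planar coordinates, and the choice of "city of the closest supplier" as the spatial function $f^{S}$ are assumptions made for illustration; the sketch is not the query generation procedure of this paper.

```python
from collections import defaultdict
from math import hypot

# Hypothetical planar coordinates; member names are illustrative.
CUSTOMER_GEOM = {"c1": (0.0, 0.0), "c2": (10.0, 0.0), "c3": (1.0, 1.0)}
SUPPLIERS = {
    "s1": {"geom": (0.0, 1.0), "city": "odense"},
    "s2": {"geom": (9.0, 0.0), "city": "aalborg"},
}
# Facts at base granularity: (customer member, measure value).
FACTS = [("c1", 100.0), ("c2", 40.0), ("c3", 60.0), ("c2", 10.0)]


def closest_city(customer):
    """Spatial function f^S: map a base-level member to a dynamic level member."""
    cx, cy = CUSTOMER_GEOM[customer]
    nearest = min(
        SUPPLIERS.values(),
        key=lambda s: hypot(s["geom"][0] - cx, s["geom"][1] - cy),
    )
    return nearest["city"]


def s_roll_up(facts, spatial_fn, agg):
    """Group facts by f^S(v_lb) and aggregate the measure per group."""
    groups = defaultdict(list)
    for member, value in facts:
        groups[spatial_fn(member)].append(value)  # Group(v_ls')
    return {level_member: agg(values) for level_member, values in groups.items()}


result = s_roll_up(FACTS, closest_city, sum)
print(result)
```

Several measures could be handled by storing one value list per measure in each group and applying a separate aggregate function to each list.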

Example 14.
The following SPARQL query shows the s-roll-up operator, which is exemplified in Section 3.6. The query is “total amount of sales to customers by city of the closest suppliers”. Note that the measures are aggregated up to a new city from customer level of the customer dimension, which is specified as the Closest City. The hierarchy step from customer to city is defined dynamically by a spatial function $f^{S}$ (distance from numeric operations $N_{op} \subset S$ ), which is then used in a wrapper function to find the closest distance of the suppliers and customers. The levels and level members (of customer), which are below the newly defined level (Closest City) are left out in the result.

Remark 20 (Drill-down).

Drill-down is the inverse operator of roll-up, which disaggregates previously summarized data to a child level in order to obtain measures at a finer granularity of a given dimension. For example, the roll-up query given in Remark 19 (“total amount of sales to customers by city”) aggregates sales by summing up the sales amount, from customer level to city level along a hierarchy. As drill-down operator performs the operation opposite to the roll-up an example would be; “average amount of sales of each supplier, drilled down from the city level to the supplier level”. (Cube is the same as sales, and the hierarchy is the same but the dimension is the supplier, so child level in dimension to drill-down from city level is the supplier such that $City ⊒ Supplier$ .) Conceptually, a drill-down to level $l_{i}$ on a cube $C$ corresponds to a roll-up to the same level $l_{i}$ on the base cube of $C$ , that is denoted as $BaseCube (C)$ .

Definition 20 (S-Drill-down).

Analogously to drill-down operator, s-drill-down disaggregates measures $m \in M$ of a given cube $C$ , by using an aggregate function and a spatial function $f^{S}$ (Section 3.1) along a spatial dimension’s hierarchy $h_{s}$ (Extension 6), which should have spatial levels (i.e., $l_{s}$ ) (Extension 7).

Conceptually, in s-drill-down, the dimension hierarchy is created dynamically on levels by the spatial function $f^{S}$ as in s-roll-up. This is similar to the dynamic spatial hierarchy defined in Definition 19, that is, from a spatial parent level $l_{p_{s}}$ to a dynamically created spatial child level $l_{c_{s}}^{'}$ such that $l_{p_{s}} ⊒_{d} l_{c_{s}}^{'}$ . The target spatial child level $l_{c_{s}}^{'}$ is the output of the spatial function $f^{S}$ on spatial levels $l_{i_{s}} \in L (d_{i})$ of the spatial dimension. Applying s-drill-down to child level $l_{c_{s}}^{'}$ from a parent level $l_{p_{s}}$ on a cube $C$ corresponds to applying s-roll-up to the same level $l_{c_{s}}^{'}$ from the base level $l_{b}$ on the base cube of $C$ . Therefore, the semantics of s-drill-down is described in the same way as that of s-roll-up, and the operator is formulated as $SDD (C) [f^{S} (L (d_{i})), agg (m)] = SRU (BaseCube (C)) [f^{S} (L (d_{i})), agg (m)]$ .
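The identity $SDD (C) = SRU (BaseCube (C))$ can be illustrated with a small sketch in which a cube keeps a reference to the base facts it was aggregated from. The Cube class, the data, and the use of a dictionary lookup and an identity function in place of real spatial functions are simplifying assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def s_roll_up(base_facts, spatial_fn, agg):
    """Group base facts by the output of spatial_fn and aggregate each group."""
    groups = defaultdict(list)
    for member, value in base_facts:
        groups[spatial_fn(member)].append(value)
    return {key: agg(values) for key, values in groups.items()}


class Cube:
    """A cube that remembers the base facts it was aggregated from."""

    def __init__(self, base_facts, cells):
        self.base_facts = base_facts  # plays the role of BaseCube(C)
        self.cells = cells            # aggregated view at the current level


def s_drill_down(cube, spatial_fn, agg):
    # SDD(C)[f^S, agg] = SRU(BaseCube(C))[f^S, agg]
    return Cube(cube.base_facts, s_roll_up(cube.base_facts, spatial_fn, agg))


# Hypothetical supplier facts; CITY_OF stands in for a spatial function.
BASE = [("s1", 10.0), ("s1", 30.0), ("s2", 50.0)]
CITY_OF = {"s1": "odense", "s2": "odense"}
by_city = Cube(BASE, s_roll_up(BASE, CITY_OF.get, sum))
by_supplier = s_drill_down(by_city, lambda supplier: supplier, mean)
```

The drill-down never disaggregates the city-level cells directly; it recomputes the finer level from the base facts, which is why the base cube has to remain available.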

Example 15.
In order to exemplify an s-drill-down, we start from the result cube graph of Example 14 (“total amount of sales to customers by city of the closest supplier”), which is at the granularity of the City level, and drill down to the child level Supplier with the query “average amount of sales of furthest suppliers to their city center, drilled down from the City level to the Supplier level”. The following SPARQL query shows the given example.

In this paper, we focus on direct querying of single data cubes with the main SOLAP operators in SPARQL. The integration of several cubes through s-drill-across or set-oriented operations such as union, intersection, and difference [7] is out of scope and is left as future work.
6. Generating SOLAP queries in SPARQL via QB4SOLAP

After having defined the high-level SOLAP operators in Section 5, this section first describes how to generate SPARQL queries for each of these operators by using the QB4SOLAP metamodel (Section 4). Afterwards, this section describes how to create more complex SPARQL queries for nested SOLAP operations.

6.1. Generation algorithms

The generated SPARQL queries Q are of the form “ $Q = SELECT R WHERE GP$ ”, where $GP$ is a graph pattern containing triple patterns and R is the (set of) variable(s) that are returned in the result of the query. Triple patterns are based on triples of the form $(s, p, o)$ (Definition 4), where triple components are replaced by variables. A set of triple patterns defines a graph pattern $GP$ . Given an RDF graph $G$ , a graph pattern $GP$ is used to search for subgraphs $G_{(R)} \subseteq G$ matching the pattern. In our algorithm, the graph pattern is initially empty, $GP = \emptyset$ , and the triple patterns are added incrementally to the body of the WHERE clause: $GP = GP \cup (s p o)$ .
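The incremental construction of $GP$ can be illustrated with a minimal Python sketch. The class and function names (`GraphPattern`, `select_query`) are assumptions introduced for illustration and are not part of the paper's algorithms; `gnw:salesAmount` is the measure used in the running example and `qb:Observation` comes from the RDF Data Cube vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class GraphPattern:
    """An ordered collection of triple patterns (the GP of the text)."""
    elements: list = field(default_factory=list)

    def add_triple(self, s: str, p: str, o: str) -> "GraphPattern":
        # GP = GP ∪ (s p o); duplicates are skipped to keep set semantics.
        triple = f"{s} {p} {o} ."
        if triple not in self.elements:
            self.elements.append(triple)
        return self


def select_query(result_vars: list, gp: GraphPattern) -> str:
    """Render Q = SELECT R WHERE GP as SPARQL text."""
    body = "\n".join(f"  {e}" for e in gp.elements)
    return f"SELECT {' '.join(result_vars)}\nWHERE {{\n{body}\n}}"


gp = GraphPattern()  # GP = ∅
gp.add_triple("?obs", "rdf:type", "qb:Observation")
gp.add_triple("?obs", "gnw:salesAmount", "?sales")
print(select_query(["?obs", "?sales"], gp))
```

Keeping the pattern as an ordered list with duplicate suppression preserves the set semantics of $GP$ while producing a stable, readable query text.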

RDF datasets published with the QB4SOLAP vocabulary use the skos:broader property to define the roll-up relation from child level to parent level (Definitions 13 and 15). As this is the case for all hierarchy levels in a dimension, every OLAP query contains such roll-up paths that we need to consider as part of $GP$ in the WHERE clause.

Thus, we define a helper function $RUPath$ (Algorithm 1) that we can use in the SOLAP query generation algorithms.

Algorithm 1

$RUPath (G_{(C)}^{S}, l_{b}, l_{s}, a_{ID}, ?a_{s}, ?f) : GP$

Build roll-up path ( $RUPath$ ). The helper function $RUPath$ returns a graph pattern that we can use in the body of the WHERE clause. The roll-up path pattern is created as a path-shaped join of triples with partial order (⊑, skos:broader) (Definitions 10 and 15). The triple pattern is of the form $\{(s_{1} p_{1} o_{1}), (o_{1} p_{2} o_{2}), (o_{2} p_{2} o_{3}), \dots, (o_{n - 1} p_{2} o_{n})\}$, where $s_{1}$ is the root of the graph pattern and corresponds to fact members f from the QB4SOLAP schema (Definition 16), $p_{1}$ is the predicate ${id}^{S} (a_{ID})$ that associates facts with level members $v_{l_{i}}$ ( $f ⇝ v_{l_{i}}$ , Definition 16), $o_{1}$ is the variable for the first level member that rolls up to its parent level $o_{2}$ such that $o_{1} ⊑ o_{2}$ and so on, and the $p_{2}$ predicate corresponds to the skos:broader property. The last variable in the path, $o_{n}$, represents the level members at the target level $l_{s}$. The roll-up path starts at the fact instances f (Definition 16). Afterwards, the partial order on level members (Definition 15) from base level $l_{b}$ to target level $l_{s}$ is applied. Algorithm 1 sketches the helper function for building the roll-up path for dimensions, from facts to dimension levels, with predicates and cube member IRIs defined in the cube schema.
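As a rough illustration of the path-shaped join, the following Python sketch emits the triple patterns $(s_{1} p_{1} o_{1}), (o_{1} p_{2} o_{2}), \dots$ as SPARQL text. It is a simplification under stated assumptions: the function takes the level-member variables directly instead of reading them from the cube schema graph, and the property names `gnw:customerID` and `gnw:cityGeo` are hypothetical placeholders.

```python
from typing import List, Optional


def ru_path(fact_var: str, id_predicate: str, level_vars: List[str],
            attr_predicate: Optional[str] = None,
            attr_var: Optional[str] = None) -> List[str]:
    """Return path-shaped triple patterns from facts to the target level.

    level_vars[0] is the base-level member variable (o_1) and
    level_vars[-1] is the target-level member variable (o_n).
    """
    if not level_vars:
        raise ValueError("at least the base level variable is required")
    # (s_1 p_1 o_1): link the fact to its base-level member.
    patterns = [f"{fact_var} {id_predicate} {level_vars[0]} ."]
    # (o_i p_2 o_{i+1}): climb one level per skos:broader step.
    for child, parent in zip(level_vars, level_vars[1:]):
        patterns.append(f"{child} skos:broader {parent} .")
    # Optionally expose a (spatial) attribute of the target level.
    if attr_predicate and attr_var:
        patterns.append(f"{level_vars[-1]} {attr_predicate} {attr_var} .")
    return patterns


# Hypothetical Customer -> City path with the city geometry attribute.
for line in ru_path("?obs", "gnw:customerID", ["?cust", "?city"],
                    "gnw:cityGeo", "?cityGeo"):
    print(line)
```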

Some parameters vary at the instance level, such as fact members, level members, or parameter values given by the user. To distinguish them from the other parameters in the algorithm, we write them as variable names prefixed with a question mark.

We use a FILTER expression to restrict the output data by using a (spatial) Boolean predicate $ϕ^{S}$ . A FILTER expression is part of the WHERE clause in a SPARQL query. Therefore, it is added to the body of the WHERE clause in the graph pattern $GP$ as $GP = GP \cup (FILTER ϕ^{S})$ . In the cases where there is a spatial function $f^{S} (x)$ in the SOLAP operator, it is given in the BIND clause, which is technically a part of the WHERE clause and therefore added to the body of the WHERE clause in a graph pattern $GP$ as $GP = GP \cup (BIND f^{S} (x))$ . SPARQL 1.1 defines aggregate expressions,¹⁵ such as SUM, MIN, MAX, AVG, etc. We apply them on measure values or use them as wrappers in spatial functions. In the following, we often write AGG to represent them.

¹⁵ https://www.w3.org/TR/sparql11-query/#aggregates
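The three building blocks above (FILTER, BIND, and aggregate wrappers) can be sketched as small Python helpers that produce SPARQL text. The helper names are illustrative assumptions. The example uses the GeoSPARQL function `geof:distance` for concreteness, whereas an endpoint-specific function such as the st_distance named in the text could be substituted; the geometry properties are hypothetical.

```python
# Aggregates defined by SPARQL 1.1.
SPARQL_AGGREGATES = {"COUNT", "SUM", "MIN", "MAX", "AVG", "SAMPLE", "GROUP_CONCAT"}


def add_filter(gp: list, predicate: str) -> list:
    """GP = GP ∪ (FILTER φ^S)."""
    return gp + [f"FILTER ({predicate})"]


def add_bind(gp: list, expression: str, var: str) -> list:
    """GP = GP ∪ (BIND f^S(x)); SPARQL requires a target variable."""
    return gp + [f"BIND ({expression} AS {var})"]


def agg(name: str, expression: str, alias: str) -> str:
    """Wrap an expression in a SPARQL 1.1 aggregate for a SELECT clause."""
    if name.upper() not in SPARQL_AGGREGATES:
        raise ValueError(f"not a SPARQL 1.1 aggregate: {name}")
    return f"({name.upper()}({expression}) AS {alias})"


# Hypothetical geometry properties; only gnw:salesAmount appears in the text.
gp = ["?cust gnw:customerGeo ?custGeo .", "?city gnw:cityCenterGeo ?centerGeo ."]
gp = add_bind(gp, "geof:distance(?custGeo, ?centerGeo, uom:metre)", "?dist")
gp = add_filter(gp, "?dist < 5000")
print("\n".join(gp))
print(agg("sum", "?sales", "?total"))
```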

In the following, we present the SPARQL query generation algorithms for the SOLAP operators defined in Section 5. The algorithms take the input parameters and arguments of the SOLAP operator and return a SPARQL query Q that can be executed.

S-Slice generator. To generate a SPARQL query for the s-slice operator $SS (C) [l_{b}, l_{s}, v_{s}]$ (Definition 17), we use Algorithm 2. Parameter $v_{s}$ is a spatial literal value $v_{s} \in L_{s}$ (i.e., POINT or POLYGON) that should be related to a spatial level $l_{s}$ (Extension 7). This means that $v_{s}$ is defined as a polygon geometry that corresponds to a spatial attribute value in the target level $l_{s}$ (Example 11) or $v_{s}$ is defined as a point geometry that is spatially contained in a spatial attribute value of the target level $l_{s}$ (Example 12). Note that in Example 11.1, the given spatial literal has the geometry data type polygon, which corresponds to a spatial level attribute $a_{s}$ (Extension 8) at a spatial level $l_{s}$ . Similarly, the spatial function call $f^{S} (x)$ in Example 11.2 returns a polygon that corresponds to a spatial level attribute $a_{s}$ .

Algorithm 2

$S\text{-}SliceGenerator (G_{(C)}^{I}, v_{s}, l_{b}, l_{s}) : Q$

On the other hand, the given spatial literal in Example 12 has the geometry data type point, which corresponds to the spatial level $l_{s}$ via topological relations ( $T_{rel}$ ). We consider all these possibilities in the s-slice generator algorithm. We explain the steps in the following, where the step references are the line numbers in Algorithm 2.

Get the path for dimension $d_{s}$ (e.g., Customer) from the observation facts f to the base level $l_{b}$ , and build path-shaped triple pattern paths from the dimension’s base level $l_{b}$ to the target spatial level $l_{s}$ (e.g., City level). Finally, get level attribute IRIs and variables for the spatial attributes $a_{s}$ (e.g., City geometry). All this is done by the RUPath function (Algorithm 1) that is used by the s-slice generator. The following shows an example result of this step that is added to $GP$ :

Check if the spatial literal $v_{s}$ is a point geometry type. If true, create a FILTER statement with a spatial Boolean predicate (Line 4) and go to the result (Line 12).

Build the FILTER statement based on the spatial literal $v_{s}$ and the spatial attribute $a_{s}$ (Example 12). As a result the following lines might be added to the $GP$ :

Check if $v_{s}$ is a function call $f^{S} (x)$ . If true (Example 11.2), construct an inner select query to compute the spatial function $f^{S} (x)$ , then go to the result (Line 12).

Call the RUPath function in order to link $a_{s}$ variables with the fact instances (this time for inner select query $Q^{'}$ ). This step creates a graph pattern ${GP}^{'}$ for inner select query $Q^{'}$ , for example:

Build a bind statement on $a_{s}$ variables for calculating spatial functions (e.g., compute areas). For example, the following lines might be added to graph pattern ${GP}^{'}$ :

Generate the inner select query $Q^{'}$ based on ${GP}^{'}$ generated in Lines 6 and 7. For example ( $Q^{'}$ finds the geometry of the largest city):

Build the filter statement with the output of the spatial function $f^{S} (x)$ , construct $GP$ (which includes $Q^{'}$ ) for the outer query, and go to the result (Line 12). At this stage, $GP$ consists of the patterns constructed in Lines 2 and 8. The following filter statement is added to $GP$ :

If a spatial literal $v_{s} \in L$ is given as the parameter instead, build a filter statement that checks if $v_{s}$ is equal to the spatial attribute $a_{s}$ values, and go to the result (Line 12). For example, the following filter condition might be added to graph pattern $GP$ :

Finally, the algorithm generates query Q, which can be executed over the fact members ${FM}^{'}$ . In our running examples we obtain the following cases for the generated s-slice query Q.

S-Slice operator with a given spatial value as point data type: The following listing corresponds to the SPARQL output of the running example where the spatial value is given as POINT data type (Example 12) and filters the level attributes with a spatial predicate within a given level. The graph pattern $GP$ for the query is created in Lines 2 to 7.

S-Slice operator with a spatial function call: The following listing corresponds to the SPARQL output of the running example where the spatial value is returned from a function call (Example 11.2). The graph pattern ${GP}^{'}$ for the spatial function call is created in Lines 8 to 13. The graph pattern $GP$ for the whole query is created in Lines 2 to 12.

S-Slice operator with a given spatial value as polygon data type: The following listing corresponds to the SPARQL output of the running example where the spatial value is given as a POLYGON data type (Example 11.1) corresponding to a level attribute. The graph pattern $GP$ for the query is created in Lines 2 to 7.
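The three cases handled by the s-slice generator can be sketched in Python as a dispatch on the kind of $v_{s}$. This is a hedged simplification, not the paper's Algorithm 2: the roll-up path is a fixed stand-in for RUPath, the `gnw:customerID` and `gnw:cityGeo` properties and the `ex:area` function are hypothetical, and the GeoSPARQL function `geof:sfWithin` stands in for the topological predicate.

```python
from dataclasses import dataclass


@dataclass
class FunctionCall:
    """A spatial function call f^S(x) wrapped in an aggregate, e.g. MAX(area)."""
    agg: str
    function: str


def path(suffix: str = "") -> list:
    """Hypothetical roll-up path from facts to the City geometry."""
    return [
        f"?obs{suffix} gnw:customerID ?cust{suffix} .",
        f"?cust{suffix} skos:broader ?city{suffix} .",
        f"?city{suffix} gnw:cityGeo ?cityGeo{suffix} .",
    ]


def s_slice(v_s) -> str:
    gp = path()
    if isinstance(v_s, FunctionCall):
        # Inner select Q' computes the aggregate of f^S over the level members.
        inner_gp = "\n".join(f"      {t}" for t in path("In"))
        gp.append(
            f"{{ SELECT ({v_s.agg}({v_s.function}(?cityGeoIn)) AS ?target)\n"
            f"    WHERE {{\n{inner_gp}\n    }} }}"
        )
        gp.append(f"FILTER ({v_s.function}(?cityGeo) = ?target)")
    elif v_s.lstrip('"').upper().startswith("POINT"):
        # Point literal: relate it to the level geometry topologically.
        gp.append(f"FILTER (geof:sfWithin({v_s}, ?cityGeo))")
    else:
        # Polygon literal: compare with the spatial attribute value itself.
        gp.append(f"FILTER (?cityGeo = {v_s})")
    body = "\n".join(f"  {t}" for t in gp)
    return f"SELECT ?obs\nWHERE {{\n{body}\n}}"


print(s_slice('"POINT(10.2 56.1)"^^geo:wktLiteral'))
print(s_slice(FunctionCall("MAX", "ex:area")))
```

The inner select uses renamed variables (suffix `In`) so that its bindings do not clash with those of the outer pattern.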

S-Dice generator. To generate a SPARQL query for the s-dice operator, $SD (C) [ϕ^{S}]$ (Definition 18, where parameter $ϕ^{S}$ represents a spatial predicate), we follow the steps sketched in Algorithm 3. The algorithm takes parameter $ϕ^{S}$ as input, which corresponds to a spatial predicate that could represent a topological relation from the $T_{rel}$ set or a combination of a spatial function (a numeric operation from the $N_{op}$ set) and a regular predicate ϕ. For illustration, we use the example query that we have introduced in Section 5 for s-dice (Example 13): “sales to customers, which are located within a 5 km distance from their city center”. In the following, we discuss the main steps of Algorithm 3 with the running example, where the step references are the line numbers in Algorithm 3.

The algorithm runs through the levels from base level $l_{b}$ to the target spatial level $l_{s}$ , which are both given in the spatial Boolean predicate $ϕ^{S}$ .

Build the roll-up path for those levels using the helper function RUPath. Note that, when we apply the roll-up path to the target level, we can also link the level attributes for the target (spatial) level – as, for example, in the last line of the following listing. The output of function RUPath is added to graph pattern $GP$ :

Check if $ϕ^{S}$ is to be implemented as a spatial predicate from topological relations $T_{rel}$ as interpreted in Example 13.1.

Create a filter statement with a spatial predicate and the spatial level attribute $a_{s}$ , which is referenced in the roll-up path (Line 4). For our running example, the filter statement is applied on customers that are located within a buffer area of 5 km from their city centers. The spatial predicate st_within is used from the topological relations. The following lines are added to graph pattern $GP$ :

Check if $ϕ^{S}$ is to be implemented as a combination of a spatial function $f^{S} (x)$ and a regular predicate ϕ as interpreted in Example 13.2.

Algorithm 3

$S\text{-}DiceGenerator (G_{(C)}^{I}, ϕ^{S}) : Q$

Create a bind statement based on a spatial function (i.e., calculate st_distance between customers and city center) and a filter statement based on the assigned values with a regular predicate (i.e., less than 5 km). The following lines are added to graph pattern $GP$ :

Generate query Q for selecting the facts $f \in {FM}^{'}$ matching the incrementally created graph pattern $GP$ in the previous steps. In our running examples we obtain the following cases for the generated s-dice query Q.

S-Dice operator with $ϕ^{S}$ : The following listing is the SPARQL query generated for the running example (Example 13.1), where the spatial predicate is interpreted as a topological relation. The graph pattern $GP$ for the query is created in Lines 2 to 8.

S-Dice operator with $f^{S} (x)$ and a regular Boolean predicate ϕ: The following listing is the SPARQL query generated for the running example (Example 13.2), where the spatial predicate is interpreted as a combination of a spatial function and a regular predicate. The graph pattern $GP$ for the query is created in Lines 2 to 9.
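The two interpretations of $ϕ^{S}$ can be contrasted in a short Python sketch that emits both query variants for the 5 km example. It is illustrative only: the geometry properties are hypothetical, and the GeoSPARQL functions `geof:sfWithin`, `geof:buffer`, and `geof:distance` are used in place of the st_within and st_distance functions named in the text.

```python
def s_dice(mode: str, distance_m: float = 5000.0) -> str:
    """Build an s-dice query; mode is 'topological' or 'distance'."""
    # Stand-in for the RUPath output; property names are hypothetical.
    gp = [
        "?obs gnw:customerID ?cust .",
        "?cust gnw:customerGeo ?custGeo .",
        "?cust skos:broader ?city .",
        "?city gnw:cityCenterGeo ?centerGeo .",
    ]
    if mode == "topological":
        # Spatial predicate from T_rel: customer within a buffer of the center.
        gp.append(
            "FILTER (geof:sfWithin(?custGeo, "
            f"geof:buffer(?centerGeo, {distance_m}, uom:metre)))"
        )
    elif mode == "distance":
        # Spatial function f^S(x) in BIND plus a regular predicate in FILTER.
        gp.append("BIND (geof:distance(?custGeo, ?centerGeo, uom:metre) AS ?dist)")
        gp.append(f"FILTER (?dist < {distance_m})")
    else:
        raise ValueError(f"unknown mode: {mode}")
    body = "\n".join(f"  {t}" for t in gp)
    return f"SELECT ?obs\nWHERE {{\n{body}\n}}"


print(s_dice("topological"))
print(s_dice("distance"))
```

Both variants select the same facts in principle; which one an endpoint evaluates more efficiently depends on its spatial indexing, which this sketch does not address.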

S-Roll-up Generator. To generate a SPARQL query for the s-roll-up operator from a high-level SOLAP expression, $SRU (C) [f^{S} (L (d_{i})), agg (m)]$ (Definition 19), where parameter $f^{S} (L (d_{i}))$ denotes a spatial function on spatial level members and $agg (m)$ is an aggregate function on measures, we use Algorithm 4. For illustration, we use the query example for s-roll-up given in Section 5 (Example 14): “total amount of sales to customers by city of the closest suppliers”. In the following, we go through the main steps sketched in Algorithm 4.

Build the roll-up path using helper function RUPath. In addition to the variables given in the RUPath function, we also need to consider measures and measure value variables (Line 3) since we aggregate the measures. A measure is specified in the following listing of the running example as gnw:salesAmount. The following lines are added to the graph pattern $GP$ :

Build inner select subquery to apply the spatial function $f^{S}$ on the spatial level members $L (d_{i})$ (i.e., Customer, Supplier). In the example, we will use this information to create a dynamic spatial hierarchy from the Customer to the City level.

Algorithm 4

$SRUGenerator (G_{(C)}^{I}, f^{S} (L (d_{s})), agg (m)) : Q$

Call RUPath for the inner select subquery to link the geometry attributes of base level members with different variables and create a graph pattern ${GP}^{'}$ for the inner select. The following lines are added to the graph pattern ${GP}^{'}$ :

Build the bind statement in order to calculate the spatial function $f^{S} (L (d_{s}))$ on spatial level members. For the running example, the spatial function is st_distance. The following lines are added to the graph pattern ${GP}^{'}$ :

Generate the inner select query $Q^{'}$ using graph pattern ${GP}^{'}$ (Lines 5 and 6). Select the corresponding level members (Customer level for the running example) and group them in a group by statement on the selected level members. Note that this is where the spatial function $f^{S} (L (d_{i}))$ is called with a wrapper expression (e.g., MIN, MAX, etc.) to find the closest distance. The following lines illustrate the inner select query $Q^{'}$ :

Build the filter statement for the whole query based on the output of the spatial function, which is calculated in the inner select subquery. Then, add the filter and the inner select subquery to the main graph pattern $GP$ (Line 8). The filter statement for the running example is:

Note that in Line 9, the spatial target level $l_{s}$ (City) is altered to a dynamic spatial level $l_{s}^{'}$ since applying the spatial function creates a dynamic hierarchy.

Generate query Q for computing the facts $f \in {FM}^{'}$ based on graph pattern $GP$ created in the previous steps. The measures are also aggregated at the spatial target level (closest City, which is dynamically selected). The group by statement is applied on the fact members and target level members. In our running example we obtain the following case for the generated s-roll-up query Q.

S-Roll-up operator: The following listing shows the generated SPARQL query. Graph pattern ${GP}^{'}$ for the inner select subquery is created in Lines 15 to 22 and the graph pattern $GP$ for the whole query is created in Lines 3 to 24.
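The overall shape of an s-roll-up query (outer pattern, inner select with a wrapped spatial function, joining filter, and final aggregation) can be approximated with the following Python sketch. It is an assumption-laden simplification rather than the output of Algorithm 4: except for `gnw:salesAmount`, all property names are hypothetical, `geof:distance` replaces st_distance, and the outer query groups by the target level only.

```python
def s_roll_up(agg: str = "SUM", wrapper: str = "MIN") -> str:
    """Sketch of an s-roll-up query: sales by city of the closest supplier."""

    def path(sfx: str = "") -> list:
        # Stand-in for RUPath; variables are suffixed for the inner select.
        return [
            f"?obs{sfx} gnw:customerID ?cust{sfx} .",
            f"?obs{sfx} gnw:supplierID ?sup{sfx} .",
            f"?cust{sfx} gnw:customerGeo ?custGeo{sfx} .",
            f"?sup{sfx} gnw:supplierGeo ?supGeo{sfx} .",
        ]

    def dist(sfx: str = "") -> str:
        return f"geof:distance(?custGeo{sfx}, ?supGeo{sfx}, uom:metre)"

    # Inner select Q': per customer, the wrapped spatial function value.
    inner_gp = path("1") + [f"BIND ({dist('1')} AS ?d1)"]
    inner = (
        f"{{ SELECT ?cust1 ({wrapper}(?d1) AS ?best)\n"
        "    WHERE {\n"
        + "\n".join(f"      {t}" for t in inner_gp)
        + "\n    }\n    GROUP BY ?cust1 }"
    )
    # Outer pattern GP: path, measure, dynamic level, subquery, and filter.
    gp = path() + [
        "?obs gnw:salesAmount ?sales .",
        "?sup skos:broader ?city .",
        inner,
        f"FILTER (?cust = ?cust1 && {dist()} = ?best)",
    ]
    body = "\n".join(f"  {t}" for t in gp)
    return (
        f"SELECT ?city ({agg}(?sales) AS ?total)\n"
        f"WHERE {{\n{body}\n}}\nGROUP BY ?city"
    )


print(s_roll_up())
```

Passing `wrapper="MAX"` would instead select the furthest supplier per customer, which is the variation used in the s-drill-down example.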

S-Drill-down Generator. The semantics of the s-drill-down operator are defined in the same way as for the s-roll-up operator, with the condition that the input cube $C$ for s-roll-up is obtained using a function $BaseCube$ such that $SDD (C) [f^{S} (L (d_{i})), agg (m)] = SRU (BaseCube (C)) [f^{S} (L (d_{i})), agg (m)]$ (Definition 20). Therefore, no separate generator algorithm and steps are specified: an s-drill-down operator corresponds to a rewriting of an s-roll-up operator, which is obtained with a $Base$ function that calls the base cube graph in SRUGenerator, i.e., $SDDGenerator = SRUGenerator (Base (G_{(C)}^{I}), f^{S} (L (d_{i})), agg (m))$ .

6.2. Nested SOLAP operations to SPARQL

We now show how a SPARQL query can be generated for a nested SOLAP expression. In general, a nested set of SOLAP operators can be rewritten into an expression with an additional s-dice, on top of a series of s-roll-ups, on top of one or more s-slices, on top of an s-dice, i.e., (s-dice $_{2}$ (s-roll-up $_{1}$ (…s-roll-up $_{k}$ (s-slice $_{1}$ (… s-slice $_{n}$ (s-dice $_{1}$ ( $C$ ))))))).

Let us begin with a simpler nested form that shows the most typical pattern, namely (s-roll-up (s-slice (s-dice( $C$ )))), where initially a subcube graph is selected by s-dice. Afterwards, an s-slice is performed on a higher level of a dimension. Then, an s-roll-up is applied, which aggregates the measures in the sliced cube from a lower level to a higher level. Finally, we could also perform another s-dice for filtering the measures. There may be several s-slices and s-roll-ups in between.

We formulate the nested SOLAP query as $^{3}$ (s-roll-up $^{2}$ (s-slice $^{1}$ (s-dice( $C$ )))) and apply our running examples such that the enumeration of operators can be interpreted as follows: $^{1}$ Get the subcube graph of customers that are located within a 5 km distance from their city center, $^{2}$ slice on the customers of the largest country (which drops the dimension and leaves out all the other countries) and $^{3}$ get the total amount of sales for customers by the city of their closest suppliers (aggregates the measure Sales amount from Customer to Closest City level). Finally, we may also perform another (s-)dice on measures, e.g., filtering the total amount of sales greater than 10500. To perform nested SOLAP operators, we identify a set of principles to be considered by the algorithm.

Principle 1: Perform s-dice in the beginning or at the end.

Principle 2: If there are several s-roll-up or s-slice operations, call their generator algorithms repeatedly.

Principle 3: Always separate FILTER clauses when a SOLAP generator algorithm is used. Enumerate separated FILTER clauses. If a SOLAP operator is the final function added to the graph pattern, do not separate the FILTER clause.

Principle 4: Build the final graph pattern with the separated and enumerated FILTER clauses with respect to Principle 3.

Principle 5: Drop the main SELECT clause from each SOLAP generator algorithm and build only one SELECT that is added to the query at the end.

Principle 6: Separate the GROUP BY clause and AGG functions from the s-roll-up generator algorithms (and enumerate them), and add them to the main (outer) SELECT clause at the end.

Algorithm 5

$WriteSPARQL ((SRU (C) [f^{S} (L (d_{i})), agg (m)] (SS (C) [l_{b}, l_{s}, v_{a_{s}}] (SD (C) [ϕ^{S}])))) : Q$

To separate the FILTER clauses, we call SOLAP generator algorithms without their FILTER clause and enumerate each FILTER clause for each SOLAP generator algorithm that is used, i.e., S-SliceGenerator $(G_{(C)}^{I}, ϕ^{S}) ∖ {FILTER}^{1}$ (Algorithm 5, Line 4). Then, we build the final graph pattern with these separated FILTER clauses, i.e., $GP = GP \cup {FILTER}^{1} \cup {FILTER}^{2}$ (Line 8). When the last SOLAP generator algorithm is called, the output is directly added to the graph pattern without separating its FILTER clause (Line 9). Throughout the algorithm, all the SELECT clauses are omitted and combined into one SELECT in the output on Line 10. According to Principle 6, if there are any GROUP BY clauses and AGG functions (on measures) in inner selects, we eliminate them with “ ∖ ” from the inner selects (Line 9) and finally build the main (outer) select query with them (Line 10). Note that in the algorithm, the general graph pattern $GP$ is initially created by the RUPath function (Line 2) and incremented with triple patterns for selected measures (Line 3).
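A rough Python sketch of how the principles combine is given below. It assumes that each generator returns its parts separately in a hypothetical `Fragment` record, and it simplifies Principle 3 by collecting every FILTER at the end of the group, including that of the last operator, which does not change the result because a SPARQL FILTER applies to its whole group. Inner select subqueries are treated as opaque pattern strings.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """Output of one generator, split into separately reusable parts."""
    patterns: list
    filters: list = field(default_factory=list)
    group_by: list = field(default_factory=list)
    aggregates: list = field(default_factory=list)


def write_sparql(fragments: list) -> str:
    gp, filters, group_by, aggregates = [], [], [], []
    for frag in fragments:              # innermost operator first
        for pattern in frag.patterns:
            if pattern not in gp:       # shared roll-up paths appear once
                gp.append(pattern)
        filters.extend(frag.filters)    # Principle 3: FILTERs kept apart
        group_by.extend(frag.group_by)  # Principle 6: grouping kept apart
        aggregates.extend(frag.aggregates)
    # Principle 4: final graph pattern followed by the separated FILTERs.
    body = "\n".join(f"  {line}" for line in gp + filters)
    # Principle 5: exactly one outer SELECT for the whole expression.
    head = " ".join(group_by + aggregates) or "*"
    query = f"SELECT {head}\nWHERE {{\n{body}\n}}"
    if group_by:
        query += "\nGROUP BY " + " ".join(group_by)
    return query


dice = Fragment(["?obs gnw:customerID ?cust ."], filters=["FILTER (?dist < 5000)"])
roll = Fragment(
    ["?obs gnw:customerID ?cust .", "?obs gnw:salesAmount ?sales .",
     "?cust skos:broader ?city ."],
    group_by=["?city"], aggregates=["(SUM(?sales) AS ?total)"],
)
print(write_sparql([dice, roll]))
```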

Example 16 ( $^{3}$ s-roll-up ( $^{2}$ s-slice ( $^{1}$ s-dice( $C$ )))).

$^{1}$ Get the subcube graph of customers that are located within a 5 km distance from their city center, $^{2}$ slice on the customers of the largest country (which drops the dimension and leaves out all the other countries) and $^{3}$ get the total amount of sales of customers by the city of their closest suppliers (aggregates the measure Sales amount from Customer to Closest City level). The query is written starting from the innermost operator s-dice to the outermost operator s-roll-up.

The graph pattern $GP$ is initially created with the RUPath function for the corresponding levels and level attributes (Lines 3 to 18 of the generated SPARQL query in the listing above). The first operator is called by function S-DiceGenerator, where the first FILTER clause of the outer select is added to the query (Line 37). The second operator is called by the S-SliceGenerator function excluding its FILTER clause (Lines 19 to 27), which is followed by the SRUGenerator function without GROUP BY and AGG statements (Lines 28 to 36). Note that in Line 36, GROUP BY is applied on the lower level Customer, and the actual GROUP BY for the target City level is applied in the last line (Line 40). Separated FILTER clauses for the S-DiceGenerator and S-SliceGenerator functions are later added to the graph pattern (Algorithm 5, Line 8), corresponding to Lines 37 and 38 in the above example. The main outer select query is defined in the first line by specifying the target level (City) and the aggregate function on measures (sum of the total Sales).

7. Conclusions and future work

Motivated by the need for a formal foundation for spatial data warehouses on the Semantic Web, this paper makes a number of contributions. First, it proposes the QB4SOLAP vocabulary (metamodel), which supports spatially enhanced multidimensional (MD) data cubes over RDF data. This allows users to publish MD spatial data in RDF format. Second, the paper defines a number of spatial OLAP (SOLAP) operators over the defined QB4SOLAP cubes, allowing spatial analytical queries over RDF data, and gives their formal semantics. Third, the paper provides algorithms for generating spatially extended SPARQL queries from individual and nested SOLAP operators, allowing users to write their spatial analytical queries in our high-level SOLAP language instead of the lower-level and more complex SPARQL. Fourth, the vocabulary, operators, and query generation algorithms are validated by applying them to a realistic use case.

Fig. 8.

Our future vision of SOLAP on the SW.

Figure 8 presents our future vision of SOLAP on the Semantic Web with regard to the current state of the art and ongoing work. In order to verify the algorithms defined in this paper, we developed GeoSemOLAP [18], a SPARQL query generation tool. GeoSemOLAP allows users to perform SOLAP operations and generate SPARQL queries interactively through a GUI. We refer to our screencast¹⁶ for a detailed demonstration of GeoSemOLAP.

¹⁶ https://youtu.be/Pc3RBPPgBhA

Publishing spatial DWs on the SW allows users to also exploit the existing external geo-vocabularies (e.g., GeoNames)¹⁷ by defining spatial levels and hierarchies from external open data sources. Our main ongoing work thus focuses on developing an RDF2SOLAP enrichment module that performs the multidimensional annotation of existing spatial RDF datasets with QB4SOLAP (semi-)automatically.

¹⁷ GeoNames: http://www.geonames.org/; Global Administrative Areas: http://gadm.geovocab.org/; NUTS – EU’s Nomenclature of Territorial Units for Statistics: http://nuts.geovocab.org/.

Additional interesting aspects of future work would be, for instance, extending the formal techniques and algorithms for generating SOLAP queries in SPARQL to work over multiple RDF cubes, i.e., to support s-drill-across, and supporting spatial aggregation (s-aggregation) with user-defined functions over spatial measures. It would also be interesting to increase efficiency by extending our spatial data warehouse with techniques that have been developed in the context of RDF data cubes and SPARQL analytical queries in general, e.g., materialization and optimizing the physical layout [13,21,22], and to enable efficient support of a broad range of external sources by considering aspects such as federated processing of analytical queries [20] and schema heterogeneity [34]. We will also consider more efficient representations of the data, e.g., by removing redundancies. Furthermore, it would be interesting to extend QB4SOLAP and GeoSemOLAP [18] to handle highly dynamic spatio-temporal data and queries, as found, for instance, in large-scale transport analytics [12].

Acknowledgement

This research is partially funded by the European Commission through the Erasmus Mundus Joint Doctorate Information Technologies for Business Intelligence (EM IT4BI-DC) and the Danish Council for Independent Research (DFF) under grant agreement no. DFF-4093-00301.

Appendix

References

1. Abelló, Romero, T.B. Pedersen, R.B. Llavori, Nebot, M.J. Aramburu Cabo and Simitsis, Using Semantic Web technologies for exploratory OLAP: A survey, IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(2) (2015), 571–588. doi:10.1109/TKDE.2014.2330822.

2. A.B. Andersen, Gür, Hose, K.A. Jakobsen and T.B. Pedersen, Publishing Danish agricultural government data as Semantic Web data, in: Semantic Technology – 4th Joint International Conference, JIST 2014, Chiang Mai, Thailand, November 9–11, 2014, Supnithi, Yamaguchi, J.Z. Pan, Wuwongse and Buranarach, eds, Lecture Notes in Computer Science, Vol. 8943, Springer, 2014, pp. 178–186. doi:10.1007/978-3-319-15615-6_13.

3. Battle and Kolas, Enabling the geospatial Semantic Web with Parliament and GeoSPARQL, Semantic Web 3(4) (2012), 355–370. doi:10.3233/SW-2012-0065.

4. Bédard, Bernier, Larrivée, Nadeau, M.J. Proulx and Rivest, Spatial OLAP, in: Forum annuel sur la RD, Géomatique VI: Un monde accessible, 1997, pp. 13–14.

5. Cyganiak and Reynolds, The RDF Data Cube Vocabulary. W3C Recommendation, 16 January 2014. URL https://www.w3.org/TR/vocab-data-cube/. Additional contributor: J. Tennison.

6. da Silva, V.C. Times, A.C. Salgado, Souza, do Nascimento Fidalgo and A.G. de Oliveira, A set of aggregation functions for spatial measures, in: DOLAP 2008, ACM 11th International Workshop on Data Warehousing and OLAP, Proceedings, Napa Valley, California, USA, October 30, 2008, I.Y. Song and Abelló, eds, ACM, 2008, pp. 25–32. doi:10.1145/1458432.1458438.

7. C.D. de Aguiar Ciferri, R.R. Ciferri, L.I. Gómez, Schneider, A.A. Vaisman and Zimányi, Cube algebra: A generic user-centric model and query language for OLAP cubes, International Journal of Data Warehousing and Mining 9(2) (2013), 39–65. doi:10.4018/jdwm.2013040103.

8. R.P. Deb Nath, Hose and T.B. Pedersen, Towards a programmable semantic extract-transform-load framework for semantic data warehouses, in: DOLAP’15, Proceedings of the ACM Eighteenth International Workshop on Data Warehousing and OLAP, I.-Y. Song, Garcia-Alvarado and Ordonez, eds, ACM, 2015, pp. 15–24. doi:10.1145/2811222.2811229.

9. É. Edoh-Alove, Bimonte and Pinet, An UML profile and SOLAP datacubes multidimensional schemas transformation process for datacubes risk-aware design, International Journal of Data Warehousing and Mining 11(4) (2015), 64–83. doi:10.4018/ijdwm.2015100104.

10. M.J. Egenhofer and Herring, A mathematical framework for the definition of topological relationships, in: Proceedings of the 4th International Symposium on Spatial Data Handling, Zurich, Switzerland, July 23–27, 1990, K.E. Brassel and Kishimoto, eds, International Geographical Union IGU, Commission on Geographic Information Systems, Dept. of Geography, the Ohio State University, 1990, pp. 803–813.

11. Etcheverry, A.A. Vaisman and Zimányi, Modeling and querying data warehouses on the Semantic Web using QB4OLAP, in: Data Warehousing and Knowledge Discovery – 16th International Conference, DaWaK 2014, Proceedings, Munich, Germany, September 2–4, 2014, Bellatreche and M.K. Mohania, eds, Lecture Notes in Computer Science, Vol. 8646, Springer, 2014, pp. 45–56. doi:10.1007/978-3-319-10160-6_5.

12. Gidófalvi, T.B. Pedersen, Risch and Zeitler, Highly scalable trip grouping for large-scale collective transportation systems, in: EDBT 2008, 11th International Conference on Extending Database Technology, Proceedings, Nantes, France, March 25–29, 2008, Kemper, Valduriez, Mouaddib, Teubner, Bouzeghoub, Markl, Amsaleg and Manolescu, eds, ACM International Conference Proceeding Series, Vol. 261, ACM, 2008, pp. 678–689. doi:10.1145/1353343.1353425.

13. Goasdoué, Karanasos, Leblay and Manolescu, View selection in Semantic Web databases, Proceedings of the VLDB Endowment 5(2) (2011), 97–108, http://www.vldb.org/pvldb/vol5/p097_francoisgoasdoue_vldb2012.pdf. doi:10.14778/2078324.2078326.

14. L.I. Gómez, S.A. Gómez and A.A. Vaisman, A generic data model and query language for spatiotemporal OLAP cube analysis, in: 15th International Conference on Extending Database Technology, EDBT’12, Proceedings, Berlin, Germany, March 27–30, 2012, E.A. Rundensteiner, Markl, Manolescu, Amer-Yahia, Naumann and Ismail, eds, ACM, 2012, pp. 300–311. doi:10.1145/2247596.2247632.

15. L.I. Gómez, Haesevoets, Kuijpers and A.A. Vaisman, Spatial aggregation: Data model and implementation, Information Systems 34(6) (2009), 551–576. doi:10.1016/j.is.2009.03.002.

16. Gür, Hose, T.B. Pedersen and Zimányi, Modeling and querying spatial data warehouses on the Semantic Web, in: Semantic Technology – 5th Joint International Conference, JIST 2015, Revised Selected Papers, Yichang, China, November 11–13, 2015, Qi, Kozaki, J.Z. Pan and Yu, eds, Lecture Notes in Computer Science, Vol. 9544, Springer, 2015, pp. 3–22. doi:10.1007/978-3-319-31676-5_1.

17. Gür, Hose, T.B. Pedersen and Zimányi, Enabling spatial OLAP over environmental and farming data with QB4SOLAP, in: Semantic Technology – 6th Joint International Conference, JIST 2016, Revised Selected Papers, Singapore, Singapore, November 2–4, 2016, Y.F. Li, Hu, J.S. Dong, Antoniou, Wang, Sun and Liu, eds, Lecture Notes in Computer Science, Vol. 10055, Springer, 2016, pp. 287–304. doi:10.1007/978-3-319-50112-3_22.

18. Gür, Nielsen, Hose and T.B. Pedersen, Geospatial OLAP on the Semantic Web made easy, in: Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, April 3–7, 2017, Barrett, Cummings, Agichtein and Gabrilovich, eds, ACM, 2017, pp. 213–217. doi:10.1145/3041021.3054731.

19. Han, Stefanovic and Koperski, Selective materialization: An efficient method for spatial data cube construction, in: Research and Development in Knowledge Discovery and Data Mining, Second Pacific-Asia Conference, PAKDD-98, Proceedings, Melbourne, Australia, April 15–17, 1998, Wu, Ramamohanarao and K.B. Korb, eds, Lecture Notes in Computer Science, Vol. 1394, Springer, 1998, pp. 144–158. doi:10.1007/3-540-64383-4_13.

20. Ibragimov, Hose, T.B. Pedersen and Zimányi, Processing aggregate queries in a federation of SPARQL endpoints, in: The Semantic Web. Latest Advances and New Domains – 12th European Semantic Web Conference, ESWC 2015, Proceedings, Portoroz, Slovenia, May 31–June 4, 2015, Gandon, Sabou, Sack, d’Amato, Cudré-Mauroux and Zimmermann, eds, Lecture Notes in Computer Science, Vol. 9088, Springer, 2015, pp. 269–285. doi:10.1007/978-3-319-18818-8_17.

21. Ibragimov, Hose, T.B. Pedersen and Zimányi, Optimizing aggregate SPARQL queries using materialized RDF views, in: The Semantic Web – ISWC 2016 – 15th International Semantic Web Conference, Proceedings, Part I, Kobe, Japan, October 17–21, 2016, P.T. Groth, Simperl, A.J.G. Gray, Sabou, Krötzsch, Lécué, Flöck and Gil, eds, Lecture Notes in Computer Science, Vol. 9981, 2016, pp. 341–359. doi:10.1007/978-3-319-46523-4_21.

22. K.A. Jakobsen, A.B. Andersen, Hose and T.B. Pedersen, Optimizing RDF data cubes for efficient processing of analytical queries, in: Proceedings of the 6th International Workshop on Consuming Linked Data Co-Located with 14th International Semantic Web Conference (ISWC 2015), Bethlehem, Pennsylvania, US, October 12th, 2015, Hartig, Sequeda and Hogan, eds, CEUR Workshop Proceedings, Vol. 1426, CEUR-WS.org, 2015, http://ceur-ws.org/Vol-1426/paper-02.pdf.

23. Kämpgen, O’Riain and Harth, Interacting with statistical linked data via OLAP operations, in: The Semantic Web: ESWC 2012 Satellite Events – ESWC 2012 Satellite Events, Revised Selected Papers, Heraklion, Crete, Greece, May 27–31, 2012, Simperl, Norton, Mladenic, E.D. Valle, Fundulaki, Passant and Troncy, eds, Lecture Notes in Computer Science, Vol. 7540, Springer, 2012, pp. 27–31. doi:10.1007/978-3-662-46641-4_7.

24. Koubarakis, Karpathiotakis, Kyzirakos, Nikolaou and Sioutis, Data models and query languages for linked geospatial data, in: Reasoning Web. Semantic Technologies for Advanced Query Answering – 8th International Summer School 2012, Proceedings, Vienna, Austria, September 3–8, 2012, Eiter and Krennwallner, eds, Lecture Notes in Computer Science, Vol. 7487, Springer, 2012, pp. 290–328. doi:10.1007/978-3-642-33158-9_8.

25. Kyzirakos, Karpathiotakis and Koubarakis, Strabon: A semantic geospatial DBMS, in: The Semantic Web – ISWC 2012 – 11th International Semantic Web Conference, Proceedings, Part I, Boston, MA, USA, November 11–15, 2012, Cudré-Mauroux, Heflin, Sirin, Tudorache, Euzenat, Hauswirth, J.X. Parreira, Hendler, Schreiber, Bernstein and Blomqvist, eds, Lecture Notes in Computer Science, Vol. 7649, Springer, 2012, pp. 11–15. doi:10.1007/978-3-642-35176-1_19.

26. Lehmann, Athanasiou, Both, García-Rojas, Giannopoulos, Hladky, J.J. Le Grange, A.C. Ngonga Ngomo, M.A. Sherif, Stadler, Wauer, Westphal and Zaslawski, Managing geospatial linked data in the GeoKnow project, in: The Semantic Web in Earth and Space Science. Current Status and Future Directions, Narock and Fox, eds, Studies on the Semantic Web, Vol. 20, IOS Press, 2015, pp. 51–78. doi:10.3233/978-1-61499-501-2-51.

27. Malinowski and Zimányi, Advanced Data Warehouse Design – from Conventional to Spatial and Temporal Applications. Data-Centric Systems and Applications, Springer, 2008. doi:10.1007/978-3-540-74405-4.

28. T.B. Pedersen, C.S. Jensen and C.E. Dyreson, A foundation for capturing and querying complex multidimensional data, Information Systems 26(5) (2001), 383–423. doi:10.1016/S0306-4379(01)00023-0.

29. T.B. Pedersen and Tryfona, Pre-aggregation in spatial data warehouses, in: Advances in Spatial and Temporal Databases, 7th International Symposium, SSTD 2001, Proceedings, Redondo Beach, CA, USA, July 12–15, 2001, C.S. Jensen, Schneider, Seeger and V.J. Tsotras, eds, Lecture Notes in Computer Science, Vol. 2121, Springer, 2001, pp. 460–480. doi:10.1007/3-540-47724-1_24.

30. Perry and Herring, eds, GeoSPARQL – A Geographic Query Language for RDF Data. Open Geospatial Consortium, 10 September 2012, http://www.opengeospatial.org/standards/geosparql.

31. D.A. Randell, Cui and A.G. Cohn, A spatial logic based on regions and connection, in: Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning (KR’92), Cambridge, MA, October 25–29, 1992, Nebel, Rich and W.R. Swartout, eds, Morgan Kaufmann, 1992, pp. 165–176.

32. P.Z. Revesz, Introduction to Databases – from Biological to Spatio-Temporal, Texts in Computer Science, Springer, 2010. ISBN 978-1-84996-094-6. doi:10.1007/978-1-84996-095-3.

33. Rivest, Bédard and Marchand, Toward better support for spatial decision making: Defining the characteristics of spatial on-line analytical processing (SOLAP), GEOMATICA 55(4) (2001), 539–555.

34. Rouces, de Melo and Hose, FrameBase: Representing n-ary relations using semantic frames, in: The Semantic Web. Latest Advances and New Domains – 12th European Semantic Web Conference, ESWC 2015, Proceedings, Portoroz, Slovenia, May 31–June 4, 2015, Gandon, Sabou, Sack, d’Amato, Cudré-Mauroux and Zimmermann, eds, Lecture Notes in Computer Science, Vol. 9088, Springer, 2015, pp. 505–521. doi:10.1007/978-3-319-18818-8_31.

35. Stadler, Lehmann, Höffner and Auer, LinkedGeoData: A core for a web of spatial open data, Semantic Web 3(4) (2012), 333–354. doi:10.3233/SW-2011-0052.

36. A.A. Vaisman and Zimányi, A multidimensional model representing continuous fields in spatial data warehouses, in: 17th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2009, Proceedings, Seattle, Washington, USA, November 4–6, 2009, Agrawal, W.G. Aref, C.-T. Lu, M.F. Mokbel, Scheuermann, Shahabi and Wolfson, eds, ACM, 2009, pp. 168–177. doi:10.1145/1653771.1653797.

37. Varga, A.A. Vaisman, Romero, Etcheverry, T.B. Pedersen and Thomsen, Dimensional enrichment of statistical linked open data, Journal of Web Semantics 40 (2016), 22–51. doi:10.1016/j.websem.2016.07.003.

38. I.F. Vega López, R.T. Snodgrass and Moon, Spatiotemporal aggregate computation: A survey, IEEE Transactions on Knowledge and Data Engineering 17(2) (2005), 271–286. doi:10.1109/TKDE.2005.34.

A foundation for spatial data warehouses on the Semantic Web

