Sage Journals: Discover world-class research

Abstract

Cooperative Intelligent Transport Systems (C-ITS) play an important role in providing the means to collect and exchange spatio-temporal data via V2X-based communication between vehicles and the infrastructure, which will become a central enabler for road safety of (semi)-autonomous vehicles. The Local Dynamic Map (LDM) is a key concept for integrating static and streamed data in a spatial context. The LDM has been semantically enhanced to allow for an elaborate domain model that is captured by a mobility ontology, and for queries over data streams that cater for semantic concepts and spatial relationships. Our approach for semantic enhancement is in the context of ontology-mediated query answering (OQA) and features conjunctive queries over $\textit{DL-Lite}_{\mathcal{A}}$ ontologies that support window operators over streams and spatial relations between spatial objects. In this paper, we show how this approach can be extended to address a wider range of use cases in three C-ITS scenarios: traffic statistics, traffic event detection, and advanced driving assistance systems. For these use cases, we define requirements derived from necessary domain-specific features and, based on them, report on extensions of our query language and ontology model. The extensions include temporal relations, numeric predictions, and trajectory predictions, as well as optimization strategies such as caching. An experimental evaluation of queries that reflect the requirements has been conducted using the real-world traffic simulation tool PTV Vissim. It provides evidence for the feasibility and efficiency of our approach in the new scenarios.

Keywords

Mobility; C-ITS; query answering; ontology-based data access; stream reasoning; temporal relations

1. Introduction

The development of (semi)-autonomous vehicles leads to extensive communication between vehicles and the infrastructure, which is covered by Cooperative Intelligent Transport Systems (C-ITS). These systems produce temporal data (e.g., traffic light signal phases) and geospatial data (e.g., GPS positions), which are exchanged in vehicle-to-vehicle, vehicle-to-infrastructure, and combined communications (V2X). This helps to improve road safety by analyzing traffic scenes that could lead to accidents (e.g., red-light violations), and to reduce emissions by optimizing traffic flow (e.g., dissolving traffic jams). A key technology for this is the Local Dynamic Map (LDM) [8], an integration platform for static, semi-static, and dynamic information in a spatial context.

In previous work, we have semantically enhanced the LDM to allow for an elaborate domain model that is captured by a mobility ontology, and for queries over data streams that cater for semantic concepts and spatial relationships [36]. Our approach is based on ontology-mediated query answering (OQA) and features conjunctive queries (CQs) over $\textit{DL-Lite}_{\mathcal{A}}$ [26] ontologies that support window operators over streams and spatial relations between objects. We believe that OQA and the related ontology-based data access (OBDA) [51] are well suited for C-ITS applications, as an ontology can be used to model vehicles, traffic, and infrastructure details, and can be mapped to scalable stream database technology that adds dynamicity to the model. For example, the definition of a hazardous situation is complex, ranging from bad road conditions to traffic jams [8]. Therefore, an expressive query language is crucial to cover C-ITS specific requirements for retrieving dynamic data and expressing complex patterns, for example, for event detection. Furthermore, scalability and swift response times are crucial, since fast-changing traffic demands response times below 1 s to avoid accidents [8].

In this paper, we continue the work in [34,36,37] with the goal of showing how spatial-stream OQA can be used to address a wider set of C-ITS scenarios. To achieve this, the approach in [36] is extended with new domain-specific features beyond “generic” spatial-stream OQA. In cooperation with ITS domain experts from Siemens, the C-ITS scenarios – traffic statistics, event detection, and advanced driving assistance systems (ADAS) – were defined and used to single out requirements derived from a domain-specific list of features. We then formulate, for each use case, requirements that should be covered by our approach. The focus of the new, more specific features is on temporal relations, e.g., $during$, as well as numerical and trajectory predictions on a query window. For the qualitative assessment of the use cases, requirements, and domain-specific features, we conducted several interviews with ITS experts, which supported our assumptions that temporal relations and predictions are important extensions, but also gave rise to important directions for future work, such as capturing uncertainty in the ontology and the query language. For the quantitative assessment, we provide a detailed report on the improved implementation. The advances include new temporal relations and predictions, but also improved performance based on the caching of static query elements and the parallel execution of stream atoms. The implementation is evaluated in an experimental setting using queries that match the features, where a real-world traffic simulation is used to generate the data. The results provide evidence for the potential feasibility and efficiency of our approach in these scenarios. Our contributions are briefly summarized as follows:

we outline the field of V2X integration using LDMs and provide details on our ontology-based LDM (Section 2);

we define three scenarios, use cases, desired features, and requirements (Section 3);

we report on expert interviews that we conducted to obtain feedback on the scenarios, use cases, and query features (Section 4);

we present our current approach, including data model, query language, and evaluation strategy (Section 5);

we report on the implementation of our approach in a prototype and comment on the implementation details (Section 6);

we evaluate our work regarding the set of features and requirements based on a traffic simulation and assess the results (Section 7);

we list related work and evaluate existing stream reasoning systems, comparing the performance of our prototype to the systems C-SPARQL [14] and CQELS [55] (Section 8).

In Section 9, we discuss lessons learned and conclude with ongoing and future work.

This article is a revised and extended version of our preliminary work [34], which we have enriched by the following extensions:

a report on interviews of C-ITS experts regarding the spatial-QA approach for real-world ITS applications, whose suggestions have been incorporated to improve the approach (Section 4);

a further version of the LDM ontology (Section 2) that builds on the Sensor-Observation-Sampling-Actuator (SOSA) vocabulary [48], following an expert suggestion;

a complexity assessment of the query answering approach (Section 5);

an outline of optimization strategies for the query rewriting and evaluation covering related parameters for the strategies (Section 5);

an increased set of experiments taking the optimization strategies into account (Section 6);

a quantitative and qualitative comparison with existing stream reasoning systems (Section 8); and

a discussion of the lessons learned (Section 9).

Furthermore, we have enhanced Sections 5 and 6 to make the paper clearer and more self-contained. For this, we have added definitions and details on the semantics and on the initial algorithm for the query answering approach from our initial work [36].

2. C-ITS data integration and query answering

Our setting is the ongoing effort in data integration and querying in the C-ITS domain. The base technologies for C-ITS are already available and experimentally deployed in infrastructure projects such as [8]. The communication technology is based on the IEEE 802.11p standard, and the main data integration effort is the Local Dynamic Map (LDM); both are starting points for this work. IEEE 802.11p allows wireless access in vehicular environments, called V2X communication, which enables messaging between vehicles and the infrastructure. The messages are broadcast every 100 ms by traffic participants, i.e., vehicles and roadside ITS stations, to update other participants about their current states [8]. The main standardized message types are [3–5]:

CAMs (Cooperative Awareness Messages) provide high frequency status updates of a vehicle’s position, speed, and might include vehicle type, model, and turn signals;

MAPs (Map Data Messages) describe the detailed topology of an intersection, including its lanes, their connections, and assigned traffic light (TL) signal groups;

SPaTs (Signal Phase and Timing Messages) contain the projected TL signal phases (e.g., green, yellow, and red) for each lane;

DENMs (Decentralized Environmental Notification Messages) inform whether specific events like road works or a traffic jam occur in a designated area;

CPMs (Collective Perception Messages) aim to complement CAMs with information about the surroundings that is collected by the LIDAR and RADAR sensors of a vehicle or by an ITS station and partially shared with other traffic participants. For instance, a temporary obstacle such as a parked car can be detected by a vehicle and forwarded to the other participants;

PAMs (Precise Awareness Messages) are lightweight correction messages, which are used to correct a vehicle’s position relative to a fixed “virtual anchor”, since GPS positions can be imprecise in urban areas.

2.1. Local dynamic map

The V2X technology does not yet consider the integration of the different types of messages. As a comprehensive integration effort, the EU SAFESPOT project [8] introduced the concept of an LDM, which acts as an integration platform to combine static geographic information system (GIS) maps with dynamic environmental objects (e.g., vehicles or pedestrians) [1,2]. The integration is motivated by advanced safety applications, which need an “overall” understanding of a traffic environment. The LDM consists of the following four layers (see Fig. 1):

Permanent static: the first layer contains static information obtained from GIS maps and includes roads, intersections, and points-of-interest (POIs);

Transient static: the second layer extends the static map by detailed local traffic information such as fixed ITS stations, landmarks, and intersection features like lanes;

Transient dynamic: the third layer contains temporary regional information like weather, road or traffic conditions (e.g., traffic jams), and traffic light signal phases;

Highly dynamic: the fourth layer contains dynamic information on road users taken from V2X messages or in-vehicle sensors like the GPS module.

Fig. 1.

The four layers of an LDM [8].

Fig. 2.

(a) The lightweight and (b) SOSA-based LDM ontology (partial renderings).

Recent research by Netten et al. [63] and Shimada et al. [76] suggested that an LDM can be built on top of a spatial relational RDBMS enhanced with streaming capabilities. Netten et al. recognize that an LDM should be represented by a world model, world objects, and data sinks on the streamed input [63]. However, an elaborate domain model captured by an LDM ontology, as well as extended query processing or rule evaluation methods over spatial data streams, were still missing in these approaches. An ontology-based LDM has advantages regarding the maintainability and understandability of the model, since dependencies between the concepts are clearly defined and easily extendable without altering the underlying database (DB).
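As an illustration of the layered organization described above, the following minimal Python sketch models the four LDM layers as an ordered enumeration and routes incoming V2X messages to the layer whose content they mainly update. The class `LocalDynamicMap`, the dictionary `MESSAGE_LAYER`, and in particular the assignment of message types to layers are our own illustrative assumptions based on the layer descriptions above; they are not part of the LDM specification [8] or of the systems in [63,76], and a real LDM would additionally index its entries spatially.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class LDMLayer(IntEnum):
    """The four LDM layers, ordered from most static to most dynamic."""
    PERMANENT_STATIC = 1   # roads, intersections, POIs from GIS maps
    TRANSIENT_STATIC = 2   # fixed ITS stations, landmarks, lanes
    TRANSIENT_DYNAMIC = 3  # weather, traffic conditions, signal phases
    HIGHLY_DYNAMIC = 4     # road users reported via V2X or in-vehicle sensors


# Assumed, illustrative mapping of V2X message types to the LDM layer
# whose content they mainly update.
MESSAGE_LAYER = {
    "MAP": LDMLayer.TRANSIENT_STATIC,    # intersection topology and lanes
    "SPaT": LDMLayer.TRANSIENT_DYNAMIC,  # traffic light signal phases
    "DENM": LDMLayer.TRANSIENT_DYNAMIC,  # events such as traffic jams
    "CAM": LDMLayer.HIGHLY_DYNAMIC,      # vehicle position and speed
}


@dataclass
class LocalDynamicMap:
    """Toy LDM that keeps one list of (message type, payload) entries per layer."""
    layers: dict = field(
        default_factory=lambda: {layer: [] for layer in LDMLayer})

    def ingest(self, msg_type: str, payload: dict) -> LDMLayer:
        """Store a message on the layer assigned to its type; return that layer."""
        layer = MESSAGE_LAYER[msg_type]
        self.layers[layer].append((msg_type, payload))
        return layer

    def view_from(self, min_layer: LDMLayer) -> list:
        """Return all entries on min_layer and on every more dynamic layer."""
        return [entry
                for layer in LDMLayer if layer >= min_layer
                for entry in self.layers[layer]]
```

Ordering the layers as an `IntEnum` turns a request such as "everything that is at least transient dynamic" into a plain comparison, e.g., `ldm.view_from(LDMLayer.TRANSIENT_DYNAMIC)`.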

2.2. Ontology-based LDM

With the support of Siemens and AIST domain experts, we defined two ontologies capturing the main elements of an LDM. In our second modelling, we took the recommendations of the interviewed experts (see Section 4) into account and extended the initial ontology with the “standard” vocabulary of the Sensor-Observation-Sampling-Actuator (SOSA) ontology [48]. This includes concepts such as $Observation$, $Sensor$, and $Platform$, which are combined with LDM-specific vocabulary such as the four layers of the LDM. The resulting ontologies are partially shown in Fig. 2(a) and Fig. 2(b), where asterisks indicate elements that are discussed in detail below.²

² Available at http://www.kr.tuwien.ac.at/research/projects/loctrafflog/LocalDynamicMapITS-v0.5-Lite.owl and http://www.kr.tuwien.ac.at/research/projects/loctrafflog/LocalDynamicMapSOSA-v0.1-Lite.owl.

Lightweight LDM ontology Our initial ontology is intended to make querying ITS streams simple while still capturing the main concepts of an LDM. It follows a layered approach, starting with a separation between the following top concepts:

$V2XFeature$ is the representation of V2X objects, such as the details of an intersection topology, including lanes denoted as $V2XLane$ and traffic lights denoted as $V2XSignalGroup$;

$GeoFeature$ represents the GIS aspects of the LDM including POIs, areas like parks, and road networks with $Geometry$ as the geometrical representation of them;

$LDMLayer$ is the representation of the four layers of an LDM, where each feature can be assigned to one layer by the role $isOnLDMLayer$ ;

$Actor$ is the concept that includes persons, vehicles, as well as roadside ITS stations, which are autonomous agents and the main generators of streamed data;

$Event$ captures prototypical events that happen in the ITS domain. An important subclass of $Event$ is $Hazard$ that captures the different types of dangers, e.g., accidents that might occur;

$CategoricalValues$ specify the different categories used in the domain, such as signal phases or vehicle roles.
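Ontology atoms such as $Event(e)$ must be unfolded with respect to the concept hierarchy before they are evaluated over the data. The following Python sketch illustrates this for atomic concept inclusions only, a small fragment of $\textit{DL-Lite}_{\mathcal{A}}$ (no existential restrictions, role inclusions, or inverse roles). The concept names in `INCLUSIONS` (e.g., `Accident`, `RoadsideITSStation`) are hypothetical stand-ins for the actual ontology vocabulary, chosen to mirror the descriptions of $Actor$ and $Event$ above.

```python
from collections import defaultdict

# Hypothetical atomic concept inclusions (Sub, Super) mirroring the
# lightweight LDM hierarchy; the names are illustrative only.
INCLUSIONS = [
    ("Vehicle", "Actor"),
    ("Person", "Actor"),
    ("RoadsideITSStation", "Actor"),
    ("Hazard", "Event"),
    ("Accident", "Hazard"),
]


def subconcepts(inclusions, concept):
    """Return all concepts C with C ⊑* concept (reflexive-transitive closure)."""
    children = defaultdict(set)
    for sub, sup in inclusions:
        children[sup].add(sub)
    seen, stack = {concept}, [concept]
    while stack:
        for sub in children[stack.pop()]:
            if sub not in seen:
                seen.add(sub)
                stack.append(sub)
    return seen


def unfold_atom(inclusions, concept, var):
    """Rewrite the atom concept(var) into a union of atoms over its subconcepts."""
    return sorted((c, var) for c in subconcepts(inclusions, concept))
```

For instance, `unfold_atom(INCLUSIONS, "Event", "e")` yields the union $Accident(e) \lor Event(e) \lor Hazard(e)$, so that instances asserted only as accidents are still returned by a query for events.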

We have also defined “domain specific” roles and attributes such as $isPartOf$, $connected$, and $intersects$, which deal with the different aspects of intersection and road topologies. In particular, the roles $spatialRelation$ and $temporalRelation$ are of interest, since we have introduced several sub-roles that define possible spatial and temporal relations, respectively, between features. Examples are $intersects$ and $during$, respectively, which are needed for querying vehicles that pass a specific intersection during a specific period.

Extended SOSA ontology In the following, we give an outline of the main concepts of the extension of our initial ontology, aligning it with the SOSA ontology:

$V2XFeature$ is the representation of V2X objects and defined as before;

$LDMLayer$ is the representation of the four layers of an LDM and defined as before;

$Observation$ is taken from the SOSA ontology and is stated as “the act of carrying out a procedure to estimate a value of a $Feature$ involving a $Sensor$ and yielding a result” [48], where we call this result a $Value$;

$Value$ is the result of an $Observation$ and can either be a $Literal$ , a $Geometry$ , or a Categorical Value from different static categories such as signal phases;

$Sensor$ defines (mostly) physical devices that are the implementation of the observation procedure by acting on changes in the environment, and is the main generator of streams of observations and the assigned values;

$Platform$ is the host or mounting of one or several sensors. A platform can be represented by mobile phones, vehicles, or roadside ITS stations.

$Event$ is a sub-concept of $Feature$ (as suggested in [48]) and captures prototypical events that occur in the ITS domain.

We have incorporated the “standard” roles and attributes of the SOSA ontology. This includes the central relations between $Platform$ , $Sensor$ , $Observation$ , and $Value$ using the roles $hosts$ , $madeObservation$ , and $hasResult$ for connecting them to each other.

The previously defined “domain specific” roles and attributes such as $isPartOf$, $connected$, $intersects$, as well as $spatialRelation$ and $temporalRelation$, are defined as before. The roles $speed$, $pos$, $acceleration$, and $hasState$ considerably simplify query formulation. These roles are sub-roles of the lightweight ontology role $ldm{:}observes$ (shown by the dotted arrow in Fig. 2(b)) and represent role chains that abbreviate the more complex relations between a platform, e.g., a vehicle, its sensors, and the measured values of observations, e.g., the measured speed. Note that the lightweight ontology role $ldm{:}observes$ and the SOSA role $sosa{:}observes$ have different meanings, as the latter describes the relation between a sensor and observable properties. The top role $ldm{:}observes$ can be defined by the role chains $hosts \circ madeObservation \sqsubseteq hostsObs$ and $hostsObs \circ hasResult \sqsubseteq observes$, where sub-roles specify the particular range that is observed; hence we distinguish between $hasResult$ for composed values and $hasSimpleResult$ for atomic values represented by literals. However, adding role chains to $\textit{DL-Lite}_{\mathcal{A}}$ [26] would go beyond the expressiveness of the language. We thus leave role chaining out, but may allow query templates that replace role chains.
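To illustrate what the two role chains above would entail if they were materialized, the following Python sketch applies rules of the form $a(x, z) \leftarrow b(x, y) \land c(y, z)$ by naive forward chaining until a fixpoint is reached. It is only a sketch of the intended semantics, not part of the query answering approach (role chains are not in $\textit{DL-Lite}_{\mathcal{A}}$, as stated above); the triple encoding and the individuals `car1`, `gps1`, `obs1`, and `pos1` are hypothetical.

```python
def chain_closure(facts, rules):
    """Naive forward chaining for rules head(x, z) <- b(x, y), c(y, z).

    facts: iterable of (role, subject, object) triples.
    rules: list of (head, b, c) role names, i.e. b o c ⊑ head.
    Repeats until no new fact is derived, so chains that build on
    previously derived roles (and recursive rules) are handled.
    """
    facts = set(facts)
    while True:
        derived = set()
        for head, b, c in rules:
            for role1, x, y in facts:
                if role1 != b:
                    continue
                for role2, y2, z in facts:
                    if role2 == c and y2 == y:
                        derived.add((head, x, z))
        if derived <= facts:
            return facts
        facts |= derived


# The two role chains of the SOSA-based LDM ontology given above.
RULES = [("hostsObs", "hosts", "madeObservation"),
         ("observes", "hostsObs", "hasResult")]
```

Starting from `hosts(car1, gps1)`, `madeObservation(gps1, obs1)`, and `hasResult(obs1, pos1)`, the first iteration derives `hostsObs(car1, obs1)` and only the second derives `observes(car1, pos1)`, which is why the loop runs to a fixpoint rather than applying each rule once.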

Example 2.1.

Following Fig. 2(b), we illustrate a possible query template for $observes$, which is defined as follows: $observes(x, y) ::= hosts(x, u) \land madeObservation(u, v) \land hasResult(v, y)$, where $::=$ states that the atom $observes$ is replaced in any query by the template atoms on the right-hand side of the definition.
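A template of this kind can be expanded syntactically before the query is rewritten. The following Python sketch shows one way to do this; the atom representation `(predicate, args)`, the `TEMPLATES` table, and the `_u0`-style names for fresh variables are illustrative assumptions (the sketch assumes query variables never start with an underscore). The essential point is that the template-local variables u and v are renamed apart on every use, so that two $observes$ atoms in one query do not accidentally share an observation.

```python
from itertools import count

# Hypothetical template table: name -> (head variables, body atoms).
# Body variables that do not occur in the head (here u and v) are
# template-local and are renamed apart on every use.
TEMPLATES = {
    "observes": (("x", "y"),
                 [("hosts", ("x", "u")),
                  ("madeObservation", ("u", "v")),
                  ("hasResult", ("v", "y"))]),
}


def expand(query, templates=TEMPLATES):
    """Replace each templated atom (pred, args) by its body atoms."""
    fresh = count()
    expanded = []
    for pred, args in query:
        if pred not in templates:
            expanded.append((pred, args))
            continue
        head, body = templates[pred]
        subst = dict(zip(head, args))  # head variables -> query terms
        for _, body_args in body:
            for var in body_args:
                if var not in subst:
                    subst[var] = f"_{var}{next(fresh)}"
        expanded.extend((p, tuple(subst[a] for a in atom_args))
                        for p, atom_args in body)
    return expanded
```

For example, `expand([("Vehicle", ("c",)), ("observes", ("c", "s"))])` keeps `Vehicle(c)` and replaces the second atom by `hosts(c, _u0)`, `madeObservation(_u0, _v1)`, and `hasResult(_v1, s)`.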

OWL 2 QL Both of the above LDM ontologies are represented in $\textit{DL-Lite}_{\mathcal{A}}$ [26], which is the logical underpinning of the W3C standard OWL 2 QL. Apart from the restriction to $\textit{DL-Lite}_{\mathcal{A}}$, our methods are ontology-agnostic; hence not only the mentioned ontologies but also other mobility ontologies could be used. In the rest of the paper, we use the lightweight ontology for the examples, the experiments, and the expert evaluation. The queries and experiments are transferable to the SOSA-based ontology, since the restriction to $\textit{DL-Lite}_{\mathcal{A}}$ is retained. The expert evaluation was based on the lightweight ontology, as it was this evaluation that induced the development of the SOSA-based ontology.

2.3. Spatial-stream query answering

The OQA component is central to the use of a semantically enhanced LDM, since it allows us to access the streamed data in the LDM.

Example 2.2.
The following query detects red-light violations at intersections. It searches for vehicles (in y) with an aggregated trajectory and a speed above 30 km/h in an 8 s window that projects 3 s into the future (represented as a negative distance), which move on lanes (in x) during the time intervals in which the signal phases of these lanes turn to “Stop”, i.e., red, in a 10 s window with a 5 s look into the future to capture the current/next changes in the signal phases:

$\begin{aligned} q_{1}(x, y):\ & LaneIn(x) \land hasLoc(x, u) \land intersects(u, p) \land Vehicle(y) \\ & \land pos(y, p@i_{p})[traject\_line, 5\,s, -3\,s] \land (v > 30) \\ & \land speed(y, v)[mov\_avg, 5\,s, -3\,s] \land during(p@i_{p}, s@i_{s}) \\ & \land isManagedBy(x, z) \land SignalGroup(z) \land (s = \text{'Stop'}) \\ & \land hasState(z, s@i_{s})[last, 5\,s, -5\,s] \end{aligned}$

Query $q_{1}$ exhibits the different dimensions that need to be combined:

$Vehicle (y)$ , $isManagedBy (x, z)$ , $SignalGroup (z)$ and $LaneIn (x)$ are ontology atoms, where $isManagedBy$ assigns traffic lights z to incoming lanes x. These atoms have to be unfolded with respect to the concept/role hierarchies of the LDM ontology;

$intersects(u, p)$ and $hasLoc(x, u)$ are spatial atoms, where the first checks spatial intersection and the second returns the object geometries;

$speed(y, v)[mov\_avg, 5\,s, -3\,s]$ and $pos(y, p@i_{p})[traject\_line, 5\,s, -3\,s]$ define window operators that aggregate and predict the moving average of the speed and the trajectory (represented by a path) of the positions of vehicle y over the streams $speed$ and $pos$, respectively, and $hasState(z, s@i_{s})[last, 5\,s, -5\,s]$, together with the filter $(s = \text{'Stop'})$, returns the traffic lights whose last phase is “Stop”;

the relation $during(p@i_{p}, s@i_{s})$ checks whether “p happens during s”, where p ranges over the trajectories with their time intervals $i_{p}$ and s over the traffic light phases that are on “Stop” in the time intervals $i_{s}$; $i_{p}$ and $i_{s}$ are derived from the trajectory aggregations and the phase durations of the traffic lights, respectively.
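The relation $during$ is one of the thirteen basic relations of Allen's interval algebra [6]. The minimal Python sketch below classifies the relation between two single intervals; it assumes intervals are `(start, end)` pairs with `start < end`, uses Allen's strict reading of during (both endpoints strictly inside), and names the inverse relations with our own labels (e.g., `"overlapped-by"`). How the approach lifts the relation to the sets of intervals $i_{p}$ and $i_{s}$ is not shown here.

```python
def allen(a, b):
    """Return Allen's basic relation between proper intervals a and b.

    Intervals are (start, end) pairs with start < end.
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "met-by"
    if (a1, a2) == (b1, b2):
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining case: proper overlap with distinct start and end points.
    return "overlaps" if a1 < b1 else "overlapped-by"


def during(a, b):
    """True if interval a lies strictly inside interval b."""
    return allen(a, b) == "during"
```

For example, a trajectory aggregated over $[2, 4]$ happens during a “Stop” phase over $[1, 6]$, whereas one over $[1, 4]$ only starts with it.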

Note that, for readability, we added the implicit definitions $p@i_{p}$ and $s@i_{s}$ to the original query syntax, where we would use solely p and s. The implicit variables $i_{p}$ and $i_{s}$ represent the time intervals annotated to the aggregated values of the variables p and s. For instance, an average speed of 20 of a vehicle car in the interval $[1, 6]$ is represented as $⟨car, speed, 20⟩@[1, 6]$. A user formulating a query can ignore this notation and use p and s.
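The interval annotation can be illustrated with a small Python sketch of a time-based window that averages each object's values and annotates the result with the span of the timestamps it aggregated. The closed window $[now - width, now]$, the triple layout `(timestamp, object, value)`, and the choice of annotating with the span of the actually aggregated timestamps are assumptions for illustration; predictions into the future (negative distances) are not covered.

```python
from collections import defaultdict
from statistics import mean


def window_avg(stream, now, width):
    """Average each object's values in the closed window [now - width, now].

    stream: iterable of (timestamp, object, value) triples.
    Returns sorted (object, average, (start, end)) triples, where (start, end)
    spans the timestamps actually aggregated, i.e. the annotation @[start, end].
    """
    groups = defaultdict(list)
    for ts, obj, value in stream:
        if now - width <= ts <= now:
            groups[obj].append((ts, value))
    result = []
    for obj, items in groups.items():
        stamps = [ts for ts, _ in items]
        avg = mean(value for _, value in items)
        result.append((obj, avg, (min(stamps), max(stamps))))
    return sorted(result)
```

With the samples `(1, "car", 18)`, `(3, "car", 20)`, and `(6, "car", 22)`, a 5 s window ending at 6 yields `("car", 20, (1, 6))`, i.e., the item $⟨car, speed, 20⟩@[1, 6]$ from the example above.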
3. Development of C-ITS scenarios

In this section, we present three application scenarios that are used to define requirements and features split into three complexity levels. On the infrastructure side, we have C-ITS (roadside) stations that receive nearby V2X messages and send messages to inform other participants about their current state, i.e., the traffic light phases. Other participants such as vehicles share their states, such as their current speed, acceleration, and position. On the vehicle side, ADAS perceive driving environments and make safe driving decisions to improve the safety of autonomous vehicles. ADAS use sensors such as LIDAR/RADAR or cameras and process the sensor data to avoid accidents by detecting pedestrians, vehicles, or other obstacles [81]. The sensor data can be linked to our ontology-based LDM, which enables the system to represent the driving environment.

3.1. Scenario description

First, we give an overview of the three scenarios.

S1: Traffic statistics The focus of this scenario is on the collection of statistical data that concerns stops, throughput, traffic distribution, or types of participants by aggregating the streaming data on specific intersections. Regarding this scenario, we have identified the following use cases and related challenges:

Object level: for a single vehicle or station, the average speed, acceleration, number of stops, as well as the temperature could be collected;

Road/Intersection level: on this level, besides calculating a summary of road/lane level indicators such as average throughput, waiting time, and the number of stops, also matrices regarding transfers (e.g., how many cars go straight on), modality, and type mix (e.g., which vehicle classes are present) could be determined;

Network level: on the network level, intersections are represented by nodes connected by roads. We could collect statistical summaries of indicators on intersections. For instance, estimating the transfer times and traffic flow between intersections.
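For the object level, a statistic such as the number of stops can be derived from a vehicle's speed stream. The Python sketch below counts transitions from moving to standing and reports them together with the average speed; the standstill threshold of 0.5 m/s is an assumed tolerance for sensor noise, not a value from the scenario definition, and a vehicle that is already standing when the window starts is not counted as stopping.

```python
from statistics import mean


def count_stops(speeds, threshold=0.5):
    """Count transitions from moving (> threshold) to standing (<= threshold).

    speeds: time-ordered speed samples (m/s) of a single vehicle.
    threshold: assumed standstill tolerance that absorbs sensor noise.
    """
    stops, moving = 0, False
    for v in speeds:
        if v > threshold:
            moving = True
        elif moving:
            stops += 1
            moving = False
    return stops


def object_statistics(speeds, threshold=0.5):
    """Average speed and number of stops for one vehicle's speed window."""
    return {"avg_speed": mean(speeds), "stops": count_stops(speeds, threshold)}
```

For example, the samples `[0, 5, 8, 0.2, 0, 6, 0.1]` contain two stops: the initial standstill is ignored, and the consecutive samples `0.2` and `0` count as one stop.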

S2: Hazardous events detection An important C-ITS application is road safety [8], where reliable event detection is central to finding unexpected, hazardous events. This is a more challenging case, since it requires the combination of the topology, vehicle maneuvers, and temporal relations that might be evaluated over longer and shorter periods. We identified the following events as possibly hazardous:

Simple vehicle maneuvers: the following maneuvers are relevant for this case and are directly extractable from trajectories: (1) quick slow down/speed up; (2) drive straight on, turn left, turn right; (3) stop, unload, park;

Complex vehicle maneuvers: the aim is to detect lane changes, overtakes, and u-turns, which are complex maneuvers, composed of simpler maneuvers;

Red-light violation: in Example 2.2, a description of the detection of red-light violations is given;

Vehicle breakdown/accident: this event is based on the stop maneuvers, where we identify vehicles that are not moving and are inside a dangerous area of an intersection. This case can be extended to several vehicles;

Traffic congestion: this is a more complex event, where short- and long-term observations must be combined. Queuing cars could indicate congestion and can be detected by checking the stop maneuvers of several vehicles that are behind each other but not stopped by a longer red light phase.
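Simple maneuvers such as turning left or right can be extracted from a trajectory by comparing headings, an instance of the spatial matching feature F8 defined below. The following Python sketch assumes planar coordinates (x east, y north, e.g., a local metric projection of GPS positions), at least two points per trajectory, and an illustrative threshold of 30°; u-turns (±180°) are not reliably distinguished by this simple test.

```python
import math


def heading_change(track):
    """Signed heading change in degrees (counter-clockwise positive) between
    the first and the last segment of a planar trajectory [(x, y), ...]."""
    (x0, y0), (x1, y1) = track[0], track[1]
    (x2, y2), (x3, y3) = track[-2], track[-1]
    h_in = math.atan2(y1 - y0, x1 - x0)
    h_out = math.atan2(y3 - y2, x3 - x2)
    delta = math.degrees(h_out - h_in)
    return (delta + 180) % 360 - 180  # normalise to [-180, 180)


def classify_turn(track, threshold=30.0):
    """Classify a trajectory as 'left', 'right', or 'straight'."""
    delta = heading_change(track)
    if delta > threshold:
        return "left"
    if delta < -threshold:
        return "right"
    return "straight"
```

A vehicle heading north and then west, e.g., `[(0, 0), (0, 10), (-10, 10)]`, changes its heading by +90° and is classified as turning left.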

S3: ADAS ADAS features are an important step towards fully automated driving by enabling the vehicle to take control of speed or braking, while drivers still have “full” control over the vehicle. The following challenges arise for ADAS:

Self monitoring: self-monitoring is a central requirement of ADAS, where intelligent speed adaptation is an important feature to improve roadway safety;

Obstructed view: this concerns dangerous situations where a vehicle might collide with another vehicle, since they have no visual contact due to an obscured view (e.g., buildings). The crossing of the predicted trajectories of two vehicles has to be verified to provide a simple collision detection.

Traffic rules: embedding traffic rules, e.g., checking right-of-way rules, could become an important requirement for autonomous driving.
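For the obstructed-view case, the following Python sketch combines an L1 linear trajectory prediction (cf. F7 below) with a standard orientation-based segment intersection test. It is a simplification under stated assumptions: positions are `(t, x, y)` fixes in a planar metric frame, the prediction uses only the last two fixes, and only the spatial crossing of the predicted paths is checked; whether both vehicles reach the crossing point at the same time is not tested.

```python
def project_linear(track, horizon):
    """Extrapolate the last two (t, x, y) fixes linearly `horizon` s ahead."""
    (t0, x0, y0), (t1, x1, y1) = track[-2], track[-1]
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("fixes must have strictly increasing timestamps")
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    return (x1 + vx * horizon, y1 + vy * horizon)


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 left, -1 right, 0 collinear."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (cross > 0) - (cross < 0)


def _on_segment(a, b, p):
    """For a point p collinear with a and b: does p lie between them?"""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """True if segment p1-p2 and segment q1-q2 share at least one point."""
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other.
    return ((o1 == 0 and _on_segment(p1, p2, q1))
            or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1))
            or (o4 == 0 and _on_segment(q1, q2, p2)))


def predicted_paths_cross(track_a, track_b, horizon):
    """Do the linearly predicted paths of two vehicles cross spatially?"""
    a_now, b_now = track_a[-1][1:], track_b[-1][1:]
    return segments_intersect(a_now, project_linear(track_a, horizon),
                              b_now, project_linear(track_b, horizon))
```

With a horizon of 3 s, as in the prediction distance of query $q_{1}$, a vehicle moving north at 5 m/s from $(0, -5)$ and one moving east at 5 m/s from $(-5, 0)$ have predicted paths that cross at the origin, so a warning could be raised even though neither vehicle can see the other.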

3.2. Features for spatial-stream QA

The eight “standard” requirements for stream processing identified by [78], namely volume, velocity, variety, incompleteness, noise, timely fashion, fine-grained information access, complex domain models, and user intention, as well as the three entailment levels identified by [31] for stream reasoning systems, namely stream-, window-, and graph-level entailment, are not discussed here; they should hold for C-ITS stream systems as well. Besides the generic features F1, F2, F3, and F9, we also focus on domain specific features that are mapped to requirements crucial for enabling the above scenarios. For this, we distinguish for each feature three levels of fulfillment: basic (L1), enhanced (L2), and advanced (L3). We have identified the following feature sets:

F1 – Time model: possible time models are point-based (L1) and interval-based (L2), where L1 is the “simplest” representation. L2 also covers point-based data whose aggregations are represented by intervals spanning the aggregated data items. If we apply an interval-based model, temporal relations (L3), such as those of Allen’s Time Interval Algebra [6] with operators like before, can be used for querying and inference.

F2 – Process paradigm: queries that are processed in a pull-based (L1) manner should be the baseline. Push-based processing (L2), in particular with sliding windows, is already more challenging. If we allow combined (L3) processing, we could treat high-velocity atoms as push-based queries and low-velocity atoms as pull-based queries.

F3 – Query features: we consider “basic” query features that are included in a standard query language such as SPARQL or SQL. These features are part of L1 and include selection, projection, join, and filter, as well as time- and triple-based window operators. Combining the basic features by query nesting and unions of CQs, and allowing inter-stream joins, belongs to L2.

F4 – Numerical aggregations: aggregations can be “simple” functions such as sum or average on either a set or multiset (bag) of data items (L1). L2 extends L1 by statistical functions such as the mean, median, or mode, as well as standard deviation, variance, and range. Note that the aggregations are mainly over multisets, since we often have data items of different objects in a single stream.

F5 – Spatial aggregations: a wide range of spatial aggregations can be applied to geometric objects like points and lines (L1), and the aggregation functions need to take the peculiarities of geometries into account, e.g., convex vs. concave objects. L2 adds the computation of spatial relations based on the aggregated objects, using a simple point-set model [44] or the more detailed 9-Intersection model. Smoothing and simplification of complex objects could also be included, which leads to L3.

F6 – Numerical predictions: predictions allow the generation of unknown data items by projecting from the past into the future. Several prediction functions, such as moving average (L1) or exponential smoothing and regression (L2), should be available. Depending on the task, more complex machine learning methods could also be envisioned (L3).

F7 – Trajectory predictions: we predict a vehicle’s movement by linearly projecting the trajectory into the future (L1). More accurate results could be achieved by (1) a “point-to-curve” aggregation and (2) calculating possible paths using a road graph (L2), and by using machine learning for trajectory predictions (L3).

F8 – Spatial matching: basic spatial matching is the extraction of specific features such as angles from the objects (L1). Advanced features include the matching of complex geometries such as road graphs (L2).

F9 – Advanced reasoning: includes the different forms of reasoning, ranging from rule-based to ontological reasoning, that reach beyond the expressivity of query answering in OWL 2 QL [52]. The underlying language can include “simple” implications such as $a(x, z) \leftarrow b(x, y) \land c(y, z)$, but also “advanced” features such as recursion as in OWL 2 RL [52], all combined with strong constraints and negation as failure (NAF) as present in Answer Set Programming (ASP) [33]. Note that, in a limited form, simple implications are already feasible in OWL 2 RL by allowing role chaining such as $b \circ c \sqsubseteq a$.
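The L1 and L2 prediction functions of F6 can be sketched in a few lines of Python. Both return a flat forecast (the same value for every future step of the prediction horizon), and the parameters (window length `k`, smoothing factor `alpha`) are illustrative assumptions that the feature description leaves open; regression and the L3 learning-based methods are not covered.

```python
def moving_average_forecast(values, k):
    """L1: predict the next value as the mean of the last k observations."""
    if not values or k < 1:
        raise ValueError("need at least one value and k >= 1")
    window = values[-k:]
    return sum(window) / len(window)


def exp_smoothing_forecast(values, alpha):
    """L2: simple exponential smoothing, s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    The forecast for all future steps is the last smoothed value s_T.
    """
    if not values or not 0 < alpha <= 1:
        raise ValueError("need at least one value and 0 < alpha <= 1")
    s = values[0]
    for x in values[1:]:
        s = alpha * x + (1 - alpha) * s
    return s
```

A larger `alpha` weights recent observations more strongly, which suits fast-changing values such as speed, whereas the moving average treats all values in its window equally.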

Table 1
Requirement matrix (level L1/L2/L3 implies required, blank is not required, P is possibly required)


Use Case	F1	F2	F3	F4	F5	F6	F7	F8	F9
S1.1 (Object statistics)	L1	L1	L1	L1	L1				OWL2 QL
S1.2 (Road/Intersection statistics)	L2	L1	L2	L2	L2	L1	L1		OWL2 QL
S1.3 (Network statistics)	L2	L1	L2	L2	L2	L2	L1	P	OWL2 RL
S2.1 (Simple maneuvers)	L1	L1	L1	L1	L2	P	P		OWL2 QL
S2.2 (Complex maneuvers)	L2	L2	L2	L1	L2	L1	L1	L1	OWL2 QL
S2.3 (Red-light violation)	L2	L2	L2	L1	L2	L1	L1	L1	OWL2 QL
S2.4 (Vehicle breakdown)	L2	L2	L2	L1	L2	L1		P	OWL2 QL
S2.5 (Traffic congestion)	L2	L3	L2	L2	L3	L2		P	OWL2 RL / ASP
S3.1 (Self monitoring)	L1	L2	L1	L1	L1	P	P		OWL2 QL
S3.2 (Obstructed view)	L1	L2	L2	L1	L2	L1	L1	L1	OWL2 QL
S3.3 (Traffic rules)	L2	L2	L2	L2	L3	L1	L1	L1	OWL2 RL / ASP

3.3. Requirements

In Table 1, we show the requirements that are derived by analyzing each scenario and use case with respect to the needed features. The requirements form the baseline for the implementation and a later experimental assessment. For individual features, we distinguish L1 to L3 for required, blank for not required, and “P” for possibly required. For instance, in S2.2 for F1, a point-based time model (L1) suffices for detecting left/right turns; however, if we want to detect u-turns, an interval-based time model in combination with temporal relations (L2) will be needed. Furthermore, push-based queries are desired for swift reactions to changes.

The requirements for advanced reasoning (F9) start with QA using OWL2 QL as the baseline. This allows one to access the streams and to enrich them with a domain model. However, in some use cases the expressive power of QA is too limited, since more “advanced” features are needed. For instance, if a use case requires taking the road network and reachability into account, OWL2 RL has the expressive power to capture it. If a use case requires modelling traffic rules and traffic regulations, one needs, besides reachability, also constraints and NAF (as provided by ASP). For instance, a warning has to be generated if two traffic lights of crossing lanes are green at the same time.
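The last example can be sketched as a simple check in Python. In the intended setting it would be expressed as an ASP rule that derives a warning; the sketch below only shows the check itself and assumes that the pairs of conflicting signal groups (those managing crossing lanes) and the current phase of each signal group are already available, e.g., from MAP and SPaT messages. All names and the phase label `"green"` are hypothetical.

```python
def conflicting_greens(states, conflicts, green="green"):
    """Return the pairs of conflicting signal groups that are green simultaneously.

    states: current phase per signal group, e.g. {"SG1": "green"}.
    conflicts: pairs of signal groups that manage crossing lanes.
    Each offending pair is reported once, in a canonical (sorted) order.
    """
    return sorted({tuple(sorted(pair)) for pair in conflicts
                   if all(states.get(sg) == green for sg in pair)})
```

An empty result means that no warning is needed; a non-empty result lists the signal groups whose simultaneous green phases violate the rule.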

In our previous work [36], we already covered level L1 for the features F1 to F5; in this work, we aim to introduce new features such as time intervals, temporal relations, and unions of CQs. F6, F7, and F8 are entirely new features.

4. Expert interviews

We conducted four long interviews with ITS experts who play different roles in the field. The first two experts work in industry, and the other two come from academia. The interviews were conducted as guideline-based expert interviews [19]. Following the guidelines of [19], we left the interviewees the freedom to choose the topics they preferred to discuss. However, we had prepared a set of questions, which we asked when an interviewee moved in the direction of a particular topic, e.g., the query language.

Importantly, guideline-based expert interviews are a standard practice of qualitative research in the social sciences [60]. Maccoby et al. [56] define an interview as an “interchange in which one person attempts to elicit information or expressions of opinion or belief from another person or persons.” If complex topics are to be investigated, long interviews can be a better choice than other (quantitative) methods for filling a knowledge gap, since they allow one to gain a more profound insight into the positions of the experts [61]. Other approaches, such as questionnaire-based interviews, are more appropriate if many interviewees participate and a quantitative rather than a qualitative analysis has to be conducted.

Furthermore, experts who also have a good understanding of the surrounding methods and techniques were of limited availability (for time and business reasons).

The main goal of the interviews is to answer the following general question:

“How suitable is our spatial-stream QA for real-world ITS applications?”

The above question can be split into more specific parts:

Q1: “What are important technologies/developments for future ITS systems?”

Q2: “How well is the LDM suited for the integration of vehicle/traffic control sensor data such as V2X messages?”

Q3: “Is the presented ontology and the approach of ontology-based data access suitable to realise an LDM?”

Q4: “We present three scenarios with several use cases; how relevant are they? Are different use cases needed?”

Q5: “We present different query features, such as trajectory predictions; how important are these features in your opinion? Should they be extended?”

Q6: “We present an example of our queries for detecting red-light violations; how comprehensible is this query for you? Do you believe another query language is better suited for this?”

In the remainder of this section, we give a summary of each expert interview. We do not transcribe the full interviews, which ranged from 30 minutes to 1.5 hours, but only present a summarized version of each. In the Conclusion, we shall come back to the experts' recommendations and suggestions for future work.

Expert 1 The first expert is one of the initiators of V2X technology development in Europe. At a large ITS company, he was responsible for the standardization of V2X messages in different committees of standardization organizations such as ETSI, ISO, and SAE. Additionally, he participated in several large research projects such as DRIVE C2X.3

³
http://www.drive-c2x.eu/project

First, the expert pointed out that there are not only CAM, SPaT, MAP, and DENM messages, but also messages designed for public transport signal requests, called SRM/SSM, as well as position correction messages for autonomous vehicles. The latter are important since, in inner cities, the exact position in combination with high-resolution maps is crucial for safety. An important task arising from this challenge is the continuous matching of the vehicle position to the high-resolution map. He stated that autonomous vehicles, even with advanced sensors, will not be able in the near future to drive autonomously and safely in inner cities with complex intersections, in particular when the weather is unpredictable. He then said that messages like CAM or DENM need to be sent to the surroundings in near real time; hence, the existing 4G and upcoming 5G mobile standards are not suitable for this purpose, and the standard IEEE 802.11p (also called ITS-G5) is better suited, since it is based on WLAN technology and already available for low-latency (millisecond-range) communication. However, existing WLAN technology cannot be used, since these protocols require sessions and authentication with a base station, which would cause a long delay for fast-moving vehicles. He gave more details on DENM messages, which inform the surroundings about dangerous events, and noted that DENM messages are fired based on triggering conditions that have specific probabilities assigned. The triggering conditions are defined by the standardization bodies, and vehicle manufacturers have to implement them accordingly. Furthermore, he highlighted that more development is needed to protect vulnerable road users, which requires the integration of short-range wireless communication such as Bluetooth or ZigBee, and also the sharing of data with infrastructure sensors such as mounted Lidar/Radar stations.

Second, we discussed the LDM, and he identified the LDM as an integration platform for V2X messages, where each vehicle has its own proprietary representation of the LDM, since the ETSI standard defines only the interfaces to access it, not its internal structure. He doubted, though, that a common definition of the LDM is needed, since every manufacturer should implement it on their own; only the dynamic elements could be encoded as a snapshot (in a standardized data model) and exchanged with the surroundings. One use case we discussed is the exchange of snapshots when legacy vehicles and/or vulnerable road users are on the road: in this case, only fixed Lidar/Radar stations could detect them and send their snapshots to other V2X-based vehicles, since fixed Lidar/Radar stations should be more accurate than moving ones. Currently, several companies are developing Collective Perception Messages (CPMs) [70], which serve to exchange such Lidar/Radar data. A future development could foster the exchange of vehicles' sensor data using CPMs.

Third, we discussed the usage of ontologies and OQA for realizing the LDM, which he initially regarded skeptically, since scalability might be an issue. After we explained the intention of OQA, this skepticism was partly dispelled. Regarding the modelling of the ontology, he stated that users do not care how the (ontological) model is shaped; the query language is more important, since users will be directly confronted with it. He suggested that the query engine and the stream DB could be migrated to and deployed on roadside ITS stations, where users could cover custom use cases by writing and applying queries.

Fourth, he reviewed the query features. He explained that an interval-based time model and push-based processing are favourable, since more information can be extracted; due to the low-latency requirements of C-ITS applications, push-based queries should be supported, possibly with some buffering so that not every change has to be computed. He pointed out that “native” query languages such as SQL might be too complicated for writing queries, and that a simplified, object-oriented representation (similar to CQs) could be preferable. While discussing the example query, we observed that he understood and read queries as rules. He also noted that, if queries get complex as in the second scenario, rules might be easier to use for expressing problems, since they better capture “the way people think”. He further stated that the aggregation feature can also be used to guarantee privacy, since single vehicles are no longer distinguishable in aggregated values. Furthermore, aggregations can be used to calculate journey times in a road network. Finally, he agreed that predictions are a crucial feature for accident prevention and should therefore be elaborated further.

Expert 2 The second interviewed expert is the head of a traffic engineering department in a large ITS company and is responsible for managing R&D projects.

The expert described that, historically, traffic management was a closed, autarkic system, where traffic data was collected in a slow process using radar, road loops, and cameras. She identified V2X as an important step towards collecting real-time data on traffic and vehicles, where, besides the mentioned CAM, SPaT, and DENM messages, Collective Perception Messages (CPMs) will play an important role, since they allow a vehicle to share the objects perceived by its own sensors with other V2X vehicles. She believed that the LDM could be used in combination with CPMs, where a CPM could be aligned with a vehicle's own LDM. However, she saw no immediate demand for an ontology-based extension of an LDM or for spatial-stream queries to access the (streaming) data, since they use their own tools and languages for processing V2X messages. She commented, though, that an ontology-enhanced LDM could serve as a data integration platform for future data analytics tasks.

Expert 3 The interviewed expert is an associate professor at a European university, where he works in fields in which reasoning and learning over streams are applied to safe autonomous systems, including robots, boats, and drones.

First, we discussed the LDM, and he noted that similar techniques have long been used in robotics, but with the additional challenge that these dynamic maps have to be built on the fly and cooperatively. An interesting challenge arises if different agents each have a local representation of an LDM and a global perspective has to be constructed from them for coordination. He identified another important task in this context, namely the matching of identical entities (in the sense of a physical grounding of the same entity) detected by different agents/sensors, which they solved using bridge rules.

Second, we discussed the LDM ontology, where he identified meta-data as important and stated that an ontology should provide a way to capture uncertainty. Uncertainty introduces the additional challenge that measured observations, e.g., speed, are no longer crisp but lie within confidence intervals in which the observation holds. He also stated that using average or most-likely values instead of intervals leads to a loss of information. Uncertainty can also occur in the classification of objects, e.g., when an object is detected as a vehicle but turns out to be a bike. After we discussed our query rewriting technique, he described their approach in robotics, which aims at matching the data to the query (or formula) instead of rewriting queries and checking for instances in the DB: the data is processed and transformed until it matches the queries, and when new streamed data appears, they continuously try to match it to the queries.

Third, he reviewed the query and could understand most of it under certain assumptions (i.e., that aggregations are grouped), and he believed that including predictions is an important extension, which they consider for their approach as well. Since predictions need an underlying model, he considered it important to re-evaluate and monitor the predictions to detect concept drift. He also commented on the semantics of the query language and of windows, where it is important to define when to forget data in the stream and how long a data item holds into the future. Regarding the language features, he noted that intervals are a good way to also express temporal uncertainty. Further, it would be crucial to respect the underlying time model, which could be dense (continuous) or discrete; under the assumption of a finite number of changes, it is possible to map observations from the continuous into the discrete space. Finally, he believed that numerical aggregations and predictions could be enriched by different types of filters, mentioning that Kalman or particle filters [80] are often used in robotics.

Expert 4 This interviewed expert is a senior researcher in the fields of stream reasoning, graph DBs, and autonomous driving at a European university, and has also been involved in the development of stream reasoning tools.

First, we discussed the LDM, which he saw as a suitable approach to integrate map and streaming data. He pointed out, though, that classical DB technology such as B+ trees [29] is not suited for spatial-stream data, but advances in the field of moving object DBs [45] have resulted in efficient query languages and spatio-temporal indexing methods. Furthermore, the LDM does not consider a 3D representation of maps, which is crucial if the map is used in the context of autonomous driving, since objects detected in an image recognition step (e.g., buildings) need to be anchored in a 3D map.

Second, he evaluated the LDM ontology; he agreed with the modelling of map features but criticized that the LDM ontology lacks the concepts of sensors and observations, which are crucial in C-ITS applications. He noted that elements of the Stimulus-Sensor-Observation pattern ($Stimulus$, $Sensor$, and $Observation$) of the Semantic Sensor Network (SSN) ontology [46] could be incorporated into the LDM ontology.

Third, he reviewed the query language and easily understood the presented query (for red-light violations). He noted that the definitions of the ontology and spatial atoms are clear, but the stream atoms need a formal grammar to capture an atom's parameterization, such as $speed [avg, 10 s]$. Relatedly, the definition of $speed [last, 5, - 5 s]$, i.e., the last element in a 10 s window that projects 5 s into the future, was not clear to him. He further suggested that a SPARQL dialect, either C-SPARQL [14] or CQELS [55], could be a suitable language. Evaluating the language features, he believed that all presented features are important, but that research and development should focus more on numerical (F6) and trajectory prediction (F7), which he saw as crucial for autonomous driving, since these are the main techniques used, for instance, in object recognition and motion planning. He also believed that the combination of (stream) reasoning and machine learning is in its early stages and needs more attention from the community. He further suggested that the aggregates (F4) could be extended with top-k aggregate functions [47].

Fourth, he criticized that scenario S3 is too generic and should be narrowed to more specific tasks in the domain of autonomous driving; he suggested including motion planning as a replacement for the generic scenario.

5. Approach for spatial-stream query answering

First, we introduce the data model and the definition of a spatial-stream knowledge base, which leads to our main focus of spatial-stream queries. Then, we introduce the “standard” query rewriting and extend it with temporal relations. Finally, we describe how the queries are evaluated putting the focus on stream aggregation and predictions. We start from previous work in [36], which introduced spatial ontology-mediated query answering over Mobility Streams using ${DL-Lite}_{A}$ [26]. We focus on pull-based queries that are evaluated at one single time point called the query time $T_{i}$ .

5.1. Data model and knowledge base

Our data model is point-based and captures the valid time, extracted from the V2X messages, stating that some data item is valid at that time point. Importantly, while evaluating a query, the model can temporarily change to an interval-based model that results from the window and aggregation functions. To capture streaming data, we introduce the timeline $T$, which is a closed interval of $(\mathbb{N}, \leqslant)$. A data stream is a triple $D = (T, v, P)$, where $T$ is a timeline, $v : T \to ⟨ F, S_{F} ⟩$ is a function that assigns to each element of $T$ (called a timestamp) data items of $⟨ F, S_{F} ⟩$, where $F$ (resp. $S_{F}$) is a stream (resp. spatial-stream) DB; the integer P, called the pulse, defines the general interval between consecutive data items on the timeline (cf. [22]), which naturally induces a stream of data items. We always have a main pulse with a fixed interval length that defines the highest granularity of the validity of data points. Larger pulses can be defined for streams with lower frequency, which can be exploited for optimizations such as caching. The pulse also aligns the data items that arrive asynchronously in the DB to the timeline. A spatial or spatial-stream DB is a Data Stream Management System that supports spatial and geospatial objects and operators (e.g., Odysseus, PipelineDB, or SQLstream).4

⁴
http://odysseus.informatik.uni-oldenburg.de/, https://www.pipelinedb.com/, and https://sqlstream.com/.
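The alignment of asynchronously arriving data items to the pulse can be sketched as follows. This is a minimal Python illustration, assuming that an item is snapped to the latest pulse point at or before its arrival time; the text does not fix this policy, and `align_to_pulse` is a hypothetical helper.

```python
from collections import defaultdict

def align_to_pulse(items, pulse, t_min=0, t_max=100):
    """Map asynchronously arriving (timestamp, fact) pairs onto the pulse grid.

    Assumption (not from the paper): an item is assigned to the latest pulse
    point at or before its arrival time; items outside the timeline
    [t_min, t_max] are dropped. Returns v as a dict: time point -> set of facts.
    """
    v = defaultdict(set)
    for ts, fact in items:
        if t_min <= ts <= t_max:
            aligned = t_min + int((ts - t_min) // pulse) * pulse
            v[aligned].add(fact)
    return dict(v)

# Hypothetical SPaT messages arriving at irregular times, pulse P = 5.
arrivals = [(0.4, ("hasState", "t1", "Stop")), (6.2, ("hasState", "t1", "Go"))]
print(align_to_pulse(arrivals, pulse=5))
# {0: {('hasState', 't1', 'Stop')}, 5: {('hasState', 't1', 'Go')}}
```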

Example 5.1.

For the timeline $T = [0, 100]$, we have the stream $F_{CAM} = (T, v, 1)$ of vehicle positions and speeds at the assigned time points for the individuals $c_{1}$, $c_{2}$, and $b_{1}$:

$$\begin{aligned} v(0) &= \{speed(c_{1}, 30), pos(c_{1}, (5, 5)), speed(c_{2}, 10), pos(c_{2}, (4, 4)), speed(b_{1}, 10), \dots\}; \\ v(1) &= \{speed(c_{1}, 29), pos(c_{1}, (6, 5)), speed(c_{2}, 0), pos(c_{2}, (5, 4)), speed(b_{1}, 5), \dots\}; \dots \end{aligned}$$

A “slower” stream $F_{SPaT} = (T, v, 5)$ captures the next signal state of a traffic light: $v(0) = \{hasState(t_{1}, Stop)\}$ and $v(5) = \{hasState(t_{1}, Go)\}$. The static ABox contains the assertions $Car(c_{1})$, $Car(c_{2})$, $Bike(b_{1})$, and $SignalGroup(t_{1})$. A different, “annotated” representation obtained by applying the function v to $F_{CAM}$ yields $\{speed(c_{1}, 30)@t0, \dots, speed(c_{1}, 29)@t1\}$, which is better suited for an interval-based time model.
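For a concrete data structure, the next sketch encodes Example 5.1 as Python dictionaries that map time points to sets of facts and derives the annotated representation fact@t. The tuple encoding and the `annotate` helper are our own illustrative choices; the elided facts (…) of the example are simply omitted.

```python
# Example 5.1 as plain Python data (the encoding is illustrative).
# Facts are tuples (predicate, individual, value).
F_CAM = {  # pulse 1
    0: {("speed", "c1", 30), ("pos", "c1", (5, 5)), ("speed", "c2", 10),
        ("pos", "c2", (4, 4)), ("speed", "b1", 10)},
    1: {("speed", "c1", 29), ("pos", "c1", (6, 5)), ("speed", "c2", 0),
        ("pos", "c2", (5, 4)), ("speed", "b1", 5)},
}
F_SPaT = {0: {("hasState", "t1", "Stop")}, 5: {("hasState", "t1", "Go")}}  # pulse 5
static_abox = {("Car", "c1"), ("Car", "c2"), ("Bike", "b1"), ("SignalGroup", "t1")}

def annotate(v):
    """Turn v: time point -> facts into a set of time-annotated facts (fact, t)."""
    return {(fact, t) for t, facts in v.items() for fact in facts}

annotated = annotate(F_CAM)
c1_speeds = sorted((t, f[2]) for f, t in annotated if f[:2] == ("speed", "c1"))
print(c1_speeds)  # [(0, 30), (1, 29)]
```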

We consider a vocabulary of individual names $Γ_{I}$, domain values $Γ_{V}$ (e.g., $\mathbb{N}$), and spatial objects $Γ_{S}$. Given atomic concepts A, atomic roles P, and atomic attributes $U_{C}$, we define (a) basic concepts B, basic roles Q, and basic value-domains E (attribute ranges); (b) complex concepts C, complex role expressions R, and complex attributes $V_{C}$; and (c) value-domain expressions D:

$$\begin{aligned} Q &::= P \mid P^{-} \\ B &::= A \mid \exists Q \mid δ(U_{C}) \\ E &::= ρ(U_{C}) \\ C &::= ⊤_{C} \mid B \mid \neg B \mid \exists Q . C^{'} \\ D &::= ⊤_{D} \mid D_{1} \mid \dots \mid D_{n} \\ R &::= Q \mid \neg Q \\ V_{C} &::= U_{C} \mid \neg U_{C} \end{aligned} \tag{1}$$

where $P^{-}$ is the inverse of P, $⊤_{D}$ is the universal value-domain, and $⊤_{C}$ is the universal concept; furthermore, $U_{C}$ is a given attribute with domain $δ(U_{C})$ (resp. range $ρ(U_{C})$). A ${DL-Lite}_{A}$ knowledge base (KB) is a pair $K = (T, A)$, where the TBox $T$ and the ABox $A$ consist of finite sets of axioms as follows:

inclusion assertions of the form $B ⊑ C$, $Q ⊑ R$, $E ⊑ D$, and $U_{C} ⊑ V_{C}$;

functionality assertions of the form $funct Q$ and $funct U_{C}$ ;

membership assertions of the form $A (a)$ , $D (c)$ , $P (a, b)$ , and $U_{C} (a, c)$ , where a, b are individual names in $Γ_{I}$ and c is a value in $Γ_{V}$ .

We extend ${DL-Lite}_{A}$ with the possibility to specify the localization of atomic concepts and roles. For this, we extend the standard ${DL-Lite}_{A}$ syntax as follows:

$$\begin{aligned} C &::= ⊤_{C} \mid B \mid \neg B \mid \exists Q . C^{'} \mid (loc A) \mid ({loc}_{s} A) \\ R &::= Q \mid \neg Q \mid (loc Q) \mid ({loc}_{s} Q), \end{aligned} \tag{2}$$

where $s \in Γ_{S}$ and the concepts and roles are as defined before. Intuitively, $(loc A)$ is the set of individuals in A that can have a spatial extension (e.g., $(loc Parks)$), and $({loc}_{s} A)$ is the subset whose spatial extension is s (e.g., $({loc}_{(48.20, 16.37)} Vienna)$).

The extension with streaming consists of the following axiom schemes:

$$({stream}_{F} C) \quad \text{and} \quad ({stream}_{F} R), \tag{3}$$

where F is a particular stream over either complex concepts C or roles R of $T$. Intuitively, $({stream}_{F} C)$ and $({stream}_{F} R)$ are axioms defining that concept C and role R are based on streamed items from the data stream F. In a static setting, F is always empty, and hence C (resp. R) can be treated like an ordinary concept (resp. role). In a hypothetical case where all data items in F are known, the concept/role is interpreted over all time points of $T$. The dynamic case with changing data items is discussed in Section 5.3.

Example 5.2.

A TBox may contain $({stream}_{CAM} speed)$, $({stream}_{CAM} (loc pos))$, $({stream}_{CAM} Vehicle)$, and $({stream}_{SPaT} hasState)$, as well as the axioms $Car ⊑ Vehicle$, $Bike ⊑ Vehicle$, $Ambulance ⊑ Vehicle$, and $Ambulance ⊑ \exists hasRole . Emergency$, which describe vehicle classes.
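In OQA, axioms such as $Car ⊑ Vehicle$ are used to rewrite queries rather than to materialize data. The sketch below shows this for the atomic concept inclusions of Example 5.2 only: a query atom $Vehicle(x)$ is expanded into the union of all concepts subsumed by Vehicle and then evaluated over the ABox. It is a hypothetical simplification, not the full PerfectRef algorithm of [26], and it ignores roles, attributes, and existential axioms such as $Ambulance ⊑ \exists hasRole . Emergency$.

```python
# Atomic concept inclusions from Example 5.2, as (sub, super) pairs.
tbox = [("Car", "Vehicle"), ("Bike", "Vehicle"), ("Ambulance", "Vehicle")]

def subsumees(concept, inclusions):
    """All concepts B with B ⊑* concept (reflexive-transitive closure)."""
    result, frontier = {concept}, [concept]
    while frontier:
        c = frontier.pop()
        for sub, sup in inclusions:
            if sup == c and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result

def answer_atomic(concept, abox, inclusions):
    """Answers of q(x) <- concept(x): rewrite into a union, then evaluate."""
    union = subsumees(concept, inclusions)
    return {ind for (c, ind) in abox if c in union}

abox = {("Car", "c1"), ("Car", "c2"), ("Bike", "b1"), ("SignalGroup", "t1")}
print(sorted(answer_atomic("Vehicle", abox, tbox)))  # ['b1', 'c1', 'c2']
```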

Finally, a spatial-stream knowledge base is a tuple $K = ⟨ T, A, S_{A}, ⟨ F, S_{F} ⟩, B ⟩$, where $T$ ($A$, resp.) is a ${DL-Lite}_{A}$ TBox (ABox, resp.), $S_{A}$ is a spatial DB, and $⟨ F, S_{F} ⟩$ is a spatial-stream DB. Furthermore, $B \subseteq Γ_{I} \times Γ_{S}$ is a partial function, called the spatial binding, from $A$ to $S_{A}$ and $⟨ F, S_{F} ⟩$. The binding function $B$ is unrelated to the mapping function known from OBDA, which maps concepts and roles to an underlying DB (as used in [74]); instead, it guarantees that spatial objects in $A$ have a spatial extension in $S_{A}$ and $⟨ F, S_{F} ⟩$.

Semantics Without a spatial-stream extension, we keep the given semantics of ${DL-Lite}_{A}$ defined as usual (see [26] for details), based on interpretations $I = ⟨ Δ^{I}, \cdot^{I} ⟩$ of a ${DL-Lite}_{A}$ KB $⟨ T, A ⟩$ , where $Δ^{I}$ is a non-empty domain and $\cdot^{I}$ is an interpretation function of the vocabulary.

First, we give a semantics to $(loc Q)$ and $({loc}_{s} Q)$ for individuals of Q that have some spatial extension or are located at s, respectively, such that a KB $K_{S} = ⟨ T, A, S, B ⟩$ can be transformed into an ordinary ${DL-Lite}_{A}$ KB $K_{O} = ⟨ T^{'}, A^{'} ⟩$, using the fresh spatial top concept $C_{S_{T}}$ and spatial concepts $C_{s}$. An interpretation of $K_{S}$ is a structure $I_{S} = ⟨ Δ^{I}, \cdot^{I}, b^{I} ⟩$, where $⟨ Δ^{I}, \cdot^{I} ⟩$ is an interpretation of $⟨ T, A ⟩$, and $b^{I} \subseteq Δ^{I} \times Γ_{S}$ is a partial function that assigns some individuals a location, such that for every $a \in Γ_{I}$, $(a, s) \in B$ implies $b^{I}(a^{I}) = s$. We extend the semantics with $(loc Q)$ and $({loc}_{s} Q)$, where Q is an atomic role in $T$:

$${(loc Q)}^{I_{S}} \supseteq \bigcup_{s \in Γ_{S}} {({loc}_{s} Q)}^{I_{S}}, \quad \text{where} \quad {({loc}_{s} Q)}^{I_{S}} = \{(a_{1}, a_{2}) \in Q^{I} \mid (a_{2}, s) \in b^{I}\}. \tag{4}$$

The transformation of $K_{S}$ into an ordinary ${DL-Lite}_{A}$ KB $K_{O}$ is described in [35]; it allows us to apply standard ${DL-Lite}_{A}$ query rewriting.
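Eq. (4) can be read operationally: $({loc}_{s} Q)$ keeps those pairs of Q whose second component is located at s. The Python sketch below assumes that $Q^{I}$ is given as a finite set of pairs and $b^{I}$ as a dictionary (a partial function); since Eq. (4) only requires ⊇ for $(loc Q)$, the sketch computes the smallest extension that satisfies it. All names and data are hypothetical.

```python
# Hypothetical role extension Q^I and location assignment b^I.
Q = {("c1", "lane1"), ("c2", "lane2"), ("c3", "lane9")}
b = {"lane1": "s1", "lane2": "s2"}  # lane9 has no spatial extension

def loc_s(role, binding, s):
    """(loc_s Q) per Eq. (4): pairs whose second element is located at s."""
    return {(a1, a2) for (a1, a2) in role if binding.get(a2) == s}

def loc(role, binding):
    """Smallest (loc Q) satisfying the ⊇ in Eq. (4): union over all locations."""
    return set().union(*(loc_s(role, binding, s) for s in set(binding.values())))

print(sorted(loc(Q, b)))  # [('c1', 'lane1'), ('c2', 'lane2')]
```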

Second, we give an initial streaming semantics by interpreting the streamed KB over the full timeline. The timeline is captured by a finite sequence $F_{A} = {(F_{i})}_{T_{min} ⩽ i ⩽ T_{max}}$ of temporal ABoxes, which is obtained via the evaluation function v on $F$ and $T$ (similar to [21]). Then, we define the interpretation of the point-based model over $T$ as a sequence $I_{F} = {(I_{i})}_{T_{min} ⩽ i ⩽ T_{max}}$ of interpretations $I_{i} = ⟨ Δ^{I}, \cdot^{I_{i}} ⟩$ ; then $I_{F}$ is a model of $K_{F}$ , denoted $I_{F} ⊧ K_{F}$ , iff $I_{i} ⊧ F_{i}$ and $I_{i} ⊧ T$ , for all $i \in T$ .

The semantics of the $({stream}_{F} C)$ and $({stream}_{F} R)$ axioms is along the same lines. A stream axiom is satisfied if a complex concept C (resp. role R) holds over all the time points of stream $F = (T, v, P)$; thus, we restrict our models such that:

$$({stream}_{F} C)^{I} = \bigcap_{i \in tp(T, P)} C^{I_{i}} \quad \text{and} \quad ({stream}_{F} R)^{I} = \bigcap_{i \in tp(T, P)} R^{I_{i}}, \tag{5}$$

where $tp(T, P)$ is the set of time points determined by the segmentation of $T$ by P. This allows us to check the satisfiability of a KB and gives us a notion of global consistency over the full timeline, which is expected for fixed log data of a theoretical nature.
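Eq. (5) amounts to intersecting the per-time-point extensions. The following sketch assumes that $tp(T, P)$ consists of the points of $T$ spaced by the pulse P, starting at the lower bound of $T$ (the text leaves the exact segmentation open), and that the extension $C^{I_{i}}$ of each $I_{i}$ is given as a set; time points without an entry count as empty extensions. Function names and data are hypothetical.

```python
from functools import reduce

def tp(t_min, t_max, pulse):
    """Assumed segmentation of T = [t_min, t_max] by the pulse P."""
    return list(range(t_min, t_max + 1, pulse))

def stream_extension(interp, t_min, t_max, pulse):
    """Eq. (5): intersect the extensions C^{I_i} over all i in tp(T, P)."""
    extensions = [interp.get(i, set()) for i in tp(t_min, t_max, pulse)]
    return reduce(set.intersection, extensions) if extensions else set()

# Hypothetical extensions of Vehicle at each time point of T = [0, 2], P = 1.
vehicle = {0: {"c1", "c2", "b1"}, 1: {"c1", "b1"}, 2: {"c1", "b1", "c3"}}
print(sorted(stream_extension(vehicle, 0, 2, 1)))  # ['b1', 'c1']
```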

5.2. Query language

Our query language is based on conjunctive queries (CQs) and adds spatial-stream capabilities (as shown in Example 2.2). A spatial-stream CQ $q(x)$ is a formula

$$\bigwedge_{i = 1}^{m} Q_{O_{i}}(x, y) \land \bigwedge_{j = 1}^{n} Q_{S_{j}}(x, y) \land \bigwedge_{k = 1}^{o} Q_{D_{k}}(x, y) \land \bigwedge_{l = 1}^{p} Q_{T_{l}}(x, y), \tag{6}$$

where x are the distinguished (answer) variables, and y consists of non-distinguished (existentially quantified) variables, objects, and constant values:

each atom $Q_{O_{i}} (x, y)$ has the form $A (z)$ or $P (z, z^{'})$ , where A is a class name, P is a property name of the LDM ontology, and z, $z^{'}$ are from x or y;

each atom $Q_{S_{j}}(x, y)$ is from the vocabulary of spatial relations and of the form $S(z, z^{'})$, where z, $z^{'}$ represent geometries matched by S, and S is one of the following relations: $S \in \{intersects, contains, next, equals, within, disjoint, outside\}$;

each atom $Q_{D_{k}} (x, y)$ is similar to $Q_{O_{i}} (x, y)$ but adds stream operators that relate to Continuous Query Language operators [11]. We have a window $[agr, b, e]$ over a stream $D_{k}$ , where b and e are the bounds of the window in time units (positive for past, negative for future) and an aggregate function $agr$ applied to the data items in the window:

$[agr, b]$ represents the aggregate of last (if positive) or next (if negative) b time units of stream $D_{k}$ ;

$[b]$ represents the single tuple of stream $D_{k}$ at index b with $b = 0$ if it is the current tuple;

$[agr, b, e]$ represents the aggregate of a window $[b, e]$ in the past/future of $D_{k}$.

each atom $Q_{T_{l}}(x, y) = (T_{1}(z_{1}, z_{1}^{'}), \dots, T_{q}(z_{q}, z_{q}^{'}))$ represents a disjunction of temporal relations, where the variables $z_{1}$ to $z_{q}^{'}$ represent matches, i.e., individuals annotated with time points or intervals, which are filtered by the temporal relation $T_{i}$. For time points, $T_{i} = T_{i}^{P}$ is a binary arithmetic relation, namely one of $\{<, \leqslant, =, \geqslant, >\}$. For intervals, we choose the 13 relations of Allen's Time Interval Algebra [6], where $T_{i} = T_{i}^{I}$ is taken from $\{before, equal, meets, overlaps, during, starts, finishes\}$ and the inverses of these relations (e.g., $during^{-}$; equal is its own inverse), which filter variable matches according to the start/end points of the intervals.
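The interval relations used in $Q_{T_{l}}$ can be decided from the start and end points alone. The Python function below is a standard classification of Allen's 13 relations for intervals with start < end, using a `^-` suffix for inverses as in $during^{-}$; it is an illustrative sketch, not the implementation of the query engine.

```python
def allen(a, b):
    """Allen relation of interval a = (s1, e1) to b = (s2, e2), with s < e."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "before^-"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "meets^-"
    if s1 == s2 and e1 == e2:
        return "equal"
    if s1 == s2:
        return "starts" if e1 < e2 else "starts^-"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finishes^-"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "during^-"
    # Remaining case: partial overlap with distinct start and end points.
    return "overlaps" if s1 < s2 else "overlaps^-"

# A hypothetical crossing interval [12, 14] inside a red phase [10, 20].
print(allen((12, 14), (10, 20)))  # during
```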

The “historic” window operator $[agr, b, e]$ is derived from Brandt et al. [22] and allows us to query logs represented by data streams. Details on handling the temporal relations and aggregate functions are given below. We have also added a limited form of disjunction in our temporal relations.

5.3. Query rewriting by stream aggregation

For the evaluation of spatial-stream CQs, we have to extend OQA to handle spatial and streaming data, which is not considered in standard approaches such as [26]. In detail, we aim at answering pull-based queries at a single time point $T_{i}$ with stream atoms that define aggregate functions on windows of different sizes relative to $T_{i}$. For this, we consider a semantics based on epistemic aggregate queries (EAQs) [27] over ontologies, dropping the order of time points inside a window and treating the streamed data items as bags (multisets).

In the rest of this subsection, we give a detailed outline of evaluating EAQs following [36].

Epistemic aggregate queries (EAQs) Introduced by [27], EAQs are defined over bags of numeric and symbolic values, called groups, denoted as $\{\!| \cdot |\!\}$. Aggregates cannot be directly transferred to ${DL-Lite}_{A}$ due to the certain-answer semantics: different models yield different groups, resulting from the grouping of unknown individuals, which leads to (undesired) empty answers. In [27], the authors extended the semantics of DL-Lite with aggregates by an epistemic operator K and a two-layer evaluation that includes the completion of the query atoms w.r.t. $T$. The basic idea is to close the atoms so that only known individuals are grouped and aggregated. More formally, a simplified EAQ is defined as5

⁵
We simplified EAQs of [27] by omitting ψ and consider only aggregates with a single variable.

$$q_{a}(x, agr(y)) : \mathbf{K}\, x, y, z .\; ϕ, \tag{7}$$

where x are the grouping variables, $agr(y)$ is the aggregate function and variable, and ϕ is a CQ called the main conditions; z are the disjoint existential variables of ϕ. We call $w := x \cup y \cup z$ the K-variables of ϕ. Values for a tuple d are grouped in multisets, denoted as $H_{d}$, which are defined as:

$$H_{d} := \{\!| π(y) \mid π \in \mathit{KSat}_{I, K}(w; ϕ) \text{ such that } π(x) = d |\!\}, \tag{8}$$

where $\mathit{KSat}_{I, K}(w; ϕ)$ is the set of satisfying K-matches of ϕ for the model $I$ (of $K$).

A K-match for a query $q_{a}$ in an interpretation $I = ⟨ Δ^{I}, \cdot^{I} ⟩$ is defined by a function π that maps w to the domain $Δ^{I}$. $\mathit{KSat}$ is defined as follows:

$$\mathit{KSat}_{I, K}(w; ϕ) := \{π \in Eval(ϕ, I) \mid π(w) \in Cert(aux_{q_{a}}, K)\}, \tag{9}$$

where $aux_{q_{a}}(w) \leftarrow ϕ$ is an auxiliary atom used to map w only to known solutions, and $Cert(aux_{q_{a}}, K)$ are the certain answers of $aux_{q_{a}}$ over $K$.

Then, the set of epistemic K-answers for an EAQ over a model $I$ of $K$ can be defined as:

$$q_{a}^{I} := \{(d, agr(H_{d})) \mid d = π(x), \text{ for some } π \in \mathit{KSat}_{I, K}(w; ϕ)\}, \tag{10}$$

where $H_{d}$ and $\mathit{KSat}$ are defined as above. The epistemic certain answers $ECert(q_{a}, K)$ for a query $q_{a}$ over $K$ are the K-answers that are answers in every model $I$ of $K$. Calvanese et al. [27] showed that $ECert(q_{a}, K)$ can be computed by a “general algorithm” $GA$, which (1) computes the certain answers, (2) projects them on the K-variables, and (3) aggregates the resulting tuples. Importantly, evaluating EAQs reduces to standard CQ evaluation over ${DL-Lite}_{A}$ with $LOGSPACE$ data complexity.
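Steps (2) and (3) of $GA$ can be sketched in Python as below, assuming that the certain answers of the auxiliary query $aux_{q_{a}}$ over the K-variables have already been computed (step 1). The example data is hypothetical: x is a vehicle, y a speed, and z a time point. It illustrates why $H_{d}$ must be a bag: c1 has speed 30 at two time points, and both occurrences enter the aggregate.

```python
from collections import defaultdict

def general_algorithm(cert_answers, x_idx, y_idx, agg):
    """Group certain answers of aux_q(w) by x and aggregate the bag of y-values.

    cert_answers stands for Cert(aux_q, K): a set of tuples over the K-variables
    w = x ∪ y ∪ z (step 1 of GA, assumed precomputed). Each distinct w-tuple
    contributes one y-value, so H_d is a multiset (bag), not a set.
    """
    bags = defaultdict(list)                      # d -> H_d
    for w in cert_answers:                        # step 2: project on x and y
        d = tuple(w[i] for i in x_idx)
        bags[d].append(w[y_idx])
    return {d: agg(h) for d, h in bags.items()}   # step 3: aggregate each H_d

# Hypothetical certain answers of aux(x, y, z): vehicle x, speed y, time point z.
answers = {("c1", 30, 0), ("c1", 29, 1), ("c1", 30, 2), ("b1", 10, 0)}
print(general_algorithm(answers, x_idx=(0,), y_idx=1, agg=sum))
```

With `agg=sum`, c1 yields 89 and b1 yields 10; aggregating a set of values instead of a bag would incorrectly yield 59 for c1.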

Lifting EAQs to streams Our approach is to evaluate the EAQs over filtered and merged temporal ABoxes, which are created according to the window operator and $T_{i}$. We introduce a function that filters and merges (streamed) data items relative to the window size and $T_{i}$, and creates several windowed ABoxes $A_{⊞_{ϕ}}$, which are the unions of the static ABox $A$ and the filtered stream ABoxes from $F$. The EAQ aggregates are applied to each windowed ABox $A_{⊞_{ϕ}}$ by aggregating normal objects, concrete values, and spatial objects. More formally, a stream atom $ϕ ⊞_{T}^{L} agr$ is evaluated as the EAQ

$$q_{ϕ}(x, agr(y)) : \mathbf{K}\, x, y, z .\; ϕ ⊞_{T}^{L} \tag{11}$$

over ontologies, where x are the grouping variables, y is the aggregate variable, z are the disjoint existential variables, and ϕ is a query atom in the same scope as the window operator $⊞_{T}^{L}$ and the aggregate function $agr$.

In a stream setting, $K {Sat}_{I_{⊞}, K_{⊞}} (w; ϕ)$ is now the set of K-matches of ϕ for a model $I_{⊞}$ of $K_{⊞}$, where the windowed ABox $A_{⊞}$ is defined as $A_{⊞} = A \cup ⋃ \{A_{i} ∣ w_{b} ⩽ i ⩽ w_{e}\}$, and $w_{b}$ (resp. $w_{e}$) is the absolute time point of the beginning (resp. end) of a window. We distinguish four cases for the calculation of $w_{b}$ and $w_{e}$ based on the window size L and a pulse P, where P enlarges L according to its interval length:

a current window with $L = 0$ , i.e. $w_{b} = w_{e} = T_{i}$ ;

a past window with $L > 0$ leading to $w_{b} = (T_{i} - L)$ and $w_{e} = T_{i}$ ;

a future window with $L < 0$ that is $w_{b} = T_{i}$ and $w_{e} = (T_{i} + | L |)$ ; and

the entire history of a stream resulting in $w_{b} = 0$ and $w_{e} = T_{i}$ .
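The four window cases and the construction of $A_{⊞}$ can be sketched in Python as follows. The sketch assumes that assertions are plain tuples, that the stream ABoxes $A_{i}$ are given as a dictionary from time points to assertion sets, and that `L=None` encodes the entire-history case; the pulse $P$ is not modelled, and all names are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Assertion = Tuple[object, ...]  # hypothetical encoding, e.g. ("speed", "c1", 50)

def window_bounds(t_i: int, L: Optional[int]) -> Tuple[int, int]:
    """Return (w_b, w_e) for the four window cases; L=None means entire history."""
    if L is None:               # entire history of the stream
        return 0, t_i
    if L == 0:                  # current window
        return t_i, t_i
    if L > 0:                   # past window
        return t_i - L, t_i
    return t_i, t_i + abs(L)    # future window (L < 0)

def windowed_abox(static: Set[Assertion],
                  stream: Dict[int, Set[Assertion]],
                  t_i: int,
                  L: Optional[int]) -> Set[Assertion]:
    """A_win = A ∪ ⋃ {A_i | w_b <= i <= w_e}."""
    w_b, w_e = window_bounds(t_i, L)
    result = set(static)
    for i, abox in stream.items():
        if w_b <= i <= w_e:
            result |= abox
    return result
```

For example, with $T_{i} = 5$ and $L = 2$, only the stream ABoxes at time points 3 to 5 are merged with the static ABox.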

We obtain the KB $K_{⊞} = ⟨T, A_{⊞}⟩$ as above; the epistemic (certain) answers for $q_{ϕ}$ over $K_{⊞}$ are naturally defined as

$${ECert}_{⊞}(q_{ϕ}, K_{⊞}) = \bigcap_{I_{⊞} \models K_{⊞}} q_{ϕ}^{I_{⊞}}, \quad \text{where} \quad q_{ϕ}^{I_{⊞}} = \{(d, agr(H_{d})) \mid d = π(x) \text{ for some } π \in {KSat}_{I_{⊞}, K_{⊞}}(w; ϕ)\} \qquad (12)$$

are the K-matches that are answers in the model $I_{⊞}$ of $K_{⊞}$. In ${ECert}_{⊞}$, we do not yet address the validity of an assertion, say in $A_{⊞_{1}}$, until the next assertion in $A_{⊞_{3}}$. Two semantics are conceivable: an initial semantics ignores intermediate time points, and thus $A_{⊞_{2}}$ will be unknown; a second, inertia-based semantics, which fills the missing gaps with the previous assertions, is left for future work.

${ECert}_{⊞}$ gives the certain answers for a single EAQ including the ontology atoms in the same scope as the stream atoms. A full spatial-stream CQ q can be answered by answering each EAQ $q_{ϕ_{k}}$ separately and joining the answers.

Naive algorithm Finally, we introduce the algorithm $NSQ$ (see Algorithm 1), where $z^{ϕ}$ are the non-distinguished variables in ϕ and $PerfectRef$ (resp. $Answer$ ) is the “standard” query rewriting (resp. evaluation) as in [26]; $NSQ$ extends the $GA$ algorithm of [27] to compute the answers for stream CQs by:

calculating the epistemic answer for each stream atom over the different windowed ABoxes, storing the result in an auxiliary ABox using fresh atoms, and replacing each stream atom with a new auxiliary atom;

calculating the certain answers over $A$ and the auxiliary ABox using standard ${DL-Lite}_{A}$ query evaluation.

We omit the details of different aggregate functions used in ${ECert}_{⊞}$ and refer to [27] and [36] for an in-depth discussion.

Algorithm 1:

NSQ – Naive stream query answering

5.4. Query rewriting with temporal relations

At first sight, spatial and temporal relations could be treated similarly. As shown in [36], we are able to evaluate spatial relations regarding their Point-Set Topological Relations. This amounts to pure set-theoretic operations on point sets using the function $points(p)$, which defines the (infinite) set of points of a geometry $p$ that is a sequence $p = (p_{1}, \dots, p_{n})$ of (defined) points. For instance, the relation $inside(x, y)$ between geometries is defined as $\{(x, y) : points(y) \subseteq points(x)\}$. For temporal relations, however, we distinguish point-based relations, which can be encoded as simple arithmetic filters, from interval-based relations, where the 13 relations of Allen’s Time Interval Algebra (IA) [6] can hold between two intervals. The domain of IA relations is the set of intervals $I = \{[p_{1}], \dots, [p_{k}]\}$ over the linear order of $T$, defined as $[p_{i}] = [\underline{p_{i}}, \overline{p_{i}}]$ with $\underline{p_{i}} < \overline{p_{i}}$. The binary basic IA relations are defined according to their start/end points as follows [6]:

$$\begin{array}{ll} before(x, y) & = \{(x, y) : \underline{x} < \overline{x} < \underline{y} < \overline{y}\} \\ meets(x, y) & = \{(x, y) : \underline{x} < \overline{x} = \underline{y} < \overline{y}\} \\ overlaps(x, y) & = \{(x, y) : \underline{x} < \underline{y} < \overline{x} < \overline{y}\} \\ starts(x, y) & = \{(x, y) : \underline{x} = \underline{y} < \overline{x} < \overline{y}\} \\ finishes(x, y) & = \{(x, y) : \underline{y} < \underline{x} < \overline{x} = \overline{y}\} \\ during(x, y) & = \{(x, y) : \underline{y} < \underline{x} < \overline{x} < \overline{y}\} \\ equal(x, y) & = \{(x, y) : \underline{y} = \underline{x} < \overline{x} = \overline{y}\} \end{array} \qquad (13)$$

The remaining six relations of IA are the inverses of the first six.
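The seven basic relations of Eq. (13) translate directly into comparisons on start/end points. The following Python sketch assumes intervals are `(start, end)` pairs with `start < end`, which lets some of the chained inequalities collapse (e.g., $before$ only needs $\overline{x} < \underline{y}$); the dictionary name `IA_BASIC` and the helper `ia_filter` are hypothetical. `ia_filter` corresponds to the IA filtering interpretation, where each relation acts as a filter on all pairs in $I_{A} \times I_{B}$.

```python
from typing import Callable, Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end) with start < end

# The seven basic Allen relations of Eq. (13) as start/end-point filters.
IA_BASIC: Dict[str, Callable[[Interval, Interval], bool]] = {
    "before":   lambda x, y: x[1] < y[0],
    "meets":    lambda x, y: x[1] == y[0],
    "overlaps": lambda x, y: x[0] < y[0] < x[1] < y[1],
    "starts":   lambda x, y: x[0] == y[0] and x[1] < y[1],
    "finishes": lambda x, y: y[0] < x[0] and x[1] == y[1],
    "during":   lambda x, y: y[0] < x[0] and x[1] < y[1],
    "equal":    lambda x, y: x[0] == y[0] and x[1] == y[1],
}

def ia_filter(rel: str, xs: List[Interval], ys: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """IA filtering: keep every pair in xs × ys for which the relation holds."""
    check = IA_BASIC[rel]
    return [(x, y) for x in xs for y in ys if check(x, y)]
```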

Interpretation of IA relations IA relations can be interpreted over the sets of intervals $I_{A}$ and $I_{B}$ in two ways: (a) IA filtering, where each relation is treated as a single binary constraint; in this sense, the temporal relation acts as a filter on all intervals in $I_{A} \times I_{B}$ that match the relation regarding their start/end points; (b) IA reasoning, which requires computing the path consistency of all temporal relations over the intervals in $I_{A} \cup I_{B}$ using the predefined composition table of [6]. The composition table is defined as a set of transitive rules on basic relations, which are applied until no new general relations can be inferred. For instance, from the edges $during(I_{1}, I_{2})$ and $during(I_{2}, I_{3})$, we can infer the new relation $during(I_{1}, I_{3})$. Note that only approach (b) derives all possible (chained) relations between intervals. A well-known representation for IA relations are IA graphs (also called IA networks), which are directed graphs whose vertices are the intervals of $I_{A}$ and $I_{B}$ and whose edges represent the IA relations that hold between two intervals. Hence, an IA graph (closed under the transitive rules) materializes all relations that can hold between intervals, and a relation can be checked by testing whether the corresponding directed edge exists.
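The fixpoint idea behind closing an IA graph can be sketched in Python. This is deliberately not full path consistency: Allen's composition table has 13 × 13 entries, many of which yield disjunctions of relations, whereas the hypothetical `COMPOSE` excerpt below contains only a few entries with a single resulting basic relation (including the $during \circ during = during$ rule from the example above). Edges are `(interval, relation, interval)` triples.

```python
from itertools import product
from typing import Dict, Set, Tuple

Edge = Tuple[str, str, str]  # (interval_a, relation, interval_b)

# A small, single-result excerpt of Allen's composition table (r1 ∘ r2 -> r3).
COMPOSE: Dict[Tuple[str, str], str] = {
    ("before", "before"): "before",
    ("meets", "meets"): "before",
    ("meets", "before"): "before",
    ("before", "meets"): "before",
    ("during", "during"): "during",
    ("starts", "starts"): "starts",
    ("finishes", "finishes"): "finishes",
    ("starts", "during"): "during",
}

def close_ia_graph(edges: Set[Edge]) -> Set[Edge]:
    """Apply the composition rules until no new edge can be inferred."""
    closed = set(edges)
    changed = True
    while changed:
        changed = False
        # product() materializes its input, so adding to `closed` is safe here.
        for (a, r1, b), (c, r2, d) in product(list(closed), repeat=2):
            if b == c and (r1, r2) in COMPOSE:
                new = (a, COMPOSE[(r1, r2)], d)
                if new not in closed:
                    closed.add(new)
                    changed = True
    return closed
```

A chain of three $during$ edges closes into six edges, including $during(I_{1}, I_{4})$; with the full table and disjunctive labels, the same loop would have to intersect label sets, which is where the NP-hardness of IA reasoning enters.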

From timestamps to intervals Time intervals are not directly represented in our streamed data, but are an intermediate product of the EAQ evaluation and are used to annotate the resulting objects. After the evaluation of an EAQ, the results (also called answers) are aggregated data items, hence each single timestamp is not “meaningful” anymore; we need to assign time intervals to the answers, which are taken either from the window itself, or are extracted from the aggregated data items.

We have already introduced the function $v$ that assigns to each element of $T$ the data items of $⟨F, S_{F}⟩$. Now, we define the function $v^{*} : E_{ϕ} \to I$, where $E_{ϕ}$ is the set of epistemic (certain) answers that result from evaluating the EAQ $q_{ϕ}$ (as described before) and $I$ is the set of intervals $\{[p_{1}], \dots, [p_{k}]\}$ over the linear order of $T$; $q_{ϕ}$ represents a single query with the atom $ϕ[agr, b, e]$, where $agr$, $b$, and $e$ are defined as before. The function $v^{*}$ assigns to each answer of $q_{ϕ}$ in $E_{ϕ}$ one interval of $I$. For instance, we would assign the window $[2, 6]$ to an answer as follows: $v^{*}(speed(c_{1}, 50)) = [2, 6]$. We also use the shorter notation @, which in our example reads $speed(c_{1}, 50) @ [2, 6]$.

The function $v^{*}$ can be defined in different ways. In a first approach, we use $T_{i}$ and the window sizes $b$ and $e$ to generate the interval for the annotations. For instance, given $T_{5}$ and $speed[avg, 3, -1]$, we would annotate each grouped/aggregated match with the interval $[2, 6]$. In a second approach, we extract for each grouped/aggregated answer of an EAQ the lower and upper bounds of the timestamps of the data items in the group/aggregation.

More sophisticated approaches could include a segmentation of the data items, thus creating different fragmented sub-intervals.
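Both approaches to $v^{*}$ fit in a few lines of Python. The sketch assumes that $b \geq 0$ extends the window $b$ steps into the past and $e \leq 0$ extends it $|e|$ steps into the future, which reproduces the example $speed[avg, 3, -1]$ at $T_{5}$ yielding $[2, 6]$; the function names are hypothetical.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]

def annotate_by_window(t_i: int, b: int, e: int) -> Interval:
    """First approach: derive the interval from T_i and the window sizes.
    Assumes b >= 0 (past extent) and e <= 0 (future extent |e|)."""
    return t_i - b, t_i - e

def annotate_by_items(timestamps: Iterable[int]) -> Interval:
    """Second approach: lower/upper bound of the grouped items' timestamps."""
    ts = list(timestamps)
    return min(ts), max(ts)
```

The second approach yields tighter intervals whenever the grouped data items do not span the whole window, which matters for the interval relations that follow.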

5.5. Query evaluation by hypertree decomposition

The four types of query atoms need different evaluation techniques over separate DB entities. Ontology atoms are evaluated over the static ABox $A$ using a “standard” ${DL-Lite}_{A}$ query rewriting, i.e., PerfectRef [26]. For spatial atoms, we need to dereference the bindings to the spatial ABox $S_{A}$ and evaluate the spatial relations to filter spatial objects. Stream atoms are evaluated as EAQs that group and summarize the data items over the temporary windowed ABoxes, as described before.

For temporal atoms, we consider three techniques, where the first is only suitable for time points and the others are suitable for time intervals:

Adding the filter conditions to the rewritten query for delimiting possible time points;

Rewriting each IA relation of $Q_{T_{l}}$ into a filter that encodes the equation with the start/end points (as defined before);

Full IA-based reasoning.

In the third technique, we would have to provide full IA reasoning. This includes the closure of the IA graph by applying the (predefined) transitive rules on all intervals derived from an EAQ. Then, the intervals (with their annotated objects) for which the queried relations in $Q_{T_{l}}$ hold are extracted from the IA graph.

In [36], we introduced two spatial query evaluation strategies assuming two restrictions:

no bound variables occur in spatial atoms, and

the CQ is acyclic.

Queries with bound variables in spatial atoms are unlikely to occur, since we introduced a separation between spatial objects and their spatial extension, where the former are shown in the results and the latter are used for filtering. Acyclicity, roughly speaking, holds if a CQ has no proper cycle between join variables.

The main strategy, which is based on the decomposition of the query hypergraph and the derived join plan, is well suited for implementing spatial-stream CQs, as it gives us fine-grained caching, full control over the evaluation, and possibly the handling of different DB entities. Details are given in the standard DB literature such as [58].

The main steps of our query evaluation strategy are as follows. First, we construct the acyclic hypergraph $H_{q}$ from q and label each hyperedge in $H_{q}$ with

$l_{O}$ for an ontology edge,

$l_{S}$ for a spatial edge, and

$l_{F}$ for a stream edge with the window size added as well.

Then, we build the join tree $J_{q}$ of $H_{q}$ and extract the subtrees $J_{ϕ_{i}}$ of $H_{q}$ such that each node is covered by the same labels. Thus, we have sub-CQs that share the same aggregation/prediction functions and the same window size $l$. For each subtree $J_{ϕ_{i}}$, we perform the detemporalization of the stream CQ $q_{ϕ_{i}}$ by extracting and computing the results, which are stored in a virtual relation (a temporary table) $R_{ϕ_{i}}$. Finally, we traverse $J_{q}$ bottom up, left-to-right, to evaluate $q_{ϕ_{i}}$ for each subtree $J_{ϕ_{i}}$ (without stream atoms) and cache the results in memory for future queries.
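The bottom-up, left-to-right traversal with caching of stream-free subtrees can be sketched in Python. The node class, its labels, and the `evaluate` callback are hypothetical stand-ins for the evaluators described below; only label `"F"` is treated as a stream edge, and a subtree is cached only if it contains no stream node, so stream results are always recomputed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class JoinTreeNode:
    name: str    # hypothetical sub-CQ name, e.g. "q2_F1"
    label: str   # "O" ontology, "S" spatial, "F" stream (other labels allowed)
    children: List["JoinTreeNode"] = field(default_factory=list)

def is_static(node: JoinTreeNode) -> bool:
    """A subtree is cacheable only if it contains no stream (label "F") node."""
    return node.label != "F" and all(is_static(c) for c in node.children)

def evaluate_bottom_up(node: JoinTreeNode,
                       evaluate: Callable[[JoinTreeNode, List[Any]], Any],
                       cache: Dict[str, Any],
                       order: List[str]) -> Any:
    """Post-order (bottom up, left-to-right) evaluation of the join tree."""
    static = is_static(node)
    if static and node.name in cache:
        return cache[node.name]          # reuse a cached stream-free result
    child_results = [evaluate_bottom_up(c, evaluate, cache, order) for c in node.children]
    order.append(node.name)              # record the evaluation order
    result = evaluate(node, child_results)
    if static:
        cache[node.name] = result
    return result
```

On a tree loosely modelled on Example 5.3 (its exact shape in Fig. 3 is not reproduced here), a second evaluation skips the cached ontology sub-CQ and re-evaluates only the stream and temporal nodes.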

Example 5.3.
The following example is a simplified version of $q_{1}$, where the lines distinguish between ontology (first, second), stream/temporal (third, fourth), and spatial (fifth line) atoms:

$$\begin{array}{ll} q_{2}(x, y) : & LaneIn(x) \land isManagedBy(x, z) \\ & \land\ SignalGroup(z) \land Vehicle(y) \\ & \land\ pos(y, v)[line, 10 s] \land during(v, s) \\ & \land\ hasState(z, s)[last, 5 s, -5 s] \land (s = \text{'Stop'}) \\ & \land\ hasLoc(x, u) \land intersects(u, v) \end{array}$$

Based on the hypergraph and its decomposition shown in Fig. 3, we have the following evaluation order:

$q_{2_{F1}}(y, v @ i_{v}) : Vehicle(y) \land pos(y, v @ i_{v})[line, 10 s]$;

$q_{2_{N1}}(x, u) : LaneIn(x) \land hasLoc(x, u)$;

$q_{2_{F2}}(x, s @ i_{s}) : LaneIn(x) \land isManagedBy(x, z) \land SignalGroup(z) \land hasState(z, s @ i_{s})[last, 5 s, -5 s] \land (s = \text{'Stop'})$;

$q_{2_{T1}}(y, v) : q_{2_{F1}}(y, v @ i_{v}) \land during(v @ i_{v}, s @ i_{s}) \land q_{2_{F2}}(x, s @ i_{s})$;

$q_{2}(x, y) : q_{2_{T1}}(y, v) \land intersects(u, v) \land q_{2_{N1}}(x, u)$.

Fig. 3.
Hypergraph of $q_{2}$ (hyper-edge for $intersects$ is simplified).

5.6. Complexity of query answering

In this section, we address the computational cost of our spatial-stream QA approach, where we refer to the complexity classes $AC^{0} \subsetneq LOGSPACE \subseteq NLOGSPACE \subseteq P \subseteq NP$; their definitions can be found in any standard textbook (e.g., [65]). As already stated, we are mainly interested in the reasoning service of QA with databases and ${DL-Lite}_{A}$. For analyzing the computational complexity of QA, we distinguish between data, query, and combined complexity, which is the complexity with respect to the size of the database (i.e., the ABox), the size of the query, and the combined size of both, respectively [79]. For our work, the smallest class of relevance is $AC^{0}$, which is captured by families of polynomial-size, constant-depth Boolean circuits with unbounded fan-in. A central result of complexity theory is that the evaluation of FO-queries (of which CQs are a fragment) is complete for (logtime-uniform) $AC^{0}$ with respect to data complexity.

DL-Lite The DL-Lite family of languages, such as ${DL-Lite}_{R}$ and ${DL-Lite}_{A}$, extends QA with the idea of compiling the ontology $O$ into the CQ $q$, resulting in a union of CQs $q^{'}$ that is evaluated over a DB (called ABox) $A$. This is based on the idea that computing the answers to the rewritten query, $ans(q^{'}, A)$, is the same as computing the answers over the full KB, $ans(q, (O, A))$; this property is called FO-rewritability. Note that in the DL-Lite family, a QA technique needs to handle existentially quantified roles in the head of inclusion assertions, which can be done in different ways.

As shown in [26], QA for the languages above is in $NP$ with respect to combined complexity and in $LOGSPACE$ with respect to data complexity; hence, the evaluation of the rewritten query $q^{'}$ can be delegated to an RDBMS. The only drawback is that $q^{'}$ can be exponentially larger than $q$ (and thus it takes exponential time to construct $q^{'}$ in the worst case).
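Where the exponential size of $q^{'}$ comes from can be seen with a toy rewriting in Python. This is not PerfectRef: it covers only atomic concept inclusions (no roles, no existentials), replacing each concept atom by any of its subconcepts; the TBox, the concept names `Car`, `Bus`, and `ElectricCar`, and the function names are hypothetical. With $n$ atoms and $k$ alternatives each, the union contains $k^{n}$ CQs.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, str]  # (concept name, variable), e.g. ("Vehicle", "y")

def subconcepts(concept: str, tbox: Dict[str, Set[str]]) -> Set[str]:
    """All concepts B with B ⊑* concept, given inclusions sub -> {supers}."""
    result, changed = {concept}, True
    while changed:
        changed = False
        for sub, supers in tbox.items():
            if sub not in result and supers & result:
                result.add(sub)
                changed = True
    return result

def rewrite(query: List[Atom], tbox: Dict[str, Set[str]]) -> List[List[Atom]]:
    """Union of CQs: each atom is replaced by any of its subconcepts."""
    options = [[(c, var) for c in sorted(subconcepts(concept, tbox))]
               for concept, var in query]
    return [list(cq) for cq in product(*options)]

tbox = {"Car": {"Vehicle"}, "Bus": {"Vehicle"}, "ElectricCar": {"Car"}}
```

A single $Vehicle(y)$ atom yields 4 CQs, whereas two such atoms already yield 16.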

The motivation for our work was to extend ${DL-Lite}_{A}$ with spatial and stream capabilities while keeping FO-rewritability and $LOGSPACE$ data complexity. This allows us to rewrite spatial-stream CQs and evaluate them over a Data Stream Management System. In the following, we introduce the extensions covered so far and highlight the restrictions we imposed on the extended query language so that we stay within $LOGSPACE$ data complexity.

Spatial extensions Dealing with spatial queries, three aspects have to be considered regarding data complexity. First, we restrict the input to a finite (active) domain. This restriction is due to the structure of Boolean fan-in circuits, where each relation is encoded by input nodes concerning all domain elements. Finiteness is guaranteed if we encode geometries as sets of points, which we call admissible geometries. The binary (spatial) relation between two admissible geometries can be defined by pure set operations on these points, e.g., the $contains$ relation is defined by the subset operation. Second, spatial atoms of the form $Q_{S}(z, z^{'})$ need to be rewritten into unions of CQs of the form $\bigvee_{s, s^{'} \in Γ_{S}} (C_{s}(z) \land C_{s^{'}}(z^{'}) \land S(s, s^{'}))$, where $C_{s}$ and $C_{s^{'}}$ are fresh spatial concepts for the geometries $s$ and $s^{'}$, and the spatial relation $S(s, s^{'})$ is calculated by the set operations mentioned above. Rewriting the spatial relations into the query generates large unions of CQs, but FO-rewritability is retained. However, a query blowup may be avoided by rewriting the query to a Datalog program (see query rewriting techniques in Section 5.8). Third, there is no interaction with the existentially quantified parts of the ontology, since we introduce fresh concepts that are not connected to the rest of the ontology, where existentially quantified roles might occur (more details are given in [35]).
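The rewriting of a spatial atom into a union over $Γ_{S}$ can be sketched in Python, assuming admissible geometries are finite sets of grid points and $S$ is $contains$ (subset test). Since $S(s, s^{'})$ only depends on the geometries, it is evaluated once at rewriting time and only the disjuncts where it holds are kept; the fresh concept names `C_<geometry>`, the geometry identifiers, and the ABox extensions are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set, Tuple

Point = Tuple[int, int]
Geometry = FrozenSet[Point]  # admissible geometry: a finite set of points

def contains(s: Geometry, s2: Geometry) -> bool:
    """contains(s, s2) as the subset test points(s2) ⊆ points(s)."""
    return s2 <= s

def rewrite_spatial_atom(gamma: Dict[str, Geometry]) -> List[Tuple[str, str]]:
    """Disjuncts C_s(z) ∧ C_s'(z') of the union for Q_S(z, z');
    pairs for which S(s, s') does not hold are dropped."""
    return [(f"C_{s}", f"C_{t}")
            for s, t in product(sorted(gamma), repeat=2)
            if contains(gamma[s], gamma[t])]

def answer(disjuncts: List[Tuple[str, str]],
           concept_ext: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Evaluate the union of CQs over the extensions of the fresh concepts."""
    out = set()
    for cs, ct in disjuncts:
        for z in concept_ext.get(cs, set()):
            for z2 in concept_ext.get(ct, set()):
                out.add((z, z2))
    return out

gamma = {"g1": frozenset({(0, 0), (0, 1), (1, 0), (1, 1)}),
         "g2": frozenset({(0, 0)}),
         "g3": frozenset({(5, 5)})}
ext = {"C_g1": {"lane1"}, "C_g2": {"car1"}, "C_g3": {"car2"}}
```

The number of disjuncts grows quadratically in $|Γ_{S}|$, which is the blowup that a Datalog rewriting could avoid.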

Stream relations As with spatial data, the first concern regards the finiteness of the domain, since our timeline $T$ might be infinite. However, by applying the “timestamp” function $v^{*}$ (as shown in the previous section), we limit the possible time points by the upper/lower bounds of the window sizes. As discussed in Section 5.3, aggregates cannot directly be lifted to ${DL-Lite}_{A}$, since groups, namely bags of values, lead to empty answers under the certain answer semantics of ${DL-Lite}_{A}$. A suitable technique was developed by Kostylev and Reutter [53], who defined a range semantics for aggregate queries that is based on intervals of aggregated values for each group over all models. Their approach has the drawbacks that only $COUNT$ queries are feasible and that the data complexity increases to $coNP$-completeness for Boolean queries, i.e., for checking whether a given tuple together with a (counting) number is in the certain answers. An alternative semantics based on epistemic aggregate queries is favourable in our case, as the data complexity is lower and, besides $COUNT$, other aggregate functions such as $SUM$ are usable. However, this approach has the drawback that all known values have to be collected before the query is answered (as performed in line 10 of Algorithm 1); this is a costly operation, which can be partially mitigated by caching the static values and tracking the added/deleted values in a particular window. An extensive evaluation of such mitigation is beyond the scope of this article but is considered for future work.
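Tracking added/deleted values in a window can be illustrated with an incrementally maintained sliding-window aggregate. The following Python sketch is one possible mitigation under stated assumptions, not the prototype's implementation: it assumes timestamps arrive in non-decreasing order, uses the past-window case $[T_{i} - L, T_{i}]$, and supports only $COUNT$, $SUM$, and $AVG$; the class name is hypothetical.

```python
from collections import deque
from typing import Deque, Tuple

class SlidingWindowAggregate:
    """COUNT/SUM/AVG over a past window of length L, maintained incrementally:
    values are added on arrival and subtracted when they leave the window."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.items: Deque[Tuple[int, float]] = deque()  # (timestamp, value)
        self._total = 0.0

    def add(self, t: int, value: float) -> None:
        # Assumes timestamps arrive in non-decreasing order.
        self.items.append((t, value))
        self._total += value
        self._evict(t)

    def _evict(self, t_i: int) -> None:
        # Drop items older than the window start w_b = t_i - L.
        while self.items and self.items[0][0] < t_i - self.length:
            _, old = self.items.popleft()
            self._total -= old

    def count(self) -> int:
        return len(self.items)

    def total(self) -> float:
        return self._total

    def avg(self) -> float:
        return self._total / len(self.items) if self.items else float("nan")
```

Aggregates such as $min$, $max$, or $median$ are not invertible in this simple way and would need additional data structures.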

Temporal relations Earlier in this section, we introduced two possible strategies for rewriting temporal IA relations of the form $Q_{T}(z, z^{'})$: (a) IA filtering, where each basic relation is treated as a single binary constraint, and (b) IA reasoning, which includes the inference of possibly $2^{13}$ general relations from the 13 basic relations. The IA reasoning-based strategy is not feasible, as we aim to retain $LOGSPACE$ data complexity and already checking the satisfiability of a given IA network is $NP$-complete [7]. The IA filtering-based strategy does not affect FO-rewritability, since every temporal relation acts as a single binary constraint that fulfils the filter criteria as shown in Eq. (13). Furthermore, the filtering is applied only to named individuals, due to the nature of EAQs.

Query structure and decompositions A large body of work has been dedicated to connecting hypergraphs, (acyclic) DB schemes, and join trees; an overview is given in [30,58]. For decomposing a query $q$, the query hypergraph $H(q) = (V, E)$ represents the variables in $q$ by vertices $V$ and the atoms in $q$ with shared variables by hyperedges in $E$. In the case of an acyclic conjunctive query (ACQ), which is defined in terms of the acyclicity of $H(q)$, a join tree can be generated from $H(q)$ that yields a plan for computing the query $q$. Gottlob et al. [41,42] studied the relation between ACQs, hypergraphs, and join trees, strengthening previous results by showing that answering Boolean ACQs is $LOGCFL$-complete with respect to combined complexity. We point out that the construction of the join tree and the derived evaluation order is a preprocessing step; therefore, its computational complexity is of lower importance.

Note that there are different notions of acyclicity, i.e., α-, β-, and γ-acyclicity; in this work, we focus on α-acyclicity, which can be efficiently tested by the GYO reduction (cf. [30,58]). We say that a CQ is acyclic if its hypergraph is α-acyclic. A specific join tree $J_{H}$ can be found as a maximum-weight spanning tree $T_{S}$ of the intersection graph $I_{H}$ of $H$, where the weight of an edge in $I_{H}$ is the number of vertices of $V$ shared by the two hyperedges it connects (cf. [58]).
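Both steps, the GYO acyclicity test and the join tree via a maximum-weight spanning tree of the intersection graph, can be sketched in Python. Hyperedges are frozensets of variable names (hypothetical examples); the spanning tree is built with Kruskal's algorithm over edge weights $|e_{i} \cap e_{j}|$, which for α-acyclic hypergraphs yields a join tree as stated in the text.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

def gyo_acyclic(hyperedges: List[FrozenSet[str]]) -> bool:
    """GYO reduction: alpha-acyclic iff repeatedly applying (1) and (2)
    leaves no non-empty hyperedge."""
    edges: List[Set[str]] = [set(e) for e in hyperedges]
    changed = True
    while changed:
        changed = False
        # (1) Remove vertices that occur in exactly one hyperedge.
        for e in edges:
            for v in list(e):
                if sum(v in f for f in edges) == 1:
                    e.discard(v)
                    changed = True
        # (2) Remove one hyperedge contained in another hyperedge (if any).
        for i, e in enumerate(edges):
            if any(j != i and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return all(not e for e in edges)

def join_tree(hyperedges: List[FrozenSet[str]]) -> List[Tuple[int, int]]:
    """Maximum-weight spanning tree (Kruskal) of the intersection graph,
    with weight(i, j) = |e_i ∩ e_j|; returns tree edges as index pairs."""
    n = len(hyperedges)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    candidates = sorted(((len(hyperedges[i] & hyperedges[j]), i, j)
                         for i, j in combinations(range(n), 2)), reverse=True)
    tree = []
    for _, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

path = [frozenset({"x", "z"}), frozenset({"z", "s"}), frozenset({"s", "y"})]
triangle = [frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"c", "a"})]
```

The chain-shaped `path` reduces to an empty hypergraph and its join tree connects neighbouring atoms, while `triangle` is the smallest cyclic case and is left unchanged by the reduction.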

5.7. Aggregations and predictions

In this section, we give more details on the aggregation/prediction functionalities, where normal objects, spatial objects, and constant values need different types of aggregate functions.

For normal objects and constant values, we allow the aggregate functions $count$ , $first$ , and $last$ on the streamed data items. For $last$ and $first$ , we need to search the bag of data items, as the sequence of time is lost in our representation. This is achieved by iteratively checking if we have a match at one of the points in time. In the implementation, the first and last match can be simply cached while processing the stream. For individuals and constant values that are numerical, we allow a wider range of aggregate and prediction functions on the streamed data items:

Order of items: $first$ , $last$ , where they give the first or last element in the stream, respectively;

Simple aggregations: $count$, $min$, $max$, $sum$, and $avg$;

Descriptive statistics (DS): $mean$ , $sd$ , $var$ , and $median$ are standard statistical functions that calculate the mean, standard deviation, variance, and median as expected;

Predictions: we apply predefined regression methods to predict values from existing (time-series) data items inside a window. Model building (i.e., the training) and prediction should be fast, hence we support the following lightweight methods:

(a) $lin\_reg$ calculates the log-linear regression model;

(b) $mov\_avg$ calculates the moving average of the past values;

(c) $\exp\_smooth$ applies exponential smoothing to the past values;

(d) $grad\_boost$ uses gradient boosting with regression trees.

Since the order of items is lost due to the bag semantics, the temporal annotations (e.g., $speed(c_{1}, 50) @ 10$) are needed in the prediction functions as the second dimension. We allow different regression methods with models of increasing complexity. On small windows with a required fast response time, $mov\_avg$ and $\exp\_smooth$ are preferable, while on larger windows, e.g., for traffic predictions, $grad\_boost$ could be applied.
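How the @-annotations restore the order lost by the bag semantics, and how the two lightweight predictors operate on the restored sequence, is sketched below in Python. The function names mirror $mov\_avg$ and $\exp\_smooth$, but the parameters `k` and `alpha` and the exact formulas (mean of the last `k` values; simple exponential smoothing whose final level is the one-step-ahead prediction) are assumptions, as the prototype's implementations are not shown.

```python
from typing import List, Tuple

Annotated = Tuple[int, float]  # (timestamp, value): speed(c1, 50) @ 10 -> (10, 50.0)

def restore_order(bag: List[Annotated]) -> List[float]:
    """Sort the bag by its temporal annotations and drop the timestamps."""
    return [v for _, v in sorted(bag)]

def mov_avg(bag: List[Annotated], k: int = 3) -> float:
    """Predict the next value as the mean of the k most recent values."""
    values = restore_order(bag)[-k:]
    return sum(values) / len(values)

def exp_smooth(bag: List[Annotated], alpha: float = 0.5) -> float:
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1};
    the final level serves as the one-step-ahead prediction."""
    values = restore_order(bag)
    level = values[0]
    for x in values[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

bag = [(12, 60.0), (10, 50.0), (11, 40.0), (13, 70.0)]
```

Both functions run in linear time over the window, which matches the text's preference for them on small windows with tight response times.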

For spatial objects, geometric aggregate functions are applied to the bag of data items $p_{b}$ that represent geometries. As with $first$ or $last$ , we must rearrange them to create a valid geometry, i.e., a sequence $p_{o} = (p_{1}, \dots, p_{n})$ of points. We allow these functions to derive new geometries (among others):

$point$ : we evaluate the function $last$ to get the last data item $p_{n}$ of the sequence $p_{o}$ ;

$line$: we create a sequence of points $p_{o}$ representing a path by calculating a total order on the bag of points $p_{b}$: we start from the point given by $last$ and iterate backwards, finding the next point by Euclidean distance;

$line\_angle$: the angle (in degrees) of $line$ regarding a reference system is calculated by

applying the function $line$ ,

obtaining a simplified geometry using smoothing, and

calculating the angles between the lines of the simplified geometry;

$polygon$ : similar to $line$ , but we create a polygon $(p_{1}, \dots, p_{n})$ , where the start- and endpoints are the same, i.e., $p_{1} = p_{n}$ , by (1) determining the convex hull of the bag of points, and (2) extracting all pairs of points representing the convex hull;

$traject\_line$ and $traject\_heading$ are simple techniques to project possible trajectories from past points. The former linearly projects the trajectory based on the previous points and the current speed; the latter calculates the trajectory based on the last point and the last heading of the vehicle.
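The geometric aggregates $line$, $polygon$, and a heading-based projection can be sketched in Python. The nearest-neighbour ordering follows the description of $line$ (start from $last$, iterate backwards by Euclidean distance); the convex hull uses Andrew's monotone chain algorithm, which is our choice for illustration (the text does not name one); the heading convention (degrees counterclockwise from the x-axis), the constant speed, and the time step `dt` in `traject_heading` are assumptions. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def line(bag: List[Tuple[int, Point]]) -> List[Point]:
    """Start at the last point (max timestamp), repeatedly step to the nearest
    unvisited point, then reverse so the path ends at the last point."""
    remaining = [p for _, p in sorted(bag)]
    current = remaining.pop()                 # 'last' data item p_n
    path = [current]
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(p, current))
        remaining.remove(nxt)
        path.append(nxt)
        current = nxt
    return path[::-1]

def polygon(points: List[Point]) -> List[Point]:
    """Closed convex-hull polygon (p_1, ..., p_n) with p_1 = p_n."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts + pts[:1]

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    upper: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    return hull + hull[:1]

def traject_heading(last: Point, heading_deg: float, speed: float,
                    dt: float, steps: int) -> List[Point]:
    """Straight-line projection from the last point along the last heading."""
    rad = math.radians(heading_deg)
    return [(last[0] + speed * dt * k * math.cos(rad),
             last[1] + speed * dt * k * math.sin(rad)) for k in range(1, steps + 1)]
```

Interior points such as $(1, 1)$ inside a square are discarded by the hull, and the returned polygon repeats its first point at the end, matching $p_{1} = p_{n}$.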

For the trajectory computation, besides a simple linear model, a curvature-based model could also be applied. To improve the accuracy of the model, we could use the speed of the last data points, so that a speed-up or slow-down would be taken into account. Since this extension needs further investigation, it is left for future work.

5.8. Optimization techniques

We outline three suitable optimization techniques for query processing, namely (a) optimizing the query rewriting itself with the aim of generating small rewritings and keeping the number of generated CQs low; (b) caching static sub-queries for faster re-evaluation of the full query on recurring evaluations; and (c) parallelizing the stream sub-queries to improve the performance for streamed, non-cacheable data. A fourth optimization technique, which focuses on improving the query decomposition with the aim of finding larger sub-queries to reduce in-memory joins, is out of the scope of this work and needs further investigation.

Query rewriting Optimizations for DL-Lite query rewriting are a well-studied topic. As recognized in [66], the original PerfectRef algorithm suffers from producing large rewritings, since an exhaustive factorization to guarantee completeness leads to unnecessarily large unions of CQs. Significant improvements over PerfectRef were presented with REQUIEM [66], where the factorization was replaced with functional terms, and with Presto [75], where unnecessary existential joins are eliminated by computing the most-general subsumees and producing a non-recursive Datalog program instead of a union of CQs as the rewriting. Finally, the most recent optimization technique is tree-witness-based rewriting [49], which boils down to introducing fresh constant symbols (nulls) for non-ABox elements of the canonical model to allow for matches of the query in the extended (canonical) model. The tree-witness-based rewriting was implemented in [74], where additional optimization techniques such as pushing joins into unions, eliminating sub-queries, and self-join elimination were also added. Our approach is based on the PerfectRef algorithm, which is implemented with minor optimizations in the query rewriter of Owlgres 0.1 [77]. A more efficient implementation based on tree-witness-based rewriting is planned for future work. For possible optimizations in the query rewriting, we define the following three parameters:

structure of the ontology;

structure of the query; and

number of existentially quantified query atoms.

Caching For RDBMSs, several caching approaches have been developed over the years. Traditionally, caching between client and server systems (with a DB present) was page-based (i.e., groups of related tuples), where full pages are cached on the client and requested from the server if missing. Tuple-based caching introduces a finer data granularity, where individual tuples are cached on the client. Both caching approaches can implement different cache replacement policies such as First In First Out (FIFO), Least Recently Used (LRU), or Most Recently Used (MRU) [58]. More sophisticated approaches, called “semantic caching”, introduce “semantic spaces” used for caching, which allow a segmentation of the domain space of the attributes of a (DB) relation [71].

Fig. 4.

System architecture.

Our approach for caching sub-queries relates to “semantic caching”. However, the segmentation of the attributes/tuples is given (a) by the separation between static and stream query atoms, and (b) by the structure of the TBox, i.e., the concept/role hierarchies as well as the domain and range of the roles of a query atom. Note that our segmentation does not consider spatial data, which could also be used for caching, with the Euclidean distance as the basis for segmentation. For a cache-based optimization, we define the following parameters:

caching strategies as mentioned above;

cache size and location, i.e., in-memory caching or integrated caching as used by PostgreSQL;

cache invalidation based on the stream update frequency;

number and size of static/spatial sub-queries; and

ratio between static/spatial and streamed sub-queries.

Query parallelization Similar to caching techniques, parallel query execution is a standard RDBMS query evaluation technique. In our work, we focus on intra-query parallelism and do not cover inter-query parallelism, since we restrict ourselves to executing a single query at a time for evaluating our approach. Intra-query parallelism can be broken down into inter-operator and intra-operator parallelism, where we omit intra-operator execution, which would allow for the execution of multiple processes of a single operator such as merge-join [43]. Inter-operator execution of different operators is, besides caching, the central optimization method for our evaluation of queries. Inter-operator execution can be implemented vertically, i.e., by pipeline-based execution of the operators, or horizontally (also called bushy execution). Horizontal inter-operator execution fits our hypertree-based approach well, since it can be used for a horizontal separation into sub-queries if there are no dependencies, i.e., hyperedges, between them. Note that a sub-query in our case is always a set of operators; hence, we separate sets and not single operators. Due to the labelling of hyperedges by type (i.e., streamed, static, or spatial), we can identify the sub-queries that contain stream atoms and execute them in parallel. Upon completion, the results can be propagated along the join tree, as long as the other branches of the join tree have already been executed. For an optimization by parallelization, we define the following parameters:

number and size of stream sub-queries;

ratio between static/spatial and streamed sub-queries; and

shape of join tree, i.e., the depth or the average or maximum branching factor of a join tree.

6. Implementation

We have implemented a prototype of our spatial-stream OQA approach in Java 1.8 using the stream RDBMS PipelineDB 9.8.1.⁶ The system architecture is shown in Fig. 4. We chose PipelineDB, as it is built on top of PostgreSQL⁷ and PostGIS⁸ and thus supports stream and spatial data. It distinguishes between streams and continuous views, where streams are write-only, so the query evaluator has to access the read-only continuous views. We created a 1-to-1 mapping from streams to continuous views, and further to the TBox concepts and roles. For instance, vehicle positions are fed into the stream stream_pos(id, pos, tp), where id is the vehicle id, pos its position, and tp the time point of adding; stream_pos is accessed via the continuous view view_pos, which is mapped to the property pos. We also provide an integration framework that constantly receives V2X messages and adds the raw message data either to normal tables of the static DB, spatial tables of the GIS DB, or the streams of the stream DB.

⁶ https://www.pipelinedb.com/

⁷ https://www.postgresql.org/

⁸ http://postgis.net/

6.1. Implementation details

The query parser and decomposer component parses the input spatial-stream CQ and then decomposes the query hypertree using Gottlob et al.’s [32] implementation.⁹ Depending on the size of the CQ, the decomposition can be expensive; hence, it is performed as a preprocessing step, and the decompositions introduced in Section 5.5 are cached in memory. The decomposer gives us the join tree $J_{q}$ and the sub-CQs assigned to each tree node. For each node, we also keep the label that includes the sub-query type, window size, and aggregation/prediction function. The query evaluator traverses $J_{q}$ bottom up, left-to-right, and (1) checks if the results of a sub-CQ are already cached; (2) if not, it instantiates the evaluator corresponding to the sub-CQ type.

⁹ https://www.dbai.tuwien.ac.at/proj/hypertree/

The standard query evaluator performs the ${DL-Lite}_{A}$ query rewriting using Owlgres 0.1 [77], which is a prototypical implementation of the PerfectRef algorithm. More efficient implementations, such as the one in Ontop [74], are planned to replace the Owlgres-based rewriter.

For each stream sub-CQ $q_{i}$, the stream aggregator detemporalizes the streams by grouping/aggregating the data items in the following steps:

extracting the data items according to the defined window size;

evaluating $q_{i}$ without rewriting and storing the “known solutions” in memory as $R_{i, 1}$ ;

evaluating $q_{i}^{'}$ with rewriting over $R_{i, 1}$ and storing them in memory as $R_{i, 2}$ ;

applying the prediction function on $R_{i, 2}$ and adding the predicted data items;

applying the grouping/aggregation function on $R_{i, 2}$ and producing the outcome $R_{i, 3}$ .

The stream predictor is integrated into the stream aggregator, where predictions are applied to the collected data items after they are grouped. We provide standard implementations for the functions $mov\_avg$ and $\exp\_smooth$. For $grad\_boost$, we use the state-of-the-art library XGBoost.¹⁰

¹⁰ https://xgboost.readthedocs.io/en/latest/

The spatial evaluator evaluates the different spatial relations based on the grouped/aggregated data items. For performance reasons, we do not compile the spatial relations to SQL, but evaluate them in memory using the functions of the JTS Topology Suite.¹¹ This is feasible because the join tree $J_{q}$ clearly separates the sub-queries by type.

¹¹ https://github.com/locationtech/jts
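JTS is a Java library; the following pure-Python sketch only illustrates what an in-memory $intersects$ check between two polylines (e.g., a vehicle's position line and a lane geometry) involves, using the standard orientation test for segment intersection. It does not reproduce the JTS API.

```python
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> int:
    """Sign of the cross product (b - a) x (c - a): 1 left, -1 right, 0 collinear."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (cross > 0) - (cross < 0)


def _within_box(a: Point, b: Point, p: Point) -> bool:
    """For a collinear point p: does p lie within the bounding box of segment a-b?"""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:          # proper crossing or touching
        return True
    # Collinear special cases: an endpoint lies on the other segment.
    return ((o1 == 0 and _within_box(p1, p2, q1)) or
            (o2 == 0 and _within_box(p1, p2, q2)) or
            (o3 == 0 and _within_box(q1, q2, p1)) or
            (o4 == 0 and _within_box(q1, q2, p2)))


def lines_intersect(a: List[Point], b: List[Point]) -> bool:
    """intersects(a, b) for two polylines given as lists of points."""
    segs_a = list(zip(a, a[1:]))
    segs_b = list(zip(b, b[1:]))
    return any(segments_intersect(p1, p2, q1, q2)
               for (p1, p2), (q1, q2) in product(segs_a, segs_b))


trajectory = [(0.0, 0.0), (10.0, 10.0)]
lane = [(0.0, 10.0), (10.0, 0.0)]
print(lines_intersect(trajectory, lane))  # True
```

The pairwise segment test is quadratic in the number of segments, which is acceptable for the short position lines produced by small windows.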

The temporal evaluator supports the IA filtering technique mentioned above, since temporal relations can be rewritten directly into SQL by encoding them as joins, where each relation becomes a filter on the start/end points of the aggregated data items. The second technique, IA reasoning, is planned for future work.
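A sketch of IA filtering along these lines: each Allen relation becomes a predicate on the start/end points of two intervals, usable in memory or emitted as a SQL join condition. The column names `start_ts`/`end_ts` and the view names are hypothetical, and only a subset of the thirteen Allen relations is shown.

```python
from typing import Callable, Dict, Tuple

Interval = Tuple[float, float]  # (start, end) with start <= end

# Subset of Allen's interval relations as filters on start/end points.
ALLEN: Dict[str, Callable[[Interval, Interval], bool]] = {
    "before":   lambda a, b: a[1] < b[0],
    "meets":    lambda a, b: a[1] == b[0],
    "overlaps": lambda a, b: a[0] < b[0] < a[1] < b[1],
    "starts":   lambda a, b: a[0] == b[0] and a[1] < b[1],
    "during":   lambda a, b: b[0] < a[0] and a[1] < b[1],
    "finishes": lambda a, b: a[1] == b[1] and b[0] < a[0],
    "equals":   lambda a, b: a[0] == b[0] and a[1] == b[1],
}

# The same filters as SQL join conditions (hypothetical column names).
SQL_FILTER: Dict[str, str] = {
    "before":   "{a}.end_ts < {b}.start_ts",
    "meets":    "{a}.end_ts = {b}.start_ts",
    "overlaps": ("{a}.start_ts < {b}.start_ts AND {b}.start_ts < {a}.end_ts"
                 " AND {a}.end_ts < {b}.end_ts"),
    "starts":   "{a}.start_ts = {b}.start_ts AND {a}.end_ts < {b}.end_ts",
    "during":   "{b}.start_ts < {a}.start_ts AND {a}.end_ts < {b}.end_ts",
    "finishes": "{a}.end_ts = {b}.end_ts AND {b}.start_ts < {a}.start_ts",
    "equals":   "{a}.start_ts = {b}.start_ts AND {a}.end_ts = {b}.end_ts",
}


def temporal_join_sql(relation: str, left_view: str, right_view: str) -> str:
    """Encode relation(a, b) as a join between two (hypothetical) result views."""
    condition = SQL_FILTER[relation].format(a="a", b="b")
    return f"SELECT a.*, b.* FROM {left_view} a JOIN {right_view} b ON {condition}"


print(ALLEN["before"]((0.0, 5.0), (6.0, 9.0)))   # True
print(temporal_join_sql("before", "cv_q13a", "cv_q13b"))
# SELECT a.*, b.* FROM cv_q13a a JOIN cv_q13b b ON a.end_ts < b.start_ts
```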

Note that we designed the prediction function as an integrated part of the stream evaluator. However, predictions could also be treated as external data streams that generate data items continuously. This would be an appealing extension, but it would change the query evaluation process and needs further investigation. The same considerations apply to trajectory predictions. The aggregate function $traject$ is designed with the same intention: we take the existing points (as coordinates) and project a single path into the future. Currently, we apply a simple straight-line projection to create new points; taking the curvature into account is a desired extension.
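A constant-velocity reading of the straight-line projection could look as follows. The velocity estimate from the first and last observed samples, the `horizon_s`/`step_s` parameters, and the function name are assumptions for illustration, not the prototype's $traject$ implementation.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (t in s, x, y); hypothetical layout


def project_straight_line(samples: List[Sample], horizon_s: float,
                          step_s: float) -> List[Tuple[float, float]]:
    """Extrapolate future points on a straight line (constant velocity)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, x0, y0), (t1, x1, y1) = samples[0], samples[-1]
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("samples must be ordered by increasing time")
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt       # average velocity over the window
    steps = int(round(horizon_s / step_s))
    return [(x1 + vx * i * step_s, y1 + vy * i * step_s) for i in range(1, steps + 1)]


observed = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (2.0, 4.0, 0.0)]
print(project_straight_line(observed, horizon_s=2.0, step_s=1.0))  # [(6.0, 0.0), (8.0, 0.0)]
```

Taking curvature into account would replace the constant velocity vector with, e.g., a turn-rate estimate; the sketch deliberately keeps the straight-line behaviour described above.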

Optimizations. In Section 5.8, we outlined suitable optimization techniques; those regarding query rewriting come at no cost, since they are part of the rewriter component. For instance, the query rewriter of Owlgres 0.1 only rewrites the concepts/roles of the ontology $O$ that have assertions in the ABox. Using components of Ontop would lead to smaller unions of CQs and faster evaluation times, since tree-witness-based rewriting is implemented there [74].

We have implemented an initial caching technique based on the separation of static/spatial and streamed sub-queries. Our implementation uses in-memory storage and combines an LRU policy for the static sub-queries with a FIFO policy for the streamed sub-queries, since data streams naturally follow a FIFO order.
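The mixed policy can be sketched with two `OrderedDict`s: the static store moves an entry to the end on every hit (LRU), whereas the stream store never reorders and therefore evicts the oldest insertion (FIFO). The class name and the capacities are hypothetical.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class SubQueryCache:
    """In-memory cache: LRU for static/spatial sub-queries, FIFO for streamed ones."""

    def __init__(self, static_capacity: int, stream_capacity: int) -> None:
        self._static = OrderedDict()
        self._stream = OrderedDict()
        self._caps = {False: static_capacity, True: stream_capacity}

    def _store(self, streamed: bool) -> OrderedDict:
        return self._stream if streamed else self._static

    def get(self, key: Hashable, streamed: bool) -> Optional[object]:
        store = self._store(streamed)
        if key not in store:
            return None
        if not streamed:
            store.move_to_end(key)      # LRU: mark as most recently used
        return store[key]

    def put(self, key: Hashable, value: object, streamed: bool) -> None:
        store = self._store(streamed)
        if key in store and not streamed:
            store.move_to_end(key)      # LRU: an update also counts as a use
        store[key] = value              # FIFO: an update keeps the original position
        if len(store) > self._caps[streamed]:
            store.popitem(last=False)   # evict the front: LRU victim or oldest entry


cache = SubQueryCache(static_capacity=2, stream_capacity=2)
cache.put("lanes_i100", ["l1", "l2"], streamed=False)
cache.put("signal_groups", ["sg1"], streamed=False)
cache.get("lanes_i100", streamed=False)            # refreshes this entry
cache.put("parklanes", ["p1"], streamed=False)     # evicts "signal_groups"
print(cache.get("signal_groups", streamed=False))  # None
```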

As mentioned before, our approach favours vertical and inter-operator parallelism, since our hypertree-based query decomposition results in a join tree from which we can extract stream sub-queries that are independent of each other. After the query is parsed, the query evaluator collects all independent stream sub-queries, executes them in parallel on PipelineDB, and gathers their results. After the last stream sub-query has finished, the query evaluator continues with the remaining operators by traversing the join tree up to the root.
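The parallel step can be sketched with Python's `concurrent.futures`; the sub-query callables are hypothetical placeholders for continuous-view evaluations on PipelineDB. Setting `max_workers=2` mirrors the pairwise execution mentioned in Section 7.3; mapping that limit onto a worker pool is our assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_independent_subqueries(subqueries: Dict[str, Callable[[], List[tuple]]],
                               max_workers: int = 2) -> Dict[str, List[tuple]]:
    """Submit all independent stream sub-queries, then wait for every result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in subqueries.items()}
        return {name: future.result() for name, future in futures.items()}


def fake_stream_subquery(rows: List[tuple], seconds: float) -> Callable[[], List[tuple]]:
    """Hypothetical stand-in for a continuous-view query on PipelineDB."""
    def run() -> List[tuple]:
        time.sleep(seconds)       # simulate the evaluation latency
        return rows
    return run


results = run_independent_subqueries({
    "q_a": fake_stream_subquery([("v1", 12.0)], 0.1),
    "q_b": fake_stream_subquery([("v2", 8.0)], 0.1),
})
print(results)  # {'q_a': [('v1', 12.0)], 'q_b': [('v2', 8.0)]}
```

Only after all futures have completed would the evaluator continue with the operators higher up in the join tree.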

7. Evaluation

We evaluated our platform against the requirements/features (cf. Table 1) derived from the use cases. The requirements are encoded into a set of queries that include the features introduced in Section 3.2. The lightweight LDM ontology, queries, experimental setup, log files, collected results, and the implementation are available on the evaluation website.¹²

¹² http://www.kr.tuwien.ac.at/research/projects/loctrafflog/ekaw2018

Fig. 5. Four-intersection scenario.

7.1. Scenario data

To obtain realistic traffic data, we generated our streaming data with the microscopic traffic simulation tool PTV Vissim,¹³ which allows us to simulate realistic driving and traffic light behavior, as well as to create unexpected events such as accidents. We extract the current state at each Vissim simulation step and store the result as JSON in log files. A log file player replays the different simulations by feeding the log data to PipelineDB. To vary the data throughput, we adjusted the following parameters: (a) we replayed the logs with 5 ms, 10 ms, 50 ms, and 100 ms delay, where 5 ms gives the fastest updates (i.e., simulating sensors) and 100 ms is the real-time speed of the Vissim simulation; (b) we simulated light, medium, and heavy traffic, with approx. 20, 50, and 150 vehicles, respectively, simultaneously on the road network. We modeled the real-world scenario shown in Fig. 5, which is based on a grid layout with four intersections formed by four crossing roads, with two incoming and outgoing lanes per street. The two incoming lanes on each side are assigned traffic light controllers; all maneuvers (turn left/right, straight on) to outgoing lanes are allowed. The main traffic flow is from north to south and from west to east. We encode the structure of the full intersections into static ABox instances as follows:

intersections, roads, lanes, signal groups, and vehicles as concept assertions;

geometries for each lane, road, etc. as attribute assertions; and

lane connectivity, signal group assignments, etc. as role assertions.

¹³ http://vision-traffic.ptvgroup.com/en-us/products/ptv-vissim/
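As an illustration of this encoding, the sketch below turns a hypothetical intersection description into the three kinds of assertions. The input layout, the attribute `geometry`, and the role names `hasLane` and `connectsTo` are assumptions; `Intersection`, `LaneIn`, `LaneOut`, `SignalGroup`, and `hasSignalGroup` follow the names used in the queries of Section 7.2.

```python
from typing import Dict, Set, Tuple


def encode_abox(intersections: Dict[str, dict]) -> Tuple[Set[tuple], Set[tuple], Set[tuple]]:
    """Return (concept, attribute, role) assertions for a static road network."""
    concepts: Set[tuple] = set()
    attributes: Set[tuple] = set()
    roles: Set[tuple] = set()
    for ix_id, ix in intersections.items():
        concepts.add(("Intersection", ix_id))
        for lane_id, lane in ix["lanes"].items():
            concepts.add((lane["type"], lane_id))                 # e.g. LaneIn/LaneOut
            attributes.add(("geometry", lane_id, lane["wkt"]))    # hypothetical attribute
            roles.add(("hasLane", ix_id, lane_id))                # hypothetical role
            for target in lane.get("connects_to", []):
                roles.add(("connectsTo", lane_id, target))        # lane connectivity
            group = lane.get("signal_group")
            if group is not None:
                concepts.add(("SignalGroup", group))
                roles.add(("hasSignalGroup", lane_id, group))     # signal group assignment
    return concepts, attributes, roles


network = {"i100": {"lanes": {
    "l1": {"type": "LaneIn", "wkt": "LINESTRING (0 0, 0 10)",
           "connects_to": ["l2"], "signal_group": "sg1"},
    "l2": {"type": "LaneOut", "wkt": "LINESTRING (0 10, 10 10)"},
}}}
concepts, attributes, roles = encode_abox(network)
print(sorted(concepts))
# [('Intersection', 'i100'), ('LaneIn', 'l1'), ('LaneOut', 'l2'), ('SignalGroup', 'sg1')]
```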

7.2. Queries for experiments

Based on the requirements, we derived a set of queries for each scenario, where each query answers a specific problem of the use case, taking the set of features into account. We use a compact notation, in which commas between atoms denote conjunctions and disjunctions are declared explicitly using $or$.

For the use case S1.1 (object statistics), query $q_{1.1}$ determines the average and maximum speed of BMWs and VWs in the last 10 s.

$\begin{array}{ll} q_{1.1}(x, u, v): & Vehicle(x), vehicleMaker(x, z), \\ & speed(x, u)[avg, 10\,s], \\ & speed(x, v)[max, 10\,s], \\ & (z = \text{'BMW'} \; or \; z = \text{'VW'}) \end{array}$

For the use case S1.2 (intersection statistics), we count vehicles according to their engine type. Sub-queries $q_{1.2a}$ and $q_{1.2b}$ select cars with a petrol or a diesel engine, respectively, that pass intersection $i100$. Query $q_{1.2}$ aggregates the sub-queries and returns the number of petrol vehicles in $y$ and of diesel vehicles in $z$:

$\begin{array}{ll} q_{1.2a}(x, y): & Vehicle(y), pos(y, z)[line, 10\,s], \\ & vehicleEngine(y, m), (m = \text{'Petrol'}), \\ & intersects(z, u), hasLoc(x, u), \\ & Intersection(x), x = \text{'i100'} \\ q_{1.2b}(x, y): & Vehicle(y), pos(y, z)[line, 10\,s], \\ & vehicleEngine(y, m), (m = \text{'Diesel'}), \\ & intersects(z, u), hasLoc(x, u), \\ & Intersection(x), x = \text{'i100'} \\ q_{1.2}(x, y, z): & q_{1.2a}(x, y)[count, 10\,s], \\ & q_{1.2b}(x, z)[count, 10\,s] \end{array}$

For the use case S1.3 (network statistics), we have two linked intersections $i100$ and $i200$. Query $q_{1.3}$ traces the vehicles that start at $i100$ and counts those passing through $i200$. A delay allows us to check the vehicle's position 7 s later, and the temporal relation $before$ ensures that a vehicle first passes $i100$ and then $i200$.

$\begin{array}{ll} q_{1.3a}(x, v): & Vehicle(x), pos(x, v)[line, 6\,s], \\ & intersects(v, u), Intersection(r), \\ & hasLoc(r, u), (r = \text{'i100'}) \\ & delay(7\,s) \\ q_{1.3b}(x, z): & Vehicle(x), pos(x, z)[line, 6\,s], \\ & intersects(z, w), Intersection(r), \\ & hasLoc(r, w), (r = \text{'i200'}) \\ q_{1.3c}(x): & q_{1.3a}(x, v), before(v, z), q_{1.3b}(x, z) \end{array}$

For the use case S2.1 (simple maneuvers), query $q_{2.1}$ returns all vehicles $x$ that turned left or right in the last 6 s, where the function $match$ extends the aggregate function $line\_angle$ with a filter on a predefined interval of its results; e.g., for a left turn, the heading has to change by between $-175$ and $-15$ degrees. Both sub-queries are combined by a union, resulting in the count of all vehicles performing the two maneuvers.

$\begin{array}{ll} q_{2.1l}(x): & Vehicle(x), pos(x, y)[line, 6\,s], \\ & Intersection(r), intersects(y, u), \\ & hasLoc(r, u), (r = \text{'i100'}), \\ & match(y)[angle, -175, -15] \\ q_{2.1r}(x): & Vehicle(x), pos(x, y)[line, 6\,s], \\ & Intersection(r), intersects(y, u), \\ & hasLoc(r, u), (r = \text{'i100'}), \\ & match(y)[angle, 15, 175] \\ q_{2.1}(x): & q_{2.1l}(x) \; or \; q_{2.1r}(x) \end{array}$

In use case S2.2 (complex maneuvers), query $q_{2.2}$ detects illicit lane changes, i.e., crossings of the middle marker (a white line). These are detected by checking whether a vehicle moved, within a certain period, from an in-lane to an out-lane or vice versa.

$\begin{array}{ll} q_{2.2}(x, y): & LaneIn(z), hasLoc(z, u), \\ & intersects(u, v), Vehicle(x), \\ & pos(x, v)[line, 6\,s, 3\,s], \\ & pos(x, w)[line, 3\,s, 0\,s], \\ & intersects(t, w), \\ & hasLoc(y, t), LaneOut(y) \end{array}$

For the use case S2.3 (red-light violation), we modified Ex. 1 by taking trajectory and speed predictions into account, which allows a more precise detection of violations, since we can rule out vehicles that are slowing down or are about to change lanes.

$\begin{array}{ll} q_{2.3}(x, y): & LaneIn(x), hasLoc(x, u), \\ & intersects(u, v), Vehicle(y), \\ & pos(y, v)[traject\_line, 5\,s, -3\,s], \\ & speed(y, r)[mov\_avg, 5\,s, -3\,s], \\ & (r > 10), SignalGroup(z), \\ & hasSignalGroup(x, z), \\ & hasState(z, Stop)[last, 5, -5] \end{array}$

For the use case S2.4 (vehicle breakdown), query $q_{2.4}$ checks whether a car has stopped for longer than 30 s while (using the $during$ relation) it is located inside one of our intersections, but not on one of the park lanes (using the $disjoint$ relation).

$\begin{array}{ll} q_{2.4}(x, y): & Vehicle(x), speed(x, r)[avg, 30\,s], \\ & (r < 1), inside(v, u), \\ & pos(x, v)[line, 15\,s], hasLoc(y, u), \\ & Intersection(y), during(v, r), \\ & disjoint(v, z), hasLoc(p, z), \\ & ParkLane(p) \end{array}$

The use case S2.5 (traffic congestion) can be evaluated by a query similar to S2.4, with the extension that stop-and-go traffic is excluded by checking that there is no movement while the traffic light phase is “Go”; hence, the stopping of the traffic is not caused by the traffic light.

$\begin{array}{ll} q_{2.5}(x, y): & Vehicle(x), speed(x, r)[avg, 30\,s], \\ & pos(x, v)[line, 30\,s], (r < 1), \\ & intersects(v, u), hasLoc(y, u), \\ & LaneIn(y), hasSignalGroup(y, z), \\ & hasState(z, s)[last, 30\,s], \\ & (s = \text{'Go'}), during(s, r), \\ & SignalGroup(z) \end{array}$
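The $match(y)[angle, \ldots]$ atoms in $q_{2.1}$ filter the heading change of a position line by an interval. A minimal sketch, assuming that $line\_angle$ returns the change of compass bearing (0° = north, clockwise positive) between the first and the last segment, so that right turns are positive and left turns negative; this convention is our assumption, chosen to be consistent with the intervals used in $q_{2.1l}$ and $q_{2.1r}$.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x towards east, y towards north)


def bearing_deg(a: Point, b: Point) -> float:
    """Compass bearing of segment a -> b: 0 = north, 90 = east (clockwise)."""
    return math.degrees(math.atan2(b[0] - a[0], b[1] - a[1]))


def line_angle(path: List[Point]) -> float:
    """Heading change from the first to the last segment, in (-180, 180]."""
    if len(path) < 2:
        raise ValueError("need at least two points")
    delta = bearing_deg(path[-2], path[-1]) - bearing_deg(path[0], path[1])
    while delta <= -180.0:
        delta += 360.0
    while delta > 180.0:
        delta -= 360.0
    return delta


def match_angle(path: List[Point], low: float, high: float) -> bool:
    """match(y)[angle, low, high]: keep the line if its heading change is in range."""
    return low <= line_angle(path) <= high


left_turn = [(0.0, 0.0), (0.0, 10.0), (-10.0, 10.0)]    # north, then west
right_turn = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0)]    # north, then east
print(match_angle(left_turn, -175, -15), match_angle(left_turn, 15, 175))   # True False
print(match_angle(right_turn, 15, 175))                                       # True
```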

For the use case S3.1 (self monitoring), query $q_{3.1}$ detects whether our ego vehicle, defined by $isEgo(x)$, exceeds the speed limit assigned to a lane while it is currently located on that lane.

$\begin{array}{ll} q_{3.1}(x, y): & LaneIn(y), hasLocation(y, u), \\ & intersects(u, v), pos(x, v)[line, 5\,s], \\ & Vehicle(x), speed(x, r)[max, 5\,s], \\ & isEgo(x), speedLimit(y, s), (r > s) \end{array}$

In use case S3.2 (obstructed view), our prototype (as the ego vehicle) evaluates query $q_{3.2}$ to detect vehicles that will very likely collide within 2 s at an intersection, by checking whether their predicted trajectories cross another vehicle's trajectory.

$\begin{array}{ll} q_{3.2}(x, y): & Vehicle(y), isEgo(y), \\ & pos(y, v)[traject\_line, 2\,s, -1\,s], \\ & intersects(v, w), (r > 10), \\ & speed(x, r)[mov\_avg, 5\,s, -2\,s], \\ & Vehicle(x), \\ & pos(x, w)[traject\_line, 2\,s, -1\,s] \end{array}$

In S3.3 (traffic rules), our ego vehicle approaches an uncontrolled intersection at the same time as other vehicles. According to (local) traffic regulations, preference (shown in Fig. 6) is given to (1) the vehicle on the main road, (2) the one that is not changing lanes, and (3) the vehicle approaching from the right. Vehicle C has preference over A and B, and B would have preference over A, but this preference cannot be given, since A is on a single-lane road. We can express the traffic regulations with the following Datalog rules:

$\begin{array}{rcll} giveWay(y, x) & \leftarrow & willCross(x, y) \land straightOn(x) \land turnRight(y) \land onMainRoad(x) \land onMainRoad(y). & (1) \\ giveWay(y, x) & \leftarrow & willCross(x, y) \land onMainRoad(x) \land onTributRoad(y). & (2) \\ giveWay(y, x) & \leftarrow & willCross(x, y) \land giveWay(x, y) \land onSingleLaneRoad(x). & (3) \\ stop(y) & \leftarrow & vehicle(x) \land vehicle(y) \land giveWay(y, x). & (4) \end{array}$

Fig. 6. Use case traffic rules.

The atom $willCross(x, y)$ matches all vehicles that might collide and can be evaluated by $q_{3.2}(x, y)$ (modified without $isEgo(y)$). The atoms $turnRight(x)$ and $straightOn(x)$ can be evaluated by queries such as $q_{2.1r}(x, y)$, assuming the queries are treated as atomic rules with $q(x, y)$ as the head. The atoms $onMainRoad(x)$, $onTributRoad(y)$, and $onSingleLaneRoad(x)$ can be evaluated in the spirit of $q_{1.2a}(x, y)$ by checking spatial containment. Then, the rules of S3.3 can be expressed as unions of CQs, with the difficulty that the order of query evaluation affects the completeness of the results, i.e., Rule (2) has to be evaluated before Rule (3).

Table 2

Results (t in s) for the scenario with (l)ow, (m)edium, and (h)eavy traffic, where Y denotes results with caching and parallelization and N results without optimizations; d (%) is the average of the deviations between optimized and non-optimized runs

Query	#Q	#A	Opt	(l) delay (ms)					(m) delay (ms)					(h) delay (ms)					Avg. of d (%)
				5	10	50	100	d (%)	5	10	50	100	d (%)	5	10	50	100	d (%)
$q_{1.1}$	$3 (2)$	42	N	1.35	1.18	0.95	0.86		1.45	1.30	0.99	0.88		1.46	1.35	1.14	0.99
$q_{1.1}$	$3 (2)$	42	Y	0.45	0.43	0.38	0.37	61.80	0.47	0.46	0.41	0.38	61.90	0.48	0.48	0.43	0.39	63.61	62.44
$q_{1.2}$	$6 (2)$	43	N	1.30	1.20	1.01	0.96		1.33	1.24	1.04	1.00		1.41	1.38	1.07	1.01
$q_{1.2}$	$6 (2)$	43	Y	0.47	0.45	0.41	0.40	61.02	0.49	0.48	0.43	0.39	61.03	0.52	0.53	0.46	0.42	60.03	60.69
$q_{1.3}$	$8 (2)$	44	N	1.44	1.35	1.15	1.08		1.47	1.37	1.23	1.09		1.45	1.44	1.30	1.20
$q_{1.3}$	$8 (2)$	44	Y	0.49	0.45	0.41	0.39	65.22	0.49	0.46	0.41	0.41	65.54	0.52	0.52	0.45	0.43	64.39	65.05
$q_{2.1}$	$6 (2)$	43	N	1.31	1.20	1.01	0.98		1.43	1.29	1.09	0.99		1.48	1.40	1.13	1.02
$q_{2.1}$	$6 (2)$	43	Y	0.50	0.47	0.41	0.39	60.57	0.52	0.50	0.44	0.40	61.03	0.57	0.54	0.46	0.43	60.01	60.54
$q_{2.2}$	$7 (2)$	45	N	1.36	1.26	1.05	1.00		1.47	1.29	1.08	1.03		1.51	1.43	1.13	1.06
$q_{2.2}$	$7 (2)$	45	Y	0.63	0.54	0.43	0.40	57.47	0.66	0.65	0.52	0.45	53.22	0.61	0.58	0.50	0.48	57.38	56.02
$q_{2.3}$	$7 (3)$	50	N	1.57	1.50	1.27	1.21		1.63	1.53	1.30	1.22		1.72	1.65	1.37	1.27
$q_{2.3}$	$7 (3)$	50	Y	0.66	0.63	0.56	0.54	56.81	0.69	0.65	0.57	0.55	56.56	0.67	0.66	0.59	0.55	58.67	57.35
$q_{2.4}$	$5 (2)$	46	N	1.24	1.21	0.98	0.92		1.28	1.24	1.06	0.97		1.28	1.29	1.13	0.99
$q_{2.4}$	$5 (2)$	46	Y	0.47	0.44	0.42	0.39	60.12	0.48	0.47	0.43	0.39	60.96	0.54	0.48	0.46	0.41	59.62	60.23
$q_{2.5}$	$7 (3)$	43	N	1.44	1.38	1.16	1.08		1.50	1.41	1.20	1.11		1.55	1.47	1.26	1.17
$q_{2.5}$	$7 (3)$	43	Y	0.69	0.68	0.59	0.56	50.02	0.74	0.68	0.62	0.56	50.08	0.74	0.74	0.66	0.59	49.78	49.96
$q_{3.1}$	$5 (2)$	43	N	1.85	1.72	1.40	1.32		1.89	1.79	1.48	1.35		2.06	2.04	1.57	1.38
$q_{3.1}$	$5 (2)$	43	Y	0.51	0.44	0.41	0.38	72.19	0.47	0.48	0.43	0.41	72.22	0.49	0.52	0.44	0.40	73.43	72.62
$q_{3.2}$	$5 (3)$	63	N	1.41	1.34	1.23	1.17		1.48	1.43	1.27	1.20		1.56	1.51	1.31	1.21
$q_{3.2}$	$5 (3)$	63	Y	0.62	0.62	0.57	0.55	54.10	0.66	0.62	0.59	0.54	55.15	0.67	0.66	0.60	0.57	55.11	54.79
$q_{3.3}$	$12 (5)$	43	N	3.02	2.80	2.42	2.39		3.26	2.98	2.58	2.38		3.36	3.20	2.66	2.44
$q_{3.3}$	$12 (5)$	43	Y	1.63	1.57	1.40	1.34	44.01	1.71	1.63	1.48	1.35	44.69	1.82	1.75	1.53	1.44	43.65	44.12
Avg. of d(%)				59.87	59.56	57.36	57.15	58.49	60.46	59.06	56.95	57.12	58.40	60.52	59.94	57.47	56.86	58.70	58.53

7.3. Results

We conducted our experiments on a Mac OS X 10.14.4 system with an Intel Core i7 2.9 GHz CPU, 8 GB of RAM, and a 250 GB SSD. Query rewriting and evaluation times were averaged over 21 runs. The results are shown in Table 2, where the average evaluation time (AET) t is given in seconds for three traffic densities and four update delays (in ms). The columns give the number of sub-queries #Q (with the number of stream queries in brackets), the number of rewritten atoms #A, whether optimizations are active (Opt), the AET, and d (%), the average of the deviations between the optimized and non-optimized runs for a single traffic density over the four delays. The column (resp. row) Avg. of d (%) holds the average over the full rows (resp. columns).
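Our reading of the caption is that d (%) is the mean relative reduction of the AET over the four delays, i.e., $d = \frac{100}{4}\sum_{k=1}^{4} (t^{N}_{k} - t^{Y}_{k}) / t^{N}_{k}$. This interpretation is an assumption, but the sketch below reproduces the reported values for $q_{1.1}$ (61.80, 61.90, and 63.61 for low, medium, and heavy traffic):

```python
from typing import Sequence


def deviation_pct(non_optimized: Sequence[float], optimized: Sequence[float]) -> float:
    """Mean relative AET reduction in percent over paired runs (one per delay)."""
    if len(non_optimized) != len(optimized) or not non_optimized:
        raise ValueError("need two non-empty series of equal length")
    ratios = [(n - o) / n for n, o in zip(non_optimized, optimized)]
    return 100.0 * sum(ratios) / len(ratios)


# q_{1.1}, low traffic, delays 5/10/50/100 ms (values from Table 2)
print(round(deviation_pct([1.35, 1.18, 0.95, 0.86], [0.45, 0.43, 0.38, 0.37]), 2))  # 61.8
```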

The new experiments confirm the results of [36] with queries and simulation data that are closer to the “real world”. The AET ranges between 0.86 s and 2.06 s, with the exception of use case S3.3, which emulates rules using unions of CQs. Query $q_{3.1}$ shows the highest AET of 2.06 s, since the join condition $(r > s)$ is evaluated inline and not in the DB, which adds a delay of 0.4 s with larger windows. Our baseline query is $q_{1.1}$, tested with 100 ms delay and low traffic. It has an AET of 0.86 s, of which 0.23 s is the time-to-load (TOL) and 0.63 s is needed for the evaluation of the two stream atoms, where we artificially delay the next stream atom evaluation by 150 ms. This artificial delay was determined empirically and is needed for PipelineDB to set up the continuous views (CVs) on the fly; omitting it would lead to missing results.

The added functions for statistics, matching, and predictions, e.g., $mov\_avg$ and $traject\_line$, do not affect performance, since they are applied to small windows with few data items. If we were to apply the function $grad\_boost$ for long-term traffic predictions on large windows, our query time could rise considerably, since the prediction time (without a preprocessed training step) can exceed 20 s.

Discussion on optimizations. A second set of experiments highlights the effect of the optimizations based on caching and sub-query parallelization. The results are shown in the second row of each query in Table 2. The optimizations improve the original AET, ranging between 0.49 s and 1.05 s, to a minimal AET of 0.37 s (in $q_{1.1}$ with 100 ms delay and low traffic density) and, in the worst case, a maximal AET of 0.74 s (in query $q_{2.5}$ with 5 ms delay and high traffic density). Note that we do not take query $q_{3.3}$ into account, since it encodes rules and is discussed below.

The performance gain in AET is shown in percentage terms for each traffic density in columns 9, 14, and 19, while column 20 holds the average of these three columns. We observe that the gain for each query is almost uniform over the traffic densities; the variation occurs mainly between the queries (column 20) and is smaller between the four delays (row 23). Regarding column 20, queries $q_{1.3}$ and $q_{3.1}$ show the highest gains, and queries $q_{2.5}$ and $q_{3.3}$ the lowest. The improvement in $q_{1.3}$ stems from the fact that the same query is executed twice with a 7 s delay, so caching has a strong impact. The improvement of $q_{3.1}$ is mainly due to the fact that a single ego vehicle is observed, and if that vehicle is not present, no results are computed. The (undesirable) low gain of query $q_{2.5}$ is due to an implementation detail of the sub-query parallelization, which concurrently executes only pairs of sub-queries; hence the third stream query is run after the first two stream queries. The average performance gain over all queries is 58.53%, which is encouraging, although further optimizations are possible. Finally, we observed that the optimizations mitigate two negative effects on the overall performance:

the ontology-based rewriting of sub-queries into views is only performed in the initial run, since afterwards the results are taken from the cache;

the artificial delay of 150 ms for evaluating consecutive stream sub-queries is dropped, since these sub-queries are now evaluated in parallel.

Note that query $q_{3.3}$ is a special case and shows the lowest performance gain, since four rules are evaluated sequentially as queries, starting with query $q_{2.1}$ twice, then $q_{3.2}$, and finally merging the results of the queries at the end. As a consequence, caching is local, i.e., performed for each query separately, rather than global.

7.4. Feature coverage

As shown with the queries, our implementation covers all initial levels (L1) of the features defined in the scenarios/use cases. We support temporal relations based on a (partial) interval-based data model (F1), which are evaluated by pull-based queries (F2). Furthermore, we allow temporal relations and nested queries that include unions of CQs (F3). However, we have not yet implemented IA reasoning for temporal relations, since an in-memory evaluation of the transitive rules completing the IA graph needs further investigation. Regarding F4 and F5, we have implemented the initial set of numerical, descriptive statistical, and spatial aggregation functions. For F6, we cover $mov\_avg$ and $exp\_smooth$ for fast, simple predictions, and support $grad\_boost$ for long-term traffic forecasting. For trajectory prediction (F7), we have implemented a method based on a simple linear path calculation; more accurate trajectory predictions would be desirable. Feature F8 is covered by the atom $match(y, z)[angle, 0, 15]$, and F9 is partially covered by unions of CQs, but transitive rules are out of scope for this work.

7.5. Summary of expert evaluation

Overall, the experts confirmed that (a) the extension to time intervals and temporal relations and (b) the inclusion of prediction capabilities are important extensions of the initial spatial-stream OQA approach. However, the experts also identified several limitations and interesting extensions.

The first limitation regards the LDM ontology, which should be aligned with the SSN ontology [46] to include patterns such as $Stimulus$, $Sensor$, and $Observation$. Furthermore, it should capture a spatial model based on 3D maps. The second limitation regards the unclear definition of the grammar of stream atoms, which should be clarified in this work. The third limitation is more generic and captures the assumption that rules would be better suited to capture the complex use cases, which is most apparent in the traffic rules use case.

The first extension regards Collective Perception Messages [70], which could be added as a new type of message used to share local sensor data. The second extension addresses Scenario 3, which could be extended with use cases taken from motion planning tasks in autonomous driving. The third extension identifies Kalman filters [80] and top-k aggregates [47] as more powerful aggregation functions. The fourth extension is likely the most far-reaching, since the ontology and query language (with the underlying semantics) could support uncertainty on the level of (a) data items and (b) TBox assertions, e.g., inclusion assertions. The fifth extension describes the handling of data items inside windows, where adaptively forgetting data items and extending their validity into the future could be added to obtain a more flexible query language. The last extension is again more generic and captures the use of LDMs in an autonomous agent-based scenario, which would lead to the challenge of aligning different LDMs into a global dynamic map.

8. Related work and system comparison

First, we give a general overview of related work; in the subsequent sections, we discuss related systems in depth and provide a qualitative and quantitative comparison with selected systems.

8.1. Overview

In a more formal setting, stream processing started with data stream management systems (DSMSs) such as STREAM [11], Aurora [28], and TelegraphCQ [57]. Notably, in STREAM the authors introduced the Continuous Query Language (CQL) [11], which provides an explicit operational semantics based on stream-to-relation, relation-to-relation, and relation-to-stream operators. Furthermore, CQL applies three kinds of window operators, namely partition-based, time-based, and tuple-based windows.

With the need for more complex domain models, such as those provided by background KBs, streams were lifted to a “semantic” level, leading to the development of RDF stream processing engines such as Streaming SPARQL [20], C-SPARQL [14], SPARQLstream [24], and CQELS [55]. These systems process RDF streams integrated with other Linked Data sources and background KBs. The C-SPARQL framework has also been extended to support deductive and inductive reasoning [13]. A system for efficient spatio-temporal query execution was proposed in [69], which supports spatial operators as well as aggregate functions over temporal features. EP-SPARQL [9] and LARS [17] propose languages that extend SPARQL and CQs, respectively, with stream reasoning, but translate KBs into more expressive and less efficient logic programs. Regarding spatial RDF stream processing, a few SPARQL extensions have been proposed, such as SPARQL-ST [67] and st-SPARQL [54].

In [12], DL-Lite$_{A}$ was extended with temporal operators such as those used in LTL [68], introducing a semantics based on a two-sorted model that separates the object domain from the temporal domain. Besides [12], similar temporal QA was investigated in [21] and [50]; these works are all on the theoretical side, and no implementation has yet been provided for them. Finally, our framework was built on the results for EAQs provided by [27], but we investigated EAQs in a streaming setting and introduced more complex queries that also support spatial and temporal relations.

Table 3
Qualitative comparison of F1–F9 on selected systems, where ^∗ indicates comments in the system descriptions; Ext, Pre, and Lmt denote evaluation using external atoms, a required preprocessing step, and limited coverage, respectively


System	F1	F2	F3	F4	F5	F6	F7	F8	F9
C-SPARQL	Point	Pull	SPARQL + windows	Yes	Pre	Pre	Pre	No	RDFS
CQELS	Point	Push	SPARQL + windows	Yes	Pre	Pre	Pre	No	RDF
INSTANS	Point	Push	SPARQL + event patterns	Yes	Pre	Pre	Pre	No	RDFS + Rete
SPARQLstream / Morph-streams	Point	Pull	SPARQL + windows	Yes	Pre	Pre	Pre	No	OWL2 QL
ONTOP (DatalogMTL) ^∗	Point + MTL	Pull	SPARQL + windows	Yes	Yes	Yes	Pre	No	DatalogMTL
ONTOP (STARQL) ^∗	Point	Pull	SPARQL + windows	Yes	Yes	Yes	Pre	No	OWL2 QL
TrOWL	Point	Pull	CQ	Lmt	Pre	Pre	Pre	No	OWL2 EL or approx. OWL2 DL
Clingo (Multi-shot ASP) ^∗	Point	Pull	Rules + ext. atoms	Yes	Ext	Ext	Ext	Ext	ASP
ETALIS (EP-SPARQL)	Interval + full AIA	Push	Rules or SPARQL + windows	Yes	Lmt	Pre	Pre	No	Prolog
RDFox	Point	Pull	Rules	Yes	Pre	Pre	Pre	No	Datalog / OWL2 RL (incremental)
Laser (LARS) ^∗	Point + LTL	Pull	Rules + windows	Yes	Pre	Pre	Pre	No	Stratified LARS
Ticker (LARS) ^∗	Point + LTL	Pull	Rules + windows	Yes	Pre	Pre	Pre	No	LARS
Spatial-stream OQA	Point/Interval with limited AIA	Pull	CQ + windows	Yes	Yes	Yes	Lmt	Lmt	OWL2 QL

8.2. Comparison with existing systems

There is a wide range of systems for stream processing, stream reasoning, and event detection available. For a comparison with our approach, we focus on systems that fulfill the following criteria: (1) ability to handle streaming and/or temporal data, either by having a window operator, or supporting incremental updates; (2) ability to provide reasoning, either based on ontology-, rule-, or query rewriting-based methods; and (3) a maintained implementation of the system has to be available.

As noted in Section 3, we do not re-evaluate the eight “standard” requirements of [78] or the three entailment levels for stream reasoning systems of [31]. Still, this work overlaps in some features with the comprehensive study of Margara et al. [59], where the authors compare systems according to the following criteria:

continuous queries (F2/F3),

background data (F9),

time model (F1),

reasoning (F9), and

temporal operators (F1).

The other criteria in [59], namely data transformation, uncertainty management, historical data, quality of service, and parallel/distributed processing are not investigated in the context of this work.

The comparison in Table 3 shows the evaluation of the selected systems on the generic features F1 (Time model), F2 (Process paradigm), F3 (Query features), and F9 (Advanced reasoning), as well as the ITS domain specific features F4 (Numerical aggregations), F5 (Spatial aggregations), F6 (Numerical predictions), F7 (Trajectory predictions), and F8 (Spatial matching). We separate the table into two groups, where the first group represents query-based systems, and the second group rule-based systems. In the following, we give details on the systems that are compared.

8.2.1. C-SPARQL

C-SPARQL was introduced by Della Valle et al. [14] and was one of the first contributions in the area of stream processing with reasoning extensions. The main intention of this work is to bridge the gap between the world of stream processing systems, i.e., stream RDBMSs, and the Semantic Web with its RDF and SPARQL standards. C-SPARQL includes (a) a language for continuous queries over streams of RDF data and (b) an evaluation engine for this language; its distinguishing features are (i) support for timestamped RDF triples, (ii) support for continuous queries over streams, and (iii) ad-hoc, explicit operators for data aggregation. The results of C-SPARQL queries are continuously updated as new data items appear on the stream, hence an efficient evaluation of sliding windows is possible.

8.2.2. CQELS

Similar to C-SPARQL, CQELS [55] also offers a query language and processing engine for answering queries over a combination of static and stream RDF data. While C-SPARQL adopts a “black box” approach, i.e., static and stream query elements are first separated and sent to the respective underlying stream and RDF query engines, CQELS takes a “white box” approach by natively implementing the required query operators for the RDF-based data model, both for streams and for static data. This native approach enables better performance and can dynamically adapt to changes in the input data. Another difference to C-SPARQL is that CQELS follows an eager query execution strategy: input data is processed as soon as it arrives in the system, in contrast to C-SPARQL, whose evaluation is triggered periodically regardless of the input data throughput.

8.2.3. ETALIS with EP-SPARQL

The ETALIS system [9,10] was applied to the ITS domain and combines Datalog-style rules with a background KB. In ETALIS, a Prolog-based language is used to express complex event patterns with predicates like $during(event1, 5)$ or $begins(event1, event2)$. The background KB is also encoded into the rule language, which in combination with the temporal and causal parts can be used for query answering. The standard Prolog query evaluation, which is based on a request-response paradigm, is replaced by an event-driven backward chaining (EDBC) of rules. A standard Prolog system is then triggered to evaluate a query and additional EDBC rules when new data arrives at the system. EP-SPARQL [9] extends ETALIS and introduces windows and the handling of RDF streams, lifting the rule engine to a Semantic Web/Linked Data setting.

8.2.4. INSTANS

INSTANS [73] is an event processing engine based on multiple interconnected SPARQL queries with updates. It supports continuous evaluation of incoming RDF data by encoding SPARQL queries into Rete-like structures. INSTANS supports stateless/stateful filters using its internal memory, as well as enrichment, (de-)composition, aggregation, and pattern matching on the streamed events. The authors implemented their approach by converting SPARQL queries into Rete rules, which are then evaluated on the Rete [38] rule engine JESS [39].

8.2.5. SPARQLstream/Morph-streams

SPARQLstream [24] extends standard SPARQL with time windows over streams, similar to C-SPARQL, but adds relation-to-stream operators for dealing with relational streams. SPARQLstream was further extended to Morph-streams [25], where OBDA techniques are applied to access the underlying streams, which are in turn stored in a stream processing system (SPS). R2RML mappings are used to create virtual streams on-the-fly, which can be accessed via the SPARQL extension. The authors implemented and tested their query rewriting techniques for different SPSs, namely SNEE, Esper, and Global Sensor Networks.
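Relation-to-stream operators originate from CQL [11], where ISTREAM, DSTREAM, and RSTREAM turn a time-varying relation back into a stream; SPARQLstream adopts operators of this kind. A minimal Python sketch of their set-based semantics follows. Relations are modelled as plain sets of hypothetical vehicle identifiers per evaluation time, which is an assumption of the sketch rather than SPARQLstream's data model.

```python
from typing import List, Set, Tuple


def relation_to_stream(snapshots: List[Set[str]], op: str) -> List[Tuple[int, Set[str]]]:
    """Convert relation snapshots R(0), R(1), ... into a stream of emitted tuples.

    ISTREAM emits the tuples inserted at each tau, DSTREAM the tuples deleted at
    tau, and RSTREAM the whole relation at every tau.
    """
    out: List[Tuple[int, Set[str]]] = []
    previous: Set[str] = set()
    for tau, current in enumerate(snapshots):
        if op == "ISTREAM":
            emitted = current - previous
        elif op == "DSTREAM":
            emitted = previous - current
        elif op == "RSTREAM":
            emitted = set(current)
        else:
            raise ValueError(f"unknown relation-to-stream operator: {op}")
        out.append((tau, emitted))
        previous = current
    return out


# Vehicles inside a query window at three consecutive evaluation times.
snapshots = [{"v1"}, {"v1", "v2"}, {"v2"}]
for op in ("ISTREAM", "DSTREAM", "RSTREAM"):
    print(op, relation_to_stream(snapshots, op))
```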

8.2.6. Clingo with multi-shot ASP

Multi-shot Answer Set Programming (ASP) is an extension of existing ASP solving techniques that reacts to new information arriving at the logic program instead of solving the program from scratch. Clingo 4 [40] natively supports multi-shot solving by offering high-level constructs and control capabilities via the scripting languages Lua and Python. The authors introduced the #external directive, which allows a flexible handling of undefined atoms. Additionally, the solver needs to keep track of the sequence of system states, which is defined using a simple operational semantics, where operations such as $create$, $add$, or $assignExternal$ can be used to modify the states.

8.2.7. LARS with ticker/laser

The LARS framework [16] is a recent development in stream reasoning that considers lifting ASP to streams with generic windows for capturing data snapshots with the aim of generalizing time-, tuple- and partition-based window functions. To this end, the ASP syntax is extended with window operators and with temporal operators for evaluating truth at every, some, and a specific (exact) time point in a stream; variables ranging over domain constants or time points are allowed in rules. Semantically, answer sets of ASP programs are naturally generalized to answer streams of LARS programs. Fragments of LARS programs have been implemented in Ticker [18] and Laser [15], based on plain LARS rules.
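To make the window and temporal operators concrete, the following Python sketch assumes a discrete timeline starting at 0 and represents a stream as a mapping from time points to the atoms holding there. It implements a time-based window together with the ◇ (some), □ (every), and @ (exact time point) operators. The encoding and function names are hypothetical; they illustrate the semantics only, not Ticker's or Laser's implementation.

```python
from typing import Dict, Set

# A stream maps time points (assumed to start at 0) to the atoms holding there.
Stream = Dict[int, Set[str]]


def time_window(stream: Stream, t: int, n: int) -> Stream:
    """Time-based window of size n at time t: keep the time points t-n, ..., t."""
    return {u: stream.get(u, set()) for u in range(max(0, t - n), t + 1)}


def diamond(window: Stream, atom: str) -> bool:
    # "some": the atom holds at some time point of the window
    return any(atom in atoms for atoms in window.values())


def box(window: Stream, atom: str) -> bool:
    # "every": the atom holds at every time point of the window
    return all(atom in atoms for atoms in window.values())


def at(window: Stream, atom: str, u: int) -> bool:
    # "@_u": the atom holds exactly at time point u, which must lie in the window
    return atom in window.get(u, set())


stream: Stream = {0: {"red"}, 1: {"red"}, 2: {"red", "crossing"}, 3: {"green"}}
for t in (2, 3):
    w = time_window(stream, t, n=2)
    # Hypothetical rule body: a crossing occurred in the window and it was red throughout.
    print(t, diamond(w, "crossing") and box(w, "red"))
```

At t = 2 the body is satisfied; at t = 3 the window also contains a time point with green, so the □ condition fails.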

In Ticker,¹⁴ full negation is supported, with sliding time-based and tuple-based windows. It comes with two evaluation modes, viz. a static ASP encoding, which employs Clingo for repeated solving, and an incremental ASP encoding, which performs model updates using truth maintenance techniques; the latter tends to be faster.

¹⁴ https://github.com/hbeck/ticker

Laser¹⁵ is geared towards high-performance evaluation and thus focuses on positive and stratified programs with sliding windows. It aims at fast model updates through efficient substitution management, for which it extends semi-naive evaluation of Datalog by incorporating the temporal dimension and tracks intervals of guaranteed formula validity. This avoids redundant re-derivations and allows for efficient removal of expired derivations. Laser was shown to outperform Ticker, C-SPARQL, and CQELS in micro-benchmarks.

¹⁵ https://github.com/karmaresearch/laser
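The idea of tracking intervals of guaranteed validity can be illustrated for a single formula of the form "a held at some point of the last n time points". The Python sketch below is a simplified, hypothetical illustration of that idea; Laser's actual substitution management and its temporal extension of semi-naive evaluation are considerably more general.

```python
from typing import Dict, List


class DiamondValidity:
    """Track when 'atom held at some point of the last n time points' is true.

    An observation of `atom` at time u keeps the formula true for every t in
    [u, u + n], so it need not be re-derived at each time point. Time is assumed
    to advance monotonically, i.e. queries only ask about t >= u.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.valid_until: Dict[str, int] = {}

    def observe(self, atom: str, u: int) -> None:
        # Extend (never shrink) the validity horizon of the atom.
        self.valid_until[atom] = max(self.valid_until.get(atom, u + self.n), u + self.n)

    def holds(self, atom: str, t: int) -> bool:
        return atom in self.valid_until and t <= self.valid_until[atom]

    def expire(self, t: int) -> List[str]:
        # Remove annotations whose horizon lies before t and return the expired atoms.
        expired = [a for a, end in self.valid_until.items() if end < t]
        for a in expired:
            del self.valid_until[a]
        return expired


cache = DiamondValidity(n=5)
cache.observe("braking", 3)
print(cache.holds("braking", 8), cache.holds("braking", 9))  # True False
```

Expiration then becomes a cheap comparison against a stored horizon rather than a re-evaluation of the window.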

8.2.8. ONTOP with STARQL and DatalogMTL

The STARQL framework of Özçep et al. [64] is an effort to streamify OBDA by introducing an extensible query language; hence, it is closely related to our approach. It uses a first-order logic fragment for temporal reasoning over sequences of ABoxes and extends the first-order query rewriting of DL-Lite with intra-ABox reasoning. In a second extension of ONTOP, Brandt et al. added Metric Temporal Logic (MTL) to allow querying of log data using LTL operators extended with time intervals [22,23]. For this purpose, they introduced datalogMTL, which combines non-recursive Datalog with MTL operators.

Fig. 7. C-SPARQL encoding (left) and CQELS encoding (right) of query $q_{1.1}$.

8.2.9. RDFox

RDFox [62] is a scalable main-memory RDF store that supports materialisation-based parallel Datalog reasoning as well as SPARQL query answering. Its Datalog materialisation is based on a novel parallel reasoning algorithm extending the well-known DRed algorithm; it computes incremental updates on its internal triple store very efficiently and is thus well suited to handling data streams.
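The Delete/Rederive (DRed) idea on which RDFox's algorithm builds can be sketched for a single recursive reachability program: first overdelete everything that might depend on removed facts, then rederive what still follows from the remaining data. This is a sequential, textbook-style Python illustration for one hypothetical edge/path program, using a naive fixpoint for brevity; it is not RDFox's parallel algorithm.

```python
from typing import Set, Tuple

Fact = Tuple[str, str]


def saturate(paths: Set[Fact], edges: Set[Fact]) -> Set[Fact]:
    """Naive fixpoint of  path(x,y) <- edge(x,y)  and  path(x,z) <- path(x,y), edge(y,z)."""
    result = set(paths) | set(edges)
    while True:
        new = {(x, z) for (x, y) in result for (y2, z) in edges if y == y2} - result
        if not new:
            return result
        result |= new


def dred_delete(paths: Set[Fact], edges: Set[Fact], deleted: Set[Fact]) -> Set[Fact]:
    """Update the materialised `paths` after removing `deleted` from `edges`."""
    # 1) Overdelete: every path fact with at least one derivation that uses a deleted
    #    edge or an already overdeleted path, evaluated against the OLD edges/paths.
    overdeleted: Set[Fact] = set()
    frontier = set(deleted) | {(x, z) for (x, y) in paths for (y2, z) in deleted if y == y2}
    while frontier:
        overdeleted |= frontier
        frontier = {(x, z) for (x, y) in frontier for (y2, z) in edges if y == y2} - overdeleted
    # 2) Rederive: the surviving facts are still valid; re-closing them under the
    #    remaining edges restores overdeleted facts that have alternative derivations.
    return saturate(paths - overdeleted, edges - deleted)


edges = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}
paths = saturate(set(), edges)
updated = dred_delete(paths, edges, {("b", "c")})
print(sorted(updated))  # [('a', 'b'), ('a', 'c'), ('a', 'd'), ('c', 'd')]
```

Here path(a,c) and path(a,d) are first overdeleted but then rederived via the direct edge (a,c), while path(b,c) and path(b,d) are removed for good.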

8.2.10. TrOWL

TrOWL [72] is an incremental DL reasoner for the expressive OWL 2 DL language. Instead of using fixed time windows over streams, it handles streams of KBs, which allows axioms to be added to and removed from the KB on-the-fly. The authors apply syntactic approximation to reduce the reasoning complexity, which guarantees soundness but loses completeness in certain cases. TrOWL provides justifications for the following entailments: atomic concept subsumption, atomic class assertion, and atomic object property assertion.

8.3. Feature comparison

Based on the literature review and discussions with the authors of the systems, we conducted a feature evaluation of the above systems. Table 3 summarizes the reviews and discussions, using the following specification of three fulfillment levels: basic, enhanced, and advanced. For F1, the basic level corresponds to a point-based time model, the enhanced level to an interval-based model, and the advanced level additionally includes AIA over an interval-based model. For F2, the basic level covers pull-based queries, the enhanced level push-based queries, and the advanced level the combination of pull- and push-based queries. For F3 and F9, we list the query, rule, or ontology language and whether windows are supported, but do not classify the fulfillment level. For F4 to F8, we only evaluate whether a feature is covered at the basic fulfillment level, since most systems are generic reasoners and are not intended to support ITS-specific features.

As shown in Table 3, for F1 the ONTOP- and LARS-based systems offer similar or richer query and ontology languages, since they allow the use of LTL and MTL operators. For F2, our work is on a par with most presented systems, except for the two push-based systems CQELS and INSTANS, which offer higher reactiveness due to their eager query execution strategy. For F4 and F5, our work covers the widest range of numerical aggregation and prediction functions, with the exception of STARQL, which covers similar functionalities. Covering F6 to F8 was one of the motivations of this work; accordingly, ours is the only approach that supports spatial aggregations and predictions.

8.4. Performance comparison

We again conducted our experiments on a Mac OS X 10.14.4 system with an Intel Core i7 2.9 GHz, 8 GB of RAM, and a 250 GB SSD. We report the average query evaluation time over 50 runs with warm starts, i.e., the systems were not restarted between the 50 runs of each experiment.
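The warm-start measurement protocol can be sketched as a small timing harness: the system is set up once, each of the 50 evaluations is timed individually, and the mean is reported. The `run_query` callable and the optional `warmup` parameter are hypothetical; the text does not state whether unmeasured warm-up runs were performed, so the sketch defaults to none.

```python
import statistics
import time
from typing import Callable


def average_warm_time(run_query: Callable[[], object], runs: int = 50, warmup: int = 0) -> float:
    """Mean wall-clock time in seconds over `runs` evaluations in one running process.

    The system under test is set up once and never restarted between runs (warm
    starts); `warmup` optional unmeasured runs can be executed before timing.
    """
    for _ in range(warmup):
        run_query()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


# Hypothetical stand-in for one query evaluation.
calls = []
avg = average_warm_time(lambda: calls.append(sum(range(10_000))), runs=50)
print(f"AET over {len(calls)} runs: {avg:.6f} s")
```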

For the experiments, we compared our prototype on two selected queries with the state-of-the-art systems C-SPARQL and CQELS, which support only limited reasoning but are designed to deal with high-velocity and high-volume streams. A comparison over all presented queries is not feasible, since C-SPARQL and CQELS do not natively support spatial/temporal relations, spatial aggregation, and inline predictions. Hence, we selected the two queries $q_{1.1}$ and $q_{2.3}$, where the first serves as our baseline comparison and the second is our running example. We pre-calculated the missing features, such as the spatial relations, in the log player and materialized the outcome as streamed data items. Furthermore, we adapted our CQs to the SPARQL dialects of each system. Fig. 7 gives the encoding of $q_{1.1}$ as an example.

Table 4

Results (t in s) for queries $q_{1.1}$ and $q_{2.3}$ in the (l)ow, (m)edium, and (h)eavy traffic scenarios, each with update delays of 5, 10, 50, and 100 ms

Query	System	(l) 5 ms	(l) 10 ms	(l) 50 ms	(l) 100 ms	(m) 5 ms	(m) 10 ms	(m) 50 ms	(m) 100 ms	(h) 5 ms	(h) 10 ms	(h) 50 ms	(h) 100 ms
$q_{1.1}$	C-SPARQL	0.019	0.032	0.031	0.027	0.132	0.134	0.113	0.110	0.252	0.347	0.324	0.314
$q_{1.1}$	CQELS	0.034	0.044	0.043	0.038	0.029	0.038	0.043	0.041	0.056	0.043	0.035	0.043
$q_{1.1}$	Spatial-stream QA	0.452	0.430	0.383	0.370	0.470	0.463	0.411	0.379	0.484	0.478	0.434	0.389
$q_{2.3}$	C-SPARQL	0.045	0.025	0.058	0.070	0.050	0.090	0.048	0.077	0.307	0.253	0.265	0.353
$q_{2.3}$	CQELS	0.001	0.003	0.013	0.034	0.001	0.002	0.006	0.017	0.001	0.003	0.009	0.020
$q_{2.3}$	Spatial-stream QA	0.658	0.634	0.561	0.542	0.692	0.653	0.574	0.548	0.670	0.660	0.585	0.554

The results of our experiments are shown in Table 4, where t is the AET in seconds for different traffic densities and update delays (in ms). The baseline systems C-SPARQL and CQELS outperform our prototype by between approximately 75 ms for $q_{1.1}$ (C-SPARQL on heavy traffic with 100 ms delay) and 691 ms for $q_{2.3}$ (CQELS on medium traffic with 5 ms delay). These results are an important indicator of a lower bound for QA over streams. Yet, the results are not fully comparable, since CQELS does not support RDFS and C-SPARQL does not support OWL 2 QL. There is a trade-off between performance and expressivity: in CQELS the full TBox is omitted, and in C-SPARQL axioms with existentially quantified variables on the right-hand side of inclusion assertions are ignored, leading to incomplete results. Furthermore, neither system directly supports spatial relations, aggregates, and predictions; in our prototype these account for approximately 100 ms of evaluation time, whereas in the baseline experiments they are precomputed by the log player and are thus not reflected in the AET. The results also indicate that, as traffic density increases, the performance of C-SPARQL and our prototype appears to converge for $q_{1.1}$, where grouping/aggregation might be the most demanding operation for C-SPARQL and CQELS.
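The range of the performance gap can be checked directly against Table 4. The following Python snippet transcribes the table (in ms) and computes, for every query, baseline, traffic density, and update delay, how much slower our prototype is than the respective baseline.

```python
# Table 4 transcribed in ms; columns: (l) 5/10/50/100, (m) 5/10/50/100, (h) 5/10/50/100 ms delay.
TABLE4 = {
    ("q1.1", "C-SPARQL"): [19, 32, 31, 27, 132, 134, 113, 110, 252, 347, 324, 314],
    ("q1.1", "CQELS"): [34, 44, 43, 38, 29, 38, 43, 41, 56, 43, 35, 43],
    ("q1.1", "Spatial-stream QA"): [452, 430, 383, 370, 470, 463, 411, 379, 484, 478, 434, 389],
    ("q2.3", "C-SPARQL"): [45, 25, 58, 70, 50, 90, 48, 77, 307, 253, 265, 353],
    ("q2.3", "CQELS"): [1, 3, 13, 34, 1, 2, 6, 17, 1, 3, 9, 20],
    ("q2.3", "Spatial-stream QA"): [658, 634, 561, 542, 692, 653, 574, 548, 670, 660, 585, 554],
}
CONFIGS = [(traffic, delay) for traffic in "lmh" for delay in (5, 10, 50, 100)]

# Entries: (gap in ms, query, baseline, traffic density, update delay in ms)
gaps = []
for query in ("q1.1", "q2.3"):
    ours = TABLE4[(query, "Spatial-stream QA")]
    for baseline in ("C-SPARQL", "CQELS"):
        for i, (traffic, delay) in enumerate(CONFIGS):
            gaps.append((ours[i] - TABLE4[(query, baseline)][i], query, baseline, traffic, delay))

print(min(gaps))  # (75, 'q1.1', 'C-SPARQL', 'h', 100)
print(max(gaps))  # (691, 'q2.3', 'CQELS', 'm', 5)
```

All 48 gaps are positive, i.e., both baselines are faster than our prototype in every configuration of Table 4.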

After profiling the runtime of our prototype, we noticed that approximately 200 ms are lost by establishing a connection to PipelineDB. This could be mitigated by connection pooling, e.g., with a persistent connection pool such as the one provided by PgBouncer.¹⁶

¹⁶ http://pgfoundry.org/projects/pgbouncer
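Connection pooling amortises this set-up cost: connections are opened once and reused across query evaluations. The following generic Python sketch illustrates the mechanism with a hypothetical `fake_connect` factory that merely counts how often it is called; in practice, an external pooler such as PgBouncer, or a pool provided by the database driver, would take this role.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

Conn = TypeVar("Conn")


class ConnectionPool(Generic[Conn]):
    """Minimal fixed-size pool: connections are opened once and then reused."""

    def __init__(self, connect: Callable[[], Conn], size: int) -> None:
        self._idle: "queue.Queue[Conn]" = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())  # pay the connection set-up cost once, up front

    @contextmanager
    def connection(self) -> Iterator[Conn]:
        conn = self._idle.get()  # blocks while all connections are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back instead of closing it


opened = []


def fake_connect() -> dict:
    """Hypothetical stand-in for an expensive database connect; it only records the call."""
    opened.append(1)
    return {"id": len(opened)}


pool = ConnectionPool(fake_connect, size=2)
for _ in range(50):  # e.g. one checkout per query evaluation
    with pool.connection() as conn:
        pass  # the query would be executed on `conn` here
print(len(opened))  # 2
```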

9. Conclusion and future work

This work was sparked by the need to apply spatial-stream query answering to integrate and access streamed mobility data, e.g., vehicle movements, in a spatial context within the complex C-ITS domain. In [36], we introduced simple aggregate queries over streams, which often do not suffice to capture more complex C-ITS use cases. In this paper, we presented an extension with temporal relations and numerical/trajectory predictions, which allows us to query complex mobility patterns such as traffic statistics, or complex events such as (potential) accidents. Based on the newly developed scenarios of traffic statistics, event detection, and advanced driving assistance systems (ADAS), we defined a set of domain-specific features, such as trajectory computation, which we matched with the scenarios/use cases to derive key requirements. Given the new features, we extended the LDM ontology, the spatial-stream query language, and the methods used for query answering accordingly. We also redesigned the architecture and optimized the system by pre-compiling the static query elements and executing stream atoms in parallel. The experimental evaluation provides evidence of a performance improvement of approx. 40% and of evaluation times below 700 ms. This indicates that our approach is potentially feasible and efficient in the mentioned scenarios.

Lessons learned The presented approach of spatial-stream query answering is well suited for data integration and query answering in the C-ITS domain. The concept of an LDM was a good starting point, since it has been developed and standardized by the C-ITS community and had already been extended by Netten et al. to an ontology-like model [63]. In particular, Semantic Web technologies play to their strengths in modelling a complex domain such as C-ITS with ease and in allowing the (expert) user to formulate powerful queries on top of the streams integrated by the ontology. This is illustrated by the new ADAS scenario, where small modifications of the ontology and new queries open up a new application field. However, our OQA-based approach also revealed some limitations, which are discussed in the expert summary.

Using spatial-stream CQs to capture the scenarios worked out to our satisfaction in most use cases. But, as illustrated by use cases S2.4, S2.5, and S3.3, our language reached its limits regarding “usability” and also “expressivity”. With a larger set of rules as in S3.3 (even without transitivity), the conversion into unions of CQs becomes cumbersome and inefficient (the AET is between 2.46 s and 3.34 s); hence, rule engines such as Ticker [18] or Laser [15] might be better suited. In S1.3 and S2.5, we can see that qualitative temporal relations like $before$ are convenient, since they let us avoid the implicit encoding of temporal relations using shifted windows of different sizes, which requires a-priori knowledge of appropriate window sizes and of their relative positions. This makes a window-based handling of temporal relations inflexible and error-prone.
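A small example makes this inflexibility concrete. The Python sketch below contrasts a check based on Allen's before relation with an encoding via two adjacent windows of a fixed size. The event labels, the window size of 5 s, and both functions are hypothetical and are not taken from use cases S1.3 or S2.5.

```python
from typing import List, Tuple

# Hypothetical events: (label, start, end) with times in seconds.
Event = Tuple[str, int, int]


def detect_with_before(events: List[Event], first: str, second: str) -> bool:
    """Qualitative check: some `first` event lies (Allen) before some `second` event."""
    firsts = [(s, e) for (label, s, e) in events if label == first]
    seconds = [(s, e) for (label, s, e) in events if label == second]
    return any(a_end < b_start for (_, a_end) in firsts for (b_start, _) in seconds)


def detect_with_shifted_windows(
    events: List[Event], first: str, second: str, now: int, size: int = 5
) -> bool:
    """Window encoding: `first` must end in (now-2*size, now-size], `second` start in (now-size, now]."""

    def in_window(t: int, far: int, near: int) -> bool:
        return now - far < t <= now - near

    has_first = any(label == first and in_window(e, 2 * size, size) for (label, _, e) in events)
    has_second = any(label == second and in_window(s, size, 0) for (label, s, _) in events)
    return has_first and has_second


events = [("enter", 96, 97), ("brake", 98, 99)]
print(detect_with_before(events, "enter", "brake"))  # True
print(detect_with_shifted_windows(events, "enter", "brake", now=100))  # False
```

With enter ending at 97 s and brake starting at 98 s, the relation-based check succeeds, while the window encoding misses the pair because enter does not fall into the older window; detecting it would require different window sizes or positions, which must be chosen a priori.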

Furthermore, CQs with sub-queries can be directly translated into SPARQL queries; however, the underlying evaluation system would have to be adjusted to deal with the mobility-specific features.

Outlook We believe that stream processing/reasoning methods could well be applied to the mentioned C-ITS scenarios, which was confirmed by the experts; they also acknowledged that the new features such as time intervals with temporal relations and prediction capabilities are important extensions of the initial spatial-stream OQA approach.

Nevertheless, the experts identified practical and theoretical extensions that should be addressed in future research. On the practical side, they suggested that the LDM ontology could be (and has been) combined with the SSN ontology [46], and that Collective Perception Messages [70] could be used to integrate local sensor data from other vehicles. This could be elaborated further so that the different LDMs (one per vehicle) are aligned into a single global dynamic map. Furthermore, the experts suggested integrating Kalman filters [80] and top-k aggregates [47] to provide more powerful aggregates. On the theoretical side, our methods could be extended to capture uncertainty at the level of data items and of TBox assertions, which would change the underlying semantics and the computational properties. Our methods could also be extended to handle windows more flexibly, by selectively forgetting data items or extending their validity into the future. Finally, one expert suggested that Scenario 3 could focus more on motion planning than on ADAS.

In addition to the expert suggestions, we believe that efforts on the following issues would be beneficial: (a) allowing the integration of external (domain-specific) modules such as functions for advanced trajectory prediction, similar to external atoms in ASP [33]; (b) supporting analytic queries over longer periods, for which an extension with a transient cache and variable window sizes would be needed; (c) the full integration of OQA with IA relations, which would include the representation of IA networks and the rewriting of subsets of the composition table; and (d) handling more complex queries and rules while still maintaining scalability, which would bring our approach closer to Ticker [18] and Laser [15].

As discussed above, ongoing and future research should be directed at extending the languages, methods, and platform to fulfill the defined requirements, which will enable us to apply our approach and prototype to more complex scenarios such as advanced traffic monitoring and motion planning.

Footnotes

Acknowledgements

This work has been supported by the Austrian Research Promotion Agency project LocTraffLog (FFG 5886550) and DynaCon (FFG 861263), as well as by the European Commission through IoTCrawler (H2020 contract 779852).

References

1. ETSI EN 302 895 (V1.1.0), Intelligent transport systems – Extension of map database specifications for local dynamic map for applications of cooperative ITS, Technical report, ETSI, 2014.

2. ETSI TR 102 863 (V1.1.1), Intelligent transport systems (ITS); Vehicular communications; Basic set of applications; Local dynamic map (LDM); Rationale for and guidance on standardization, Technical report, ETSI, 2011.

3. ETSI TS 103 191-3 (V1.1.1), Intelligent transport systems (ITS); Testing; Conformance test specifications for signal phase and timing (SPAT) and map (MAP); Part 3: Abstract test suite (ATS) and protocol implementation eXtra information for testing (PIXIT), Technical report, ETSI, 2015.

4. ETSI EN 302 637-2 (V1.3.2), Intelligent transport systems (ITS); Vehicular communications; Basic set of applications; Part 2: Specification of cooperative awareness basic service, Technical report, ETSI, 2014.

5. ETSI EN 302 637-3 (V1.2.2), Intelligent transport systems (ITS); Vehicular communications; Basic set of applications; Part 3: Specifications of decentralized environmental notification basic service, Technical report, ETSI, 2014.

6. J.F. Allen, Maintaining knowledge about temporal intervals, Com. ACM 26(11) (1983), 832–843. doi:10.1145/182.358434.

7. J.F. Allen, Maintaining knowledge about temporal intervals, Communications of the ACM 26(11) (1983), 832–843. doi:10.1145/182.358434.

8. Andreone, Brignolo, Damiani, Sommariva, Vivo and Marco, SAFESPOT final report, Technical report, D8.1.1, 2010, http://www.safespot-eu.org/documents/D8.1.1_Final_Report_-_Public_v1.0.pdf.

9. Anicic, Fodor, Rudolph and Stojanovic, EP-SPARQL: A unified language for event processing and stream reasoning, in: Proc. of WWW 2011, 2011, pp. 635–644, https://dl.acm.org/doi/10.1145/1963405.1963495.

10. Anicic, Rudolph, Fodor and Stojanovic, Stream reasoning and complex event processing in ETALIS, Semantic Web 3(4) (2012), 397–407. doi:10.3233/SW-2011-0053.

11. Arasu, Babu and Widom, The CQL continuous query language: Semantic foundations and query execution, The VLDB Journal 15(2) (2006), 121–142. doi:10.1007/s00778-004-0147-z.

12. Artale, Kontchakov, Kovtunova, Ryzhikov, Wolter and Zakharyaschev, First-order rewritability of temporal ontology-mediated queries, in: Proc. of IJCAI 2015, 2015, pp. 2706–2712, https://dl.acm.org/doi/10.5555/2832581.2832627.

13. Barbieri, Braga, Ceri, E. Della Valle, Huang, Tresp, Rettinger and Wermser, Deductive and inductive stream reasoning for semantic social media analytics, IEEE Intelligent Systems 25(6) (2010), 32–41. doi:10.1109/MIS.2010.142.

14. D.F. Barbieri, Braga, Ceri, E. Della Valle and Grossniklaus, C-SPARQL: A continuous query language for RDF data streams, International Journal of Semantic Computing 4(1) (2010), 3–25. doi:10.1142/S1793351X10000936.

15. H.R. Bazoobandi, Beck and Urbani, Expressive stream reasoning with laser, in: The Semantic Web – ISWC 2017 – 16th International Semantic Web Conference, Vienna, Austria, Proceedings, Part I, 2017, pp. 87–103. doi:10.1007/978-3-319-68288-4_6.

16. Beck, Dao-Tran and Eiter, LARS: A logic-based framework for analytic reasoning over streams, Artif. Intell. 261 (2018), 16–70, https://dl.acm.org/doi/10.5555/2887007.2887205. doi:10.1016/j.artint.2018.04.003.

17. Beck, Dao-Tran, Eiter and Fink, LARS: A logic-based framework for analyzing reasoning over streams, in: Proc. of AAAI 2015, 2015, pp. 1431–1438.

18. Beck, Eiter and Folie, Ticker: A system for incremental ASP-based stream reasoning, Theory and Practice of Logic Programming 17(5–6) (2017), 744–763. doi:10.1017/S1471068417000370.

19. Bogner, Littig and Menz, Das Experteninterview, VS Verlag für Sozialwissenschaften, 2009. doi:10.1007/978-3-322-93270-9.

20. Bolles, Grawunder and Jacobi, Streaming SPARQL – Extending SPARQL to process data streams, in: Proc. of ESWC 2008, 2008, pp. 448–462. doi:10.1007/978-3-540-68234-9_34.

21. Borgwardt, Lippmann and Thost, Temporalizing rewritable query languages over knowledge bases, Journal of Web Semantics 33 (2015), 50–70. doi:10.1016/j.websem.2014.11.007.

22. Brandt, E.G. Kalayci, Kontchakov, Ryzhikov, Xiao and Zakharyaschev, Ontology-based data access with a horn fragment of metric temporal logic, in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, California, USA, 2017, pp. 1070–1076, https://dl.acm.org/doi/10.5555/3298239.3298397.

23. Brandt, E.G. Kalayci, Ryzhikov, Xiao and Zakharyaschev, Querying log data with metric temporal logic, Journal of Artificial Intelligence Research 62 (2018), 829–877. doi:10.1613/jair.1.11229.

24. Calbimonte, Mora and Ó. Corcho, Query rewriting in RDF stream processing, in: Proc. of ESWC 2016, 2016, pp. 486–502. doi:10.1007/978-3-319-34129-3_30.

25. J.-P. Calbimonte, Jeung, Ó. Corcho and Aberer, Enabling query technologies for the semantic sensor web, Int. J. Semantic Web Inf. Syst. 8(1) (2012), 43–63. doi:10.4018/jswis.2012010103.

26. Calvanese, G. De Giacomo, Lembo, Lenzerini and Rosati, Tractable reasoning and efficient query answering in description logics: The DL-Lite family, Journal of Automated Reasoning 39(3) (2007), 385–429. doi:10.1007/s10817-007-9078-x.

27. Calvanese, Kharlamov, Nutt and Thorne, Aggregate queries over ontologies, in: Proc. of ONISW 2008, 2008, pp. 97–104. doi:10.1145/1458484.1458500.

28. Carney, Çetintemel, Cherniack, Convey, Lee, Seidman, Stonebraker, Tatbul and Zdonik, Monitoring streams: A new class of data management applications, in: Proc. of VLDB 2002, 2002, pp. 215–226. doi:10.1016/B978-155860869-6/50027-5.

29. Comer, Ubiquitous B-tree, ACM Comput. Surv. 11(2) (1979), 121–137. doi:10.1145/356770.356776.

30. Dechter, Tree decomposition methods (Chapter 9), in: Constraint Processing, Dechter, ed., The Morgan Kaufmann Series in Artificial Intelligence, Morgan Kaufmann, San Francisco, 2003, pp. 245–269. doi:10.1016/B978-155860890-0/50010-4.

31. Dell’Aglio, E. Della Valle, van Harmelen and Bernstein, Stream reasoning: A survey and outlook, Data Science 1(1–2) (2017), 59–83. doi:10.3233/DS-170006.

32. Dermaku, Ganzow, Gottlob, B.J. McMahan, Musliu and Samer, Heuristic methods for hypertree decomposition, in: Proc. of MICAI 2008: Advances in Artificial Intelligence, 2008, pp. 1–11. doi:10.1007/978-3-540-88636-5_1.

33. Eiter, Ianni and Krennwallner, Answer set programming: A primer, in: Reasoning Web. Semantic Technologies for Information Systems, 5th International Summer School 2009, Brixen-Bressanone, Italy, Tutorial Lectures, 2009, pp. 40–110. doi:10.1007/978-3-642-03754-2_2.

34. Eiter, Ichise, J.X. Parreira, Schneider and Zhao, Deploying spatial-stream query answering in C-ITS scenarios, in: Proc. of EKAW 2018, 2018, pp. 386–406. doi:10.1007/978-3-030-03667-6_25.

35. Eiter, Krennwallner and Schneider, Lightweight spatial conjunctive query answering using keywords, in: Proc. of ESWC 2013, 2013, pp. 243–258. doi:10.1007/978-3-642-38288-8_17.

36. Eiter, J.X. Parreira and Schneider, Spatial ontology-mediated query answering over mobility streams, in: Proc. of ESWC 2017, 2017, pp. 219–237. doi:10.1007/978-3-319-58068-5_14.

37. Eiter, J.X. Parreira and Schneider, Detecting mobility patterns using spatial query answering over streams, in: Proc. of Stream Reasoning Workshop 2017, 2017, http://ceur-ws.org/Vol-1936/paper-02.pdf.

38. C.L. Forgy, Rete: A fast algorithm for the many pattern/many object pattern match problem, Artificial Intelligence 19(1) (1982), 17–37. doi:10.1016/0004-3702(82)90020-0.

39. Friedman-Hill, Jess in Action: Rule-Based Systems in Java, Manning Publications, 2003. ISBN 978-1-930-11089-2.

40. Gebser, Kaminski, Kaufmann and Schaub, Clingo = ASP + control: Preliminary report, CoRR 1405.3694 (2014), https://arxiv.org/abs/1405.3694.

41. Gottlob, Leone and Scarcello, The complexity of acyclic conjunctive queries, Journal of the ACM 48(3) (2001), 431–498. doi:10.1145/382780.382783.

42. Gottlob, Leone and Scarcello, Hypertree decompositions: A survey, in: Mathematical Foundations of Computer Science 2001, Sgall, Pultr and Kolman, eds, Springer Berlin Heidelberg, Berlin, Heidelberg, 2001, pp. 37–57. doi:10.1007/3-540-44683-4_5.

43. Graefe, Query evaluation techniques for large databases, ACM Computing Surveys 25(2) (1993), 73–169. doi:10.1145/152610.152611.

44. R.H. Güting, Geo-relational algebra: A model and query language for geometric database systems, in: EDBT 1988, 1988, pp. 506–527. doi:10.1007/3-540-19074-0_70.

45. R.H. Güting and Schneider, Moving Objects Databases, Morgan Kaufmann, 2005. doi:10.1016/B978-0-12-088799-6.X5000-2.

46. Haller, Janowicz, S.J.D. Cox, Lefrançois, Taylor, D. Le Phuoc, Lieberman, García-Castro, Atkinson and Stadler, The modular SSN ontology: A joint W3C and OGC standard specifying the semantics of sensors, observations, sampling, and actuation, Semantic Web 10(1) (2019), 9–32. doi:10.3233/SW-180320.

47. I.F. Ilyas, W.G. Aref and A.K. Elmagarmid, Supporting top-k join queries in relational databases, The VLDB Journal 13(3) (2004), 207–221. doi:10.1007/s00778-004-0128-2.

48. Janowicz, Haller, S.J.D. Cox, D. Le Phuoc and Lefrançois, SOSA: A lightweight ontology for sensors, observations, samples, and actuators, Journal of Web Semantics 56 (2019), 1–10. doi:10.1016/j.websem.2018.06.003.

49. Kikot, Kontchakov and Zakharyaschev, Conjunctive query answering with OWL 2 QL, in: Proceedings of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning, KR 2012, AAAI Press, 2012, pp. 275–285, https://dl.acm.org/doi/10.5555/3031843.3031876.

50. Klarman and Meyer, Querying temporal databases via OWL 2 QL, in: Proc. of RR 2014, 2014, pp. 92–107. doi:10.1007/978-3-319-11113-1_7.

51. Kontchakov, Rodriguez-Muro and Zakharyaschev, Ontology-based data access with databases: A short course, in: Reasoning Web. Semantic Technologies for Intelligent Data Access – 9th International Summer School 2013, Mannheim, Germany, Proceedings, 2013, pp. 194–229. doi:10.1007/978-3-642-39784-4_5.

52. Kontchakov and Zakharyaschev, An introduction to description logics and query rewriting, in: Reasoning Web. Reasoning on the Web in the Big Data Era – 10th International Summer School 2014, Athens, Greece, Proceedings, 2014, pp. 195–244. doi:10.1007/978-3-319-10587-1_5.

53. E.V. Kostylev and J.L. Reutter, Answering counting aggregate queries over ontologies of the DL-Lite family, in: Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2013, AAAI Press, 2013, pp. 534–540, https://dl.acm.org/doi/10.5555/2891460.2891534.

54. Koubarakis and Kyzirakos, Modeling and querying metadata in the semantic sensor web: The model stRDF and the query language stSPARQL, in: ESWC 2010, 2010, pp. 425–439. doi:10.1007/978-3-642-13486-9_29.

55. Le-Phuoc, Dao-Tran, J.X. Parreira and Hauswirth, A native and adaptive approach for unified processing of linked streams and linked data, in: ISWC 2011, 2011, pp. 370–388. doi:10.1007/978-3-642-25073-6_24.

56.

E.E.

Maccoby and

N.A.

Maccoby, The interview: A tool of social science, in: Handbook of Social Psychology, Vol. 1, Addison-Wesley, 1954.

57.

Madden,

Shah,

J.M.

Hellerstein and

Raman, Continuously adaptive continuous queries over streams, in: 2002 ACM SIGMOD International Conference on Management of Data, 2002, pp. 49–60. doi:10.1145/564691.564698.

58.

Maier, The Theory of Relational Databases, Computer Science Press, 1983.

59.

Margara,

Urbani,

van Harmelen and

H.E.

Bal, Streaming the web: Reasoning over dynamic data, J. Web Semant.25 (2014), 24–44. doi:10.1016/j.websem.2014.02.001.

60.

Mason, Qualitative Researching, SAGE Publications Ltd, 2002.

61.

Minichiello,

Aroni and

Hays, In-Depth Interviewing: Principles, Techniques, Analysis, 3rd edn, Pearson Australia Group, 2008.

62.

Nenov,

Piro,

Motik,

Horrocks,

Wu and

Banerjee, RDFox: A highly-scalable RDF store, in: The Semantic Web – ISWC 2015 – 14th International Semantic Web Conference, Bethlehem, PA, USA, Proceedings, Part II, 2015, pp. 3–20. doi:10.1007/978-3-319-25010-6_1.

63.

Netten,

Kester,

Wedemeijer,

Passchier and

Driessen, DynaMap: A dynamic map for road side ITS stations, in: Proc. of ITS World Congress 2013, 2013, https://trid.trb.org/view.aspx?id=1322235 .

64.

Ö.L.

Özçep,

Möller and

Neuenstadt, A stream-temporal query language for ontology based data access, in: KI 2014: Advances in Artificial Intelligence – 37th Annual German Conference on AI, Stuttgart, Proceedings, 2014, pp. 183–194. doi:10.1007/978-3-319-11206-0_18.

65.

C.H.

Papadimitriou, Computational Complexity, Academic Internet Publ., 2007.

66.

Pérez-Urbina,

Horrocks and

Motik, Practical aspects of query rewriting for OWL 2, in: Proceedings of the 6th International Conference on OWL: Experiences and Directions, Vol. 529, OWLED 2009, CEUR-WS.org, Aachen, Germany, 2009, pp. 152–159, http://ceur-ws.org/Vol-529/owled2009_submission_17.pdf .

67.

Perry,

Jain and

A.P.

Sheth, SPARQL-ST: Extending SPARQL to support spatiotemporal queries, Geospatial Semantics and the Semantic Web12 (2011), 61–86. doi:10.1007/978-1-4419-9446-2_3.

68.

Pnueli, The temporal logic of programs, in: Proc. of Annual Symposium on Foundations of Computer Science 1977, 1977, pp. 46–57. doi:10.1109/SFCS.1977.32.

69.

H.N.M.

Quoc and

Le Phuoc, An elastic and scalable spatiotemporal query processing for linked sensor data, in: Proc. of SEMANTICS 2015, ACM, 2015, pp. 17–24. doi:10.1145/2814864.2814869.

70.

Rauch,

Klanner,

R.H.

Rasshofer and

Dietmayer, Car2X-based perception in a high-level fusion architecture for cooperative perception systems, in: 2012 IEEE Intelligent Vehicles Symposium, IV 2012, Alcal de Henares, Madrid, Spain, 2012, pp. 270–275. doi:10.1109/IVS.2012.6232130.

71.

Ren,

M.H.

Dunham and

Kumar, Semantic caching and query processing, IEEE Trans. on Knowl. and Data Eng.15(1) (2003), 192–210. doi:10.1109/TKDE.2003.1161590.

72.

Ren and

J.Z.

Pan, Optimising ontology stream reasoning with truth maintenance system, in: Proceedings of the 20th ACM Conference on Information and Knowledge Management, CIKM 2011, Glasgow, United Kingdom, October 24–28, 2011, 2011, pp. 831–836. doi:10.1145/2063576.2063696.

73.

Rinne and

Nuutila, Constructing event processing systems of layered and heterogeneous events with SPARQL, in: On the Move to Meaningful Internet Systems: OTM 2014 Conferences – Confederated International Conferences: CoopIS, and ODBASE 2014, Amantea, Italy, Proceedings, 2014, pp. 682–699. doi:10.1007/978-3-662-45563-0_42.

74.

Rodriguez-Muro,

Kontchakov and

Zakharyaschev, Ontology-based data access: Ontop of databases, in: Proc. of ISWC 2013, 2013, pp. 558–573. doi:10.1007/978-3-642-41335-3_35.

75. Rosati and Almatelli, Improving query answering over DL-Lite ontologies, in: Proceedings of the Twelfth International Conference on Principles of Knowledge Representation and Reasoning, KR 2010, AAAI Press, 2010, pp. 290–300, https://dl.acm.org/doi/10.5555/3031748.3031786 .

76. Shimada, Yamaguchi, Takada and Sato, Implementation and evaluation of local dynamic map in safety driving systems, J. Transportation Technologies 5(2) (2015), 102–112. doi:10.4236/jtts.2015.52010.

77. Stocker and Smith, Owlgres: A scalable OWL reasoner, in: Proc. of OWLED 2008, 2008, http://ceur-ws.org/Vol-432/owled2008eu_submission_25.pdf .

78. Stonebraker, Çetintemel and S.B. Zdonik, The 8 requirements of real-time stream processing, SIGMOD Record 34(4) (2005), 42–47. doi:10.1145/1107499.1107504.

79. M.Y. Vardi, The complexity of relational query languages (extended abstract), in: Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, STOC 1982, ACM, New York, NY, USA, 1982, pp. 137–146. doi:10.1145/800070.802186.

80. Welch and Bishop, An introduction to the Kalman filter, Technical report, 95-041, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA, 1995, http://www.cs.unc.edu/~welch/kalman/kalmanIntro.html.

81. Zhao, Ichise, Liu, Mita and Sasaki, Ontology-based driving decision making: A feasibility study at uncontrolled intersections, IEICE Trans. 100-D(7) (2017), 1425–1439. doi:10.1587/transinf.2016EDP7337.

Deploying spatial-stream query answering in C-ITS scenarios 1

Footnotes

² Available at http://www.kr.tuwien.ac.at/research/projects/loctrafflog/LocalDynamicMapITS-v0.5-Lite.owl and http://www.kr.tuwien.ac.at/research/projects/loctrafflog/LocalDynamicMapSOSA-v0.1-Lite.owl.

³ http://www.drive-c2x.eu/project

⁴ http://odysseus.informatik.uni-oldenburg.de/, https://www.pipelinedb.com/, and https://sqlstream.com/.

⁵ We simplified the EAQs of [27] by omitting ψ and considering only aggregates with a single variable.

⁶ https://www.pipelinedb.com/

⁹ https://www.dbai.tuwien.ac.at/proj/hypertree/

¹² http://www.kr.tuwien.ac.at/research/projects/loctrafflog/ekaw2018

¹³ http://vision-traffic.ptvgroup.com/en-us/products/ptv-vissim/

¹⁴ https://github.com/hbeck/ticker