Abstract
A knowledge graph (KG) represents real-world entities and their relationships. The represented knowledge is often context-dependent, leading to the construction of contextualized KGs. The multidimensional and hierarchical nature of context invites comparison with the OLAP cube model from multidimensional data analysis. Traditional systems for online analytical processing (OLAP) employ multidimensional models to represent numeric values for further analysis using dedicated query operations. In this paper, along with an adaptation of the OLAP cube model for KGs, we introduce an adaptation of the traditional OLAP query operations for the purposes of performing analysis over KGs. In particular, we decompose the roll-up operation from traditional OLAP into a merge and an abstraction operation. The merge operation corresponds to the selection of knowledge from different contexts whereas abstraction replaces entities with more general entities. The result of such a query is a more abstract, high-level view – a
Keywords
Introduction
A knowledge graph (KG) represents real-world entities and their relationships. KGs have been described as “large networks of entities, their semantic types, properties, and relationships” [49], as consisting of “a set of interconnected typed entities and their attributes” [31] with possibly arbitrary relationships [64]. The majority of a KG’s contents are instance-level facts, or assertional knowledge (ABox) [64], although KGs may also include terminological and ontological knowledge (TBox) representing “the vocabulary used in the knowledge graph” [31] in order to allow for “ontological reasoning and query answering” [5] over the facts. Furthermore, a KG typically covers a variety of topics rather than focusing exclusively on a single aspect of the real world such as geographic terms [64]. For the representation of KGs, the Resource Description Framework (RDF) [23] often serves as the data model.
KGs present a wide range of potential applications, e.g., (web) search [35] and question-answering [81], intra-company knowledge management [65] and investment analysis [68]. Among the most popular examples of KGs are proprietary ones such as Google’s Knowledge Graph [32] and Microsoft’s Satori [66] as well as community-driven efforts such as DBpedia [52] and Wikidata [87]. More and more organizations follow suit with the development of KGs for their own purposes, necessitating the development of appropriate
In order to facilitate KG management, KGs are increasingly subject to
The multidimensional nature of context invites comparison with the multidimensional modeling approach as employed by online analytical processing (OLAP) systems for data analysis. In traditional OLAP systems, hierarchically ordered dimensions span a multidimensional space – also referred to as OLAP cube – where each point (or cell) represents an event of interest quantified by numeric measures. Similarly, context dimensions span a multidimensional space where each cell represents a context that comprises facts of a KG. OLAP systems employ multidimensional models to perform analytical queries over datasets using operations such as slice-and-dice and roll-up (see [83] for further information). In this regard, slice-and-dice refers to the selection of relevant data for the analysis whereas roll-up refers to the aggregation of the selected data in order to obtain a more abstract view on the underlying business situation.
In this paper, we introduce

Traditional OLAP operates on numeric measures whereas KG-OLAP operates on knowledge graphs.
Fig. 1 draws an analogy between (a) the roll-up operation in traditional OLAP on numeric measures and (b) merge and abstraction in KG-OLAP on KGs. First, the example emphasizes the fact that traditional OLAP and KG-OLAP work on distinct types of data. The example’s general setting is air traffic management, where messages dispatched by air traffic control notify of temporary changes in infrastructure. In traditional OLAP, the description of real-world events is condensed into numeric measures: In this example, the OLAP cube captures the daily number of messages dispatched by air traffic control per geographic segment and message importance. In KG-OLAP, each cell of the cube contains knowledge, encoded as triples, valid and relevant in the context that the cell represents: In this example, knowledge about navigation aids of varying importance, valid on a particular day, and relevant for a specific geographic segment. In particular, the KG-OLAP cube comprises knowledge about navigation aids used for approaching Vienna airport (
We illustrate KG-OLAP following use cases of (contextualized) knowledge graphs for
Previous work [72,73] introduced the concept of
The contributions of this paper are:
the identification of requirements for a model and query operations for performing analysis over KGs;
the formalization of a model for KGs with hierarchical contexts and knowledge propagation suitable for performing analysis;
the definition of a set of query operations for performing analysis over KGs;
an experimental analysis of run time performance for working with contextualized KGs.
The remainder of this paper is organized as follows. In Section 2, we present the use cases that serve to illustrate KG-OLAP and motivate the approach. In Section 3, we introduce the formalization for the multi-dimensional model of KG-OLAP cubes. In Section 4, we present query operations for KG-OLAP cubes. In Section 5, we discuss implementation of the approach. In Section 6, we review related work. We conclude with a discussion and an outlook on future work.
Further details about used language, reasoning methods, implementation, and experimental evaluation are provided in the separate Appendix [70].
In this section, we give relevant background information on air traffic management (ATM) before discussing the use of (contextualized) KGs for aeronautical information management. We then describe two potential use cases of KG-OLAP in the ATM domain, from which we derive functional requirements.
Background
Modern ATM aims to ensure safe flight operations through careful management, analysis, and advance planning of air traffic flow as well as timely provisioning of relevant information in the form of messages. The exchange of data/information between ATM stakeholders is of paramount importance in order to foster common situational awareness for improved efficiency, safety, and quality in planning and operations. In this regard, situational awareness refers to a “person’s knowledge of particular task-related events and phenomena” [78], i.e., knowledge about the world relevant for ATM, which must be accurately represented and conveyed to the various stakeholders. To this end, ATM relies on a multitude of standardized data/information (exchange) models, e.g., the

An example DNOTAM in XML notifying of a taxiway closure in Vienna due to snow removal
Among the most common types of messages exchanged in ATM are Notices to Airmen. A Notice to Airmen (NOTAM) – or Digital NOTAM (DNOTAM) when in electronic form using AIXM format – is a message that conveys important information about temporary changes in flight conditions to aircraft pilots [39], e.g., closures of aerodromes, runways, and taxiways, surface conditions, and construction activities (see [28] for a list of airport event scenarios) but also airspace restrictions. Messages shape the knowledge about the world as relevant for ATM. For example, a DNOTAM (Listing 1) may change the knowledge about the taxiways of a particular airport by announcing the temporary closure of a taxiway due to snow removal. To this end, a DNOTAM employs different

A contextualized KG based on the DNOTAM in Listing 1.
The knowledge encoded in DNOTAMs is more naturally represented using contextualized KGs [71]. Fig. 2 illustrates the contextualized representation of the knowledge encoded in the DNOTAM from Listing 1 along a temporal dimension. The
In our scenario, RDF serves as the common representation language for ATM knowledge even though XML is the native serialization format of DNOTAMs in the AIXM standard. AIXM, however, builds on the Geography Markup Language (GML), the initial proposal of which was based directly on RDF, with subsequent editions continuing to “borrow many ideas from RDF” [50, p. 20], including the GML’s object-property model [50, p. 16]. Similarly, other ATM information (exchange) models are easily translated into RDF.
In fact, ATM research has shown growing interest in the use of semantic technologies (see [40] for an overview), leading to the development of domain ontologies [33,85], e.g., the
In the remainder of this paper, we use example KGs for illustration purposes largely following the AIXM conceptual models [1] and the FAA’s airport operations scenarios [28]. In the following, we present two use cases for contextualized KGs in the ATM domain, which serve to motivate the KG-OLAP approach.
Use case 1: Pilot briefings
Prior to a flight, pilots receive a pre-flight information bulletin (PIB), which comprises all DNOTAMs, but also METARs, i.e., messages about weather conditions, that are even remotely relevant for the upcoming flight. Pilots then have to manually sift through an abundance of messages and decide upon the priority of each message. Among other things, the priority of a message depends on the current flight phase. For example, during take-off, a taxiway closure at the destination airport has low priority for a pilot.
Automated rule-based filtering and prioritization of DNOTAMs reduces information overload in pilot briefings [79]. Pilots formulate an
The result of DNOTAM filtering and prioritization may be packaged into a
Automated rule-based filtering and prioritization in combination with semantic containers may serve to construct ATM information cubes [72,73]. Fine-grained semantic containers comprising data items of different geographic and temporal scopes as well as importance levels may be arranged in a multidimensional space. A pilot could then select the appropriate containers at the right moment without being overloaded with information. Individual messages could be combined into a more abstract representation in order to obtain a
In this paper, rather than considering ATM information cubes with messages collected into semantic containers, we consider
Use case 2: Post-operational analysis
Air traffic flow and capacity management (ATFCM), which represents one of the core activities in ATM, has post-operations teams analyzing operational events in order to identify valuable lessons learned for the benefit of future operations and produces an overview of occurred incidents [62, p. 131]. A data warehouse provides the post-operations team with statistical data about past flight operations [62, p. 130].
Post-operational analytical tasks in ATFCM may also leverage contextualized ATM knowledge. The rationale is similar to ATMGRAPH’s [44], a KG having richer semantics while being more flexible and versatile than simple statistical data organized in a data warehouse. In addition to the data warehouse, the post-operations team may employ a KG-OLAP cube of contextualized ATM knowledge extracted from DNOTAMs and other types of messages, organized by temporal relevance, route or ground segment, aircraft model, and importance. By analyzing such ATM knowledge, an air traffic flow post-operations team may gain a more comprehensive picture of past air traffic operations. Using the
In the remainder of this paper, we employ contextualized KGs for the ATM domain to illustrate the KG-OLAP approach. In particular, we focus mainly on ATM knowledge derived from DNOTAMs. The example contextualized KGs are plausible for both pilot briefings and post-operational analysis.
Functional requirements
Based on the presented use cases from the ATM domain, we identify functional requirements that a system for performing data analysis over KGs must fulfill. Although derived from use cases in ATM, the requirements also apply to other use cases, e.g., the analysis of business situations formalized using business model ontologies [74].
(Heterogeneity).
The system must cope with heterogeneity in the knowledge representation. Heterogeneity is inherent to KGs, which are not simple graphs. In particular, heterogeneity in KGs manifests itself as follows:
Multiple types of entities. In the ATM domain, for example, a KG comprises knowledge about various kinds of infrastructure but also flight plans, airspace, weather events, and more. Multiple types of relationships. Unlike simple graphs, the entities in a KG are connected via various relationships. For example, one type of relationship serves to indicate the airport that a runway is situated at while another associates a runway with a usage restriction. Schema variability, i.e., entities of the same type may have different properties and relationships. For example, runway availability may warn of inspection adjacent to a runway. But, runway availability may also announce runway closure, e.g., due to snow, or mention the characteristics of aircraft that are prohibited to land, which in turn may depend on weight, wingspan, etc.
(Ontological knowledge).
The system must handle ontological knowledge that describes relationships between classes and employs logic for the definition of domain-specific terms. ATM information (exchange) models, for example, comprise a multitude of generalization/specialization relationships and define associations between classes, which potentially translate into domain and range constraints. ATM technical language is rife with domain-specific terms, e.g., heavy wake-turbulence aircraft, the meaning of which can be defined using an ontology, which facilitates use of business terms in queries [14].
(Self-describing data).
The system must store metadata along with the instance data in order to facilitate interpretation. As a general trait of semantic systems, definitions of classes and properties are a flexible part of the data, not encoded in the system’s physical schema. The RDF data format, for example, serves for the representation of instance data and class-level data alike. Changes in the data model do not culminate in a refactoring of the database schema in the same way as in relational databases. Furthermore, query operations may directly operate on the metadata. A query operation may, for example, obtain an overview of relationships where individual entities are replaced by their classes.
(Modularization).
The system must allow for the modularization of knowledge according to different criteria. Knowledge that belongs together can be represented together. Separate knowledge with different scopes can be split into multiple modules. For example, some knowledge, e.g., a particular runway closure, may be relevant for LOWW airport while other knowledge, e.g., reduced quality of a specific navigation aid, may be relevant for LOWL airport.
(General and specific knowledge).
The system must support both modules with more general scope and modules with more specific scope. For example, in some cases, associating particular knowledge with individual geographic segments is impossible. Instead, the knowledge could be relevant for the entire LOVV region and not only LOWW airport. When selecting relevant knowledge for the LOWW airport, however, relevant knowledge for the entire region should also be included.
(Knowledge selection and combination).
The system must provide query operations for selecting and combining knowledge from different modules. For example, in a pilot briefing, depending on the flight phase, knowledge with a specific geographic applicability is relevant, which must be selected from multiple modules.
(Knowledge abstraction).
The system must provide query operations that allow to obtain a more abstract view on the represented knowledge. For example, in some situations, the knowledge about various layers of dry snow, compact snow, etc. on different runways of an airport, may be accurately summarized by the fact that the runways at LOWW airport have snow contamination.
In the following, we formulate the KG-OLAP cube model and query operations guided by the identified functional requirements.
Multidimensional model
In this section, we introduce the
KG-OLAP cube model
KG-OLAP adapts the multidimensional modeling paradigm from data warehousing (see [83]) in order to organize multidimensional KGs. Hence, the
A KG-OLAP cube’s upper layer defines the multidimensional structure of a cube and associates specific knowledge modules with individual cube cells. Intuitively, the cube’s Alternatively, data warehouse literature refers to those points as
Fig. 3 illustrates, in Dimensional Fact Model (DFM) notation [30], a KG-OLAP cube’s schema with its dimensions and levels. The presented KG-OLAP cube organizes relevant knowledge for air traffic management (ATM). The box in the center represents the cells of the cube: each cell contains an RDF graph that encodes the contextualized ATM knowledge – the cell’s knowledge module. The cube has four context dimensions that characterize the cells:

A KG-OLAP cube with its dimensions and levels in DFM notation for the organization of KGs in air traffic management.
The dimension members (e.g., February 2020, Vienna) of a KG-OLAP cube are organized in a complete linear order, which is referred to as
Fig. 4 shows an ordering of dimension members and the corresponding levels, which is used in the running example cube of ATM knowledge. Each dimension is represented as a tree. The dimension’s name is depicted above the tree. Each node represents a dimension member, the caption next to each node shows the respective dimension member’s name. An edge between two nodes represents a roll-up relationship between the respective dimension members, from bottom to top. On the left hand side of each tree are the levels of the dimension members, ordered from most general to most specific. Each dimension has an implicit

Example hierarchies for the context dimension members.

The dimension members characterize the cells of a KG-OLAP cube: each cell has a set of dimension members as identifying attributes and the dimension hierarchies organize the cells into a hierarchical structure. For example, the combination of dimension members February 2020 and Austria identifies a particular cell in a two-dimensional cube with time and location dimensions. The hierarchical order of dimension members then determines the
Fig. 5 shows a set of cells according to the KG-OLAP cube schema in Fig. 3; the contents of the knowledge modules are shown in Fig. 6 (see Example 4). The
The
The
A KG-OLAP cube’s lower layer consists of the actual knowledge modules that are associated with the individual cells. A knowledge module contains statements valid in the context of the associated cell. The knowledge inside each module is specified using an

Example contents of the KG-OLAP cube cells in Fig. 5.
Fig. 6 defines (using DL syntax) contents for the knowledge modules
The
The
In some cases, a
A
The
A navigation aid (
The
The
The
The
The
The
The
Example 4 illustrates how knowledge representation in KG-OLAP fulfills the functional requirements defined in the previous section. First, the represented KGs are heterogeneous (Requirement 1) regarding both the types of entities and relationships; schema variability is exemplified by the different variants of
The knowledge from higher-level cells propagates to the covered lower-level cells; the knowledge associated with the lower-levels cells also specializes the more general knowledge inherited from the higher levels. The hierarchical organization facilitates the combination of knowledge across cells in the course of data analysis: the higher-level facts contain a shared conceptualization of business terms that may be extended by lower-level facts. The actual contents of lower-level cells are defined in terms of the shared conceptualization provided by the higher-level facts. The propagated knowledge is also available for reasoning.
(Inference).
Consider the cells in Fig. 5 and the corresponding knowledge modules in Fig. 6. In the
Formalization
In the following, we adapt and extend the definitions of the CKR framework – building on the CKR definition [8,12] in a generic description logic (DL) language [4] – in order to fit the needs of KG-OLAP and its query operations (see Section 4).
Basic definitions
We first define the basic notions of a KG-OLAP cube before relating the KG-OLAP cube definitions to the CKR framework. The multidimensional structure is expressed using a
For every dimension
In order to define the hierarchical order of cells, we adapt the definition of dimensional vector and context coverage from the original CKR definition [75]. Let
Given a dimensional vector, we associate with that vector a
Context coverage derives from the order of the dimensional vectors associated with the contexts, which in turn derives from the individual dimension orders. Let
The knowledge represented in each cell is expressed in a DL language
Extending the CKR framework
We now define a KG-OLAP cube as a special kind of CKR with hierarchically-ordered dimensions and cells as well as knowledge propagation from higher to lower-level cells. In the following, in order to formalize the semantics of a KG-OLAP cube, we employ the OLAP cube vocabulary Ω as a CKR meta-knowledge vocabulary and extend the definitions of the CKR core. In particular, we extend the definitions of the CKR as presented by Bozzato and Serafini [12] – which has the advantage over the original CKR formulation [75] that it can be implemented as forward rules – to express propagation of knowledge modules along the coverage relations, and to allow contexts with empty knowledge contents. Further extending the definition of CKR in [12], we adopt the notion of the contextual structure being defined by contextual dimensions from the original CKR formulation [75], and we define knowledge propagation as inheritance of knowledge modules. We provide only basic definitions of the CKR core and refer to previous work [12,75] for an exhaustive presentation of the CKR framework.
A CKR is a two-layered structure composed of (1) the
The meta-knowledge of a CKR is expressed in a DL language containing the elements that define the contextual structure. A
The
(Contextualized Knowledge Repository).
A
In the following we call
The CKR semantics basically follows the two-layered structure of the CKR framework: a CKR interpretation is composed by a DL interpretation for the global context and a DL interpretation for every context.
(CKR interpretation).
A for every
The interpretation of ordinary DL expressions in
(KG-OLAP cube model).
A CKR interpretation for for for if for if
Intuitively, while the conditions (i) and (ii) of Definition 3 impose that
Given a CKR
Reasoning in KG-OLAP cubes
While the definitions for KG-OLAP cube and CKR are independent of any specific DL language used at the meta and object levels, we formalize instance-level reasoning inside a KG-OLAP cube using a
Intuitively, the materialization calculus is based on a translation to Datalog. The axioms of the input cube
With respect to the calculus for
Then, the translation procedure is extended from the one presented for CKR [12] by introducing new steps in which the cell coverage relation is computed from the dimensional coverage in the global program (see Section 3.3 of the Appendix [70]).
The rules and the translation process constitute a sound and complete calculus for instance checking in KG-OLAP cubes using
The presented KG-OLAP model can be extended in several directions in order to increase flexibility. In the following, we briefly identify possible extensions.
As in the underlying model of CKR, the KG-OLAP model assumes a centralized view on the available data: The structure of the cube and the knowledge associated to each cell is assumed to be locally available in a single knowledge base for the application of reasoning and operations on the cube. Thus, a possible extension, supported by the modular nature of the cube model, concerns the distribution of the cube structure over separate knowledge bases. This would require to revise the formalization so that both the meta and object information are distributed across more local knowledge bases and knowledge is propagated also considering the topology of the distributed knowledge bases (e.g., by introducing an additional “distribution” dimension). The reasoning procedures and OLAP operations on such distributed system have to be consequently adapted to the distributed structure and the intended knowledge propagation across local knowledge bases. On the other hand, the modularization of knowledge bases can represent an advantage to limit the load of reasoning tasks and to restrict application of operations to a limited subset of cells.
In the proposed KG-OLAP model, we assume that all the dimensions and dimensional values are known a priori and thus no context is undefined (but may have an empty module). The KG-OLAP model could be extended to consider updates of dimensions and dimensional values, which necessitate the dynamic reorganization of knowledge in the contexts that are identified by the newly introduced dimensional values. Updates of the dimensions and their dimensional values will typically be limited to extending the hierarchy with new branches rather than modifying the existing dimensional values. In this regard, however, the notion of “slowly-changing dimensions” from traditional data warehousing (see [83, pp. 139–145] for further information) may be adopted in KG-OLAP. Furthermore, in some cases, it may not be possible to determine the right dimensional value for the statements. The KG-OLAP model allows for the introduction of “unknown” or “not applicable” members in the dimensions. For example, there could be a cell containing knowledge of unknown importance, a cell containing knowledge of unknown temporal or spatial applicability. Future work may investigate the different notions and implications of such default members in the dimension hierarchies.
We note that, due to the modular structure of the model, concepts in different cells can assume different interpretations: the particular situation identified by a cell can determine the local meaning of a concept, reflecting the contextual view of the model. In this regard, we refer to the
The concepts that are propagated from the higher to the lower cells are assumed to be refined by the knowledge in the lower cells. Such refinement is necessarily monotonic, as the knowledge in the lower-level cells cannot truly “override” the concept definitions from the higher-level cells. In this direction, in the CKR model, a notion of
For the presented definition of the KG-OLAP model we consider a simple hierarchical organization of dimensional values, which can be divided into levels, i.e.,
Query operations
In this section, we introduce a set of query operations for working with KG-OLAP cubes. We distinguish between
Contextual operations
The contextual operations select and combine cells of a KG-OLAP cube, thus satisfying Requirement 6, using its dimensions and levels. The
Slice and dice
The
(Slice and dice).
Given a cube In the definition of operations, for simplicity of notation, we assume that components of the cube For each
Fig. 7(a) depicts the definition of slice-and-dice operation on a one-dimensional cube. Intuitively, the slice-and-dice operation takes as argument the coordinates of a point in the cube, i.e., a dimensional vector

Illustration of contextual operations definitions.

Applying slice-and-dice and merge operations on the KG-OLAP cube instance from Fig. 5. Gray lines denote contexts that are disregarded by the slice-and-dice operation
Fig. 8 illustrates the application of the slice-and-dice operation on the KG-OLAP cube from Fig. 5, which we denote by
In data warehousing, conceptual multidimensional models typically distinguish between dimensional attributes and non-dimensional attributes (see [30]). While the dimensional attributes identify the cell, non-dimensional attributes provide additional information that can be used for selection. Similarly, KG-OLAP cubes could be extended with non-dimensional attributes to allow for additional variants of the slice-and-dice operation.
Merge
The
Formally, we define a
Let
(Merge).
Given a cube For each For each
Fig. 7(b) illustrates the definition of the merge operation. Intuitively, the merge operation is a transformation over the original cube that combines the knowledge from lower-level cells into higher-level cells in the contextual hierarchy (by adding a new module
(Merge).
In Fig. 8, the
In the case of general context coverage hierarchies that do not follow a clear notion of granularity level, the merge operation, which is now defined in terms of levels, requires reformulation. Moreover, the combination method in the merge operation – which in its current form simply considers either the union or intersection of the knowledge from the contexts in lower levels – can be refined to consider different methods for selection of knowledge in the newly generated module, e.g., by considering ontology merging methods that suit the use case at hand. We note that in the merge operation we do not explicitly consider the management of RDF blank nodes since we abstract from the serialization of the knowledge. We assume that blank nodes have been skolemized before the application of the merge operation; another option is to consider methods for merging RDF blank nodes in the resulting graph [36, #shared-blank-nodes-unions-and-merges].
Graph operations
Graph operations – abstraction, pivoting, and reification – alter the structure of the KGs inside the knowledge modules of a cell, thus satisfying Requirement 7. Abstraction replaces sets of entities with individual and more abstract entities. Pivoting moves metaknowledge (contextual information) inside the modules. Reification allows to represent relations as individuals.
Abstraction
Abstraction serves as an umbrella term for a class of graph operations that, broadly speaking, replace entities in an RDF graph with more abstract entities. This abstraction is based on various types of ontological information, e.g., class membership and grouping properties. We also refer to abstraction as
We distinguish three types of abstraction: (a)
Consider the set of asserted and inherited modules of a cell
(Abstraction).
Given a cube for every role assertion for every role assertion for every for every role assertion resp. for every add to remove every
Note that for simplicity we treat literal values as individuals and we do not distinguish roles across individuals and values in our language. We note that

Illustration of abstraction operations definitions.
Fig. 9 shows a graphical representation for definitions of the abstraction operations. Intuitively, the abstraction operation takes as input the single cell, on the knowledge module of which the operation is applied, a class
The definition of individual-generating abstraction can be easily extended to allow for multiple grouping properties. For example, in individual-generating abstraction, given grouping relations

Examples of triple-, individual-, and value-generating abstraction.
Fig. 10 illustrates the different variants of abstraction on the running example of ATM knowledge graphs. Take the
The grouping role
Additional variants of the abstraction operation can be defined in order to adapt the concept of abstraction to the structure of the object knowledge. For example, in individual-generating abstraction, we consider an explicit
We note that KG-OLAP, in general, aims at being independent from the specific ontology language chosen for the representation of local axioms and assertions. If an ontology language were employed that could express the semantics of the abstraction operations (or some variant), that ontology language, in conjunction with an automated reasoner, could serve to implement the abstraction operations. Regardless, in the case of OWL, reasoning alone cannot serve to implement the abstraction operations. For example, in case of individual-generating abstraction, since OWL does not support grouping of individuals by property value when that value is a variable, extralogical steps must be followed in addition to reasoning. In Example 8, membership reasoning for the complex concept
Pivoting
The
(Pivoting).
Given a cube for every
Note that we have to admit that

Illustration of the pivoting operation definition.
Consider the
Reification
The
Consider the set of asserted modules of cell
(Reification).
Given a cube a module name
Illustration of the reification operation definition. a concept for every

Fig. 12 shows an illustration of the reification operation definition. Intuitively, the reification operation considers a single cell Consider the
Proof-of-concept prototype
In this section we sketch the foundations of a proof-of-concept prototype of a KG-OLAP system implemented on top of off-the-shelf quad stores. We evaluate the prototype’s performance in order to grasp the complexity of KG-OLAP cube maintenance and query operations as basis for future optimization efforts; performance optimization is not a goal of this paper. We refer to Section 4 and Section 5 of the Appendix [70] for additional information on implementation and performance evaluation, respectively. Source code and logs from the experiments are available in an online repository.3
A mapping of the formal language to an actual RDF representation allows for the storage of KG-OLAP cubes in off-the-shelf quad stores with SPARQL realizations of the query operations. Context-aware rules serve to materialize roll-up relationships for levels and cells as well as inference and propagation of knowledge. Details about logical model, reasoning and queries are provided in Section 4 of the Appendix [70].
Performance evaluation
In the following, we analyze performance of the core set of KG-OLAP cube operations – i.e., slice-and-dice, merge union, and abstraction. Specifically, we look at median run times for the computation of the query operations’ delta statements, i.e., the statements that must be inserted or deleted in order to perform the respective query operation, measured over multiple iterations, relative to repository size (number of statements), context size (number of contexts), and delta size (number of computed delta statements). We do not include the duration of actual insertions or deletions of the delta statements in the run times since these are not specific to contextualized KGs.
The performance experiments employed synthetic datasets based on the ATM scenario used throughout the paper, which allowed us to vary the number of dimensions, contexts, and statements while keeping the graphs similar. The generated contexts contain knowledge about airports, runways and taxiways, warnings, temporary closures of runways and taxiways for different aircraft characteristics based on weight or wingspan, layers of surface contamination with the depth of each layer, and navigation aids with their frequencies. Query operations ran on three-dimensional and four-dimensional datasets with 1365, 2501, and 3906 contexts, respectively. The contexts were not all on the same granularity but distributed over five different granularity levels (not including the root context), where the number of contexts per granularity level gradually increased with the depth; the majority of the contexts were on the most specific granularity level (see Section 5.2 of the Appendix [70] for details). Some entities were shared between contexts at different granularities to accurately study the effect of knowledge propagation. Context size was varied by adding/removing branches in the context hierarchy. In turn, for each dimensionality and context size there were three different repository sizes. In order to make for more realistic datasets, repository size was varied by increasing the number of airports, runways and taxiways, warnings, etc. per context, dependent on the granularity level, not by randomly distributing statements across contexts, which due to the structure of the context hierarchy results in differences in the repository sizes between context sizes. Additional “baseline” datasets were used in the experiments involving abstraction operations, which consisted of a single context, in order to investigate the impact of contextualization on query processing.
We remark that designing performance experiments after existing benchmarks for traditional OLAP, e.g., TCP-DS,4

Performance of contextual operations.
The performance experiments were conducted on a virtual CentOS 6.8 machine with four cores of an Intel Xeon CPU E5-2640 v4 with 2.4 GHz, hosting a GraphDB5
The GraphDB instance comprised two repositories – base and temporary – with the following configuration (see [63] for further information). The entity index size was 30000000 and the entity identifier size was 32 bits. Context index, predicate list, and literal index were enabled. Reasoning and inconsistency checks were disabled; the KG-OLAP implementation takes care of reasoning via RDFpro rule evaluation.
Fig. 13(a) shows run times of the slice-and-dice operation. The plot on the left shows run time relative to repository size per dimensionality. The plot in the middle shows run time relative to repository size per context size. The plot on the right shows run time relative to the size of the delta table computed by the query operation. Hence, performance of the slice-and-dice operation primarily depends on the number of delta statements in the query result, i.e., the number of selected cells/statements from the base repository. In fact, in this example, the slice-and-dice operation performs better on the large context size (3906 contexts) due to fewer delta statements being computed because of the distribution of the statements across the contexts. Dimensionality does not play a role here. In summary, slice and dice does not involve any complex computations, consisting only of the selection of relevant contexts and their contents.

Performance of abstraction operations.
In case of the merge operation, in order to study worst-case performance, we deliberately chose dataset and formulation of the merge operation such that the result would be an almost complete reorganization of the contexts in the KG-OLAP cube. The lower-level cells contain the majority of statements, which are all affected by the merge operation. The complexity of the chosen operation is evidenced by the delta size, which is about twice the repository size: the merge operation first removes the statements from the merged cells only to insert those statements again into higher-level cells. The run time of the merge union operation, as shown in Fig. 13(b), primarily depends on the number of contexts. For each context size, we observe a linear increase in run time with respect to the repository size. For the large context size (3906 contexts) there is also a marked influence of dimensionality on run time.

Performance of rule evaluation.
For performance evaluation of the abstraction operations, we employ, on the one hand, baseline datasets which consist of a single context comprising all statements in order to gain an understanding of the inherent complexity of these operations regardless of contextualization. Such abstraction operations on a single context correspond to the formalization. On the other hand, we perform abstraction operations in a variant that performs the abstraction to each cell at a particular granularity level. Similar to the merge operation, we deliberately choose a setting where the query operations affect a large number of statements in order to study worst-case performance.
Fig. 14(a) shows run times of triple-generating abstraction. Run time grows linearly with repository size. Run times for individual-generating abstraction with a single grouping property are similar, as shown in Fig. 14(b). For triple-generating abstraction, the baseline datasets had significantly higher run times. The difference was less pronounced for individual-generating abstraction.
For value-generating abstraction, the queries resulted in smaller sizes of the computed delta tables. Fig. 14(c) shows that the run time of value-generating abstraction grows about linearly with respect to repository size with a small influence of context size and little influence of dimensionality. Run times for the baseline datasets were smaller. We note that the run times of value-generating abstraction were in general quite low.
For results of reification and pivoting operations we refer to Section 4.5.6 and Section 4.5.7, respectively, of the Appendix [70]. In summary, the reification operation grows linearly with the repository size. For the pivoting operation we observe context size as the main factor influencing run time.
The proof-of-concept implementation relies on materialization of coverage relationships and inferences, which requires evaluation of a set of rules over the base repository in order to initialize the KG-OLAP cube. Fig. 15 shows run times for the evaluation of different rulesets relative to the repository sizes before and after reasoning – input and output repository size, respectively – as well as number of contexts for three-dimensional and four-dimensional KG-OLAP cubes. Reasoning in the performance experiments was conducted with the entire repository loaded into main memory first. Performance of two different rulesets was evaluated. The first ruleset, at the local level, i.e., within each context, performs membership reasoning under consideration of subclass relationships. For example, if
The purpose of the implementation was to get an idea of the complexity of KG-OLAP cube maintenance and query operations. Even though optimization was not a main goal of this paper, performance is already more than satisfactory for certain use cases. In order to handle big KGs, however, future work will consider a distributed, parallelized implementation. Nevertheless, optimization requires a thorough definition of the framework’s fundamental concepts, which we provide in this work.
In the following, we briefly discuss the implications of the results of the performance evaluation regarding the use cases presented in Section 2.
(Pilot briefings).
Performance of the proof-of-concept prototype is more than satisfactory for use in pilot briefings. In that case, a KG-OLAP cube would cover the pilot briefing for a particular flight. Knowledge would consist, on the one hand, of relevant baseline information, e.g., available runways and taxiways, route of the flight, and navigation aids. On the other hand, temporary knowledge would be extracted from DNOTAMs and METARs. A typical short-distance flight has fewer than 200 relevant DNOTAMs [79, p. 6B2-10]. Assuming that it takes 10 triples to represent a single DNOTAM, the knowledge extracted from DNOTAMs amounts to 2000 triples for a short-distance flight. Even when accounting for other potentially relevant knowledge, and assuming a long-distance flight, the KG for a pilot briefing for a single flight will be in the range of at most 100000 triples, which is well within the capabilities of the proof-of-concept prototype and would also allow for more sophisticated reasoning. The expected number of contexts for a single flight, even with hourly or half-hourly granularity of contextualization, assuming tens of geographic segments, is within the capabilities of the prototype implementation.
(Post-operational analysis).
Maintaining a large KG-OLAP cube for post-operational analysis in Europe or the United States, or even in a single region within one of those areas, with possibly tens of thousands of contexts and billions of triples, would require a different implementation strategy. Taking the figures regarding ATMGRAPH’s size as reference, the volume of knowledge for only the New York area generated within a single month amounts to 260 million triples [44]. An alternative to maintaining a single KG-OLAP cube would be the adoption of a federated approach, maintaining multiple cubes within a
Related work
In this section we review related work on contextualization in KGs, on OLAP and semantic web technologies, on OLAP and information networks, and on (knowledge) graph summarization.
Contextualized knowledge graphs
The choice of CKR as base formalism for the definition of the KG-OLAP cube model was motivated by the explicit representation of dimensions and the organization of dimensions into levels which, as discussed in the previous sections, is compatible with the multidimensional perspective of OLAP. The idea of associating contexts with dimensions traces back to the theoretical works of Lenat [53] and the “context as a box” paradigm from Benerecetti et al. [7] for representation of contexts in knowledge representation.
The problem of annotation and contextualization of knowledge in KGs has been present since the early years of semantic web technologies. Different ways of introducing a notion of modularization in the RDF data model or KG implementations have been proposed [16,38,45]. Such approaches, however, often do not provide a formal (logic-based) definition of the resulting contextual model, which goes against Requirement 2 (ontological knowledge).
The need for formalization of the notion of context in KGs led to the proposal of several (description) logic-based approaches for the contextualization of knowledge bases: apart from the CKR [12,75], we can cite [46,80,82,92], for example. While these models explicitly introduce contexts as a primitive in knowledge bases, these proposals do not always take into account an explicit representation of the notion of dimensions and/or level hierarchies, which is important – as of Requirement 4 for modularization and Requirement 5 for hierarchical decomposition of KGs into more general and more specific knowledge along with knowledge propagation – to qualify the different situations represented by contexts, and to have a clear separation of knowledge bases associated with each context.
Krötzsch et al. [48] propose an approach for adding annotations in a logic-based reading of KGs. While that model formally introduces annotations to local description logic axioms and assertions, it does not provide definite criteria for knowledge propagation across contextual structures as in the KG-OLAP coverage hierarchy (cf. Requirement 5, general and specific knowledge).
OLAP and semantic web technologies
Semantic technologies have been used for a variety of tasks in the context of OLAP (see [3] for an overview). Related to KG-OLAP are techniques for data analysis over RDF data. The RDF data cube vocabulary (QB) [24] and its extension, QB4OLAP [26], provide an RDF representation format for publishing traditional OLAP cubes with numeric measures on the Semantic Web, with often SPARQL-based operators that emulate traditional OLAP queries for analyzing multidimensional data in QB [57] and QB4OLAP [27,84]. Such statistical linked data are just different serialization and publication formats of traditional OLAP cubes. Other work has suggested “lenses” over RDF data [20] for the purpose of RDF data analysis, i.e., analytical schemas which can be used for OLAP queries on RDF data. Similarly, superimposed multidimensional schemas [37] define a mapping between a multidimensional model and a KG in order to allow for the formulation of OLAP queries. Contrary to these approaches, KG-OLAP focuses on RDF graphs as the “measures” of OLAP cubes rather than numeric measures that are aggregated using aggregation operators such as
Fusion cubes [2] supplement traditional OLAP cubes with external data in RDF format, particularly linked open data where typically the data are not owned by the analyst. Fusion cubes are traditional OLAP cubes with numeric measures that can be populated dynamically with statistical data from RDF sources. Fusion cubes store contextual information about provenance and data quality of the external sources. Other similar work [59] extracts traditional OLAP cubes with numeric measures from RDF data sources and ontologies, which analysts may then query using a traditional OLAP language, namely MDX. The Semantic Cockpit project [60] employed ontologies for the definition of a shared understanding of business terms and analysis situations among business analysts. With respect to these approaches, KG-OLAP cubes may have some similarities to a structured data lake (see [69] for more information on data lakes), which stores the data of interest in a semantically richer format than plain numeric measures, but unlike conventional data lakes, which store raw data, provides a certain degree of integration and cleaning as well as dedicated query operations.
OLAP and information networks
KG-OLAP is related to applications of OLAP technology for analysis of information networks. Among the first, and arguably the most prominent, of these approaches was Graph OLAP (also known as InfoNetOLAP) [18,19], which through its informational and topological OLAP queries provides rich query facilities suitable for graph analysis. In Graph OLAP, graphs are associated with dimensional attributes, which yields a graph cube, i.e., the contents of the cube cells are graphs. The edges of the graphs themselves are weighted; the weights represent the measures to be analyzed. Typical applications of Graph OLAP are analysis of co-author and similar social graphs from different time periods, geographic locations, and so on. Graph OLAP distinguishes between informational roll-up and topological roll-up, which corresponds to the distinction between contextual and graph operations in KG-OLAP. The focus of Graph OLAP are weighted directed graphs with highly structured and homogeneous data. Hence, Graph OLAP does not consider heterogeneity (Requirement 1) and ontological knowledge (Requirement 2). Schema information, e.g., about roll-up hierarchies, are external to the data, i.e., graphs in Graph OLAP are not self-describing (Requirement 3), resulting in rigid, inflexible graph schema and queries. Graph OLAP also does not consider the systematic propagation of knowledge from more general to more specific contexts (Requirement 5); graph data are only associated with the most specific granularity in the cube.
Since the original proposal of Graph OLAP, other approaches that apply OLAP to information networks have been proposed, which can be compared along different criteria, most notably the type of network, i.e., homogeneous or heterogeneous [67]. KGs have been likened to “schema-rich” heterogeneous information networks [77] and, therefore, only approaches applying OLAP to heterogeneous networks can be accurately compared to KG-OLAP. Yet, even approaches applying OLAP technology to heterogeneous information networks, e.g., [29,55,88,89,91], do not consider schema variability, i.e., Requirement 1(c), or ontological knowledge (Requirement 2). Furthermore, unlike KG-OLAP, approaches to OLAP on information networks also lack a systematic way of structuring large bodies of knowledge, which comes with the decomposition of large knowledge graphs into hierarchically structured modules, from more general to more specific, in conjunction with knowledge propagation (Requirement 5). Schema and instance data are also firmly separated, i.e., the data are not self-describing (Requirement 3), reducing flexibility of the query operations. Another approach [56] aims to reconcile statistical/multidimensional linked data in QB and QB4OLAP with the informational/topological view of OLAP on information networks. Regarding the requirements, that approach does not consider schema variability, i.e., Requirement 1(c), ontological knowledge (Requirement 2), distinction between more general and more specific knowledge in conjunction with knowledge propagation (Requirement 5), and knowledge abstraction (Requirement 7). Some approaches, e.g., [88], offer abstraction operations that are similar to the KG-OLAP operations, but less flexible due to lack of self-describing data (Requirement 3), which allow for a more flexible formulation of roll-up paths within the graph structure. Furthermore, KG-OLAP allows for the use of complex concepts and roles in abstraction (in line with Requirement 2, ontological knowledge), allowing for flexible query formulation. For example, individual-generating abstraction with complex concepts such as
Techniques for optimization of OLAP on information networks may inform the optimization of KG-OLAP. Precomputation and materialization of aggregate views [88,89] is the most prevalent optimization technique for OLAP on information networks, which may not be applicable directly to KG-OLAP due to the heterogeneous data and highly flexible query formulation in KG-OLAP. Adopting a MapReduce-based implementation like Pagrol’s [89] or Process OLAP’s [6] is an interesting perspective for KG-OLAP as well.
Work on the analysis (or mining) of heterogeneous information networks [76] and Semantic Web databases [51] is orthogonal to KG-OLAP. Such approaches calculate, for example, degree distribution or PageRank, but may also serve to predict links and as the basis for recommender systems. Just like data mining may use OLAP cubes, information network mining may use KG-OLAP cubes.
(Knowledge) graph summarization
The KG-OLAP query operations also invite comparison with
Among the structural summarization approaches for RDF graphs, quotient RDF summaries represent a type of summaries that produce an RDF graph where multiple nodes from the source graph are replaced by a single summary node in the RDF summary. Accordingly, the results of abstraction operations in KG-OLAP may be considered
Unlike KG-OLAP, existing work on graph and KG summarization largely ignores modularization and contextuality in KGs (Requirement 4). In fact, existing work on KG summarization is orthogonal to the KG-OLAP approach. Consequently, future work may adapt summarization algorithms to serve as graph operators in KG-OLAP.
Conclusion
In this paper, we presented KG-OLAP for the analysis of knowledge represented in KGs. We extended the multidimensional modeling paradigm from OLAP to the representation of contextualized KGs. Each cell (or context) in a KG-OLAP cube contains knowledge triples. We introduced specific query operations for KG-OLAP cubes. On the one hand, contextual operations allow for selecting and merging contexts. On the other hand, graph operations allow for summarizing knowledge triples within individual contexts. We illustrated KG-OLAP with use cases in air traffic management [72,73]. A proof-of-concept prototype using off-the-shelf quad stores and SPARQL queries demonstrates feasibility. Using the prototype, we conducted experimental performance evaluation, which may inform future optimization efforts. Even though optimization was not a goal, we conclude that there is a class of use cases for which the proof-of-concept implementation’s performance is more than satisfactory.
Different directions can be investigated for future work. We aim at extending the formalism for a distributed approach with federated KG-OLAP cubes. We sketch a first version of a distributed, parallelized KG-OLAP framework in [11]. Regarding implementation, we aim at realizing a distributed, parallelized implementation of a KG-OLAP system including the concept of metacube and the drill-across operation [73], in order to support big KGs. We are interested in extending the KG-OLAP model with defeasible axioms similar to previous work on CKR [8,13]. Regarding the KG-OLAP operations, different variants can be studied to meet the needs of the use cases at hand and the extensions of the underlying model (e.g. by distribution). Moreover, we may extend the proposed definitions of query operations with common RDF summarization techniques.
