Abstract
In this paper, we propose to extend SPARQL functions for querying Industry Foundation Classes (IFC) building data. The official IFC documentation and BIM requirement checking use cases are used to drive the development of the proposed functionality. By extending these functions, we aim to (1) simplify writing queries and (2) retrieve useful information implied in 3D geometry data according to requirement checking use cases. Extended functions are modelled as RDF vocabularies and classified into groups for further extensions. We combine declarative rules with procedural programming to implement extended functions. Realistic requirement checking scenarios are used to evaluate and demonstrate the effectiveness of this approach and indicate query performance. We show the added value of such an approach over query techniques developed in the conventional Building Information Modeling domain by providing an application example of querying building and regulatory data, where spatial and logic reasoning can be applied and data from multiple sources are required. Based on the implementation and evaluation work, we discuss the advantages and applicability of this approach, current issues and future challenges.
Introduction
As integrating data in the architecture, engineering and construction (AEC) industry is becoming increasingly important [46], Building Information Modeling (BIM) has been adopted by a growing number of industry practitioners and has led to the specification and standardization of the data standard Industry Foundation Classes (IFC) [15,26]. Using BIM applications and the IFC standard to create, exchange and process building-related data is the state of the art in the AEC industry’s day-to-day operations. Even with IFC-based instance building models, however, the retrieval of domain specific information is currently challenging for industry practitioners, who generally depend on proprietary, vendor-specific solutions. Building models are used for different engineering tasks, where information needs to be flexibly derived according to a wide range of use case requirements. However, the IFC data model is designed for the creation and exchange of product data, and is not tailored for various query and analysis tasks [48]. Many useful relationships and properties that are explicitly defined or implied in building models are difficult to retrieve in day-to-day processes. Furthermore, IFC data is limited by its schema, which is not flexible enough to adapt to situations in which data from different sources needs to be integrated and processed [4,41]. Although IFC is a data model aiming to cover the entire AEC industry, much information used in common industry scenarios is not specified within the scope of the IFC data model, including e.g. product classifications, building requirements and regulations as well as data from neighboring domains such as urban planning and sensor networks.
Using the Resource Description Framework (RDF) and Semantic Web technologies to represent building data has been proposed repeatedly over the last decade [4,40,45]. Unlike conventional data modeling approaches that are limited by the scope of their underlying schemas, these Semantic Web technologies provide an open and common environment for sharing, integrating and linking data from different domains and databases. Semantics can be formally defined with the logic basis of these technologies and shared using web-based mechanisms such as Uniform Resource Identifiers (URIs) and the Hypertext Transfer Protocol (HTTP). The ifcOWL ontology has been developed as a counterpart of the IFC data model using the Web Ontology Language (OWL) and RDF. The ifcOWL ontology is in the final stages of the standardization process driven by the buildingSMART organization, the most important industry standardization body, and forms the foundation for Semantic Web applications for the AEC domain [40]. By transforming IFC instance building models to RDF data that follows the ifcOWL ontology, it becomes possible to process them using a standard query language such as SPARQL [19].
By using plain SPARQL1
In this paper, plain SPARQL refers to SPARQL queries that are compliant with the W3C Recommendation SPARQL 1.1.
There are currently three major components of the BimSPARQL project presented in this paper: (1) A set of functions modelled as RDF vocabularies that can be used in SPARQL queries (see Section 4); (2) A set of query transformation rules to map functions to IFC data structures to make writing queries easier (see Section 5); (3) A module for implementing geometry-related functions for deriving implicit information (see Section 5). The official IFC specification and BIM requirement checking use cases in the Netherlands and Norway, and some checks that have been implemented in Solibri Model Checker (SMC) [10,47,50,52] have been used to drive the development of the proposed and implemented functionality. The links to the vocabularies, transformation rules and source code repository of the prototypical reference implementation are provided in Appendix A.
The extended functions in this research do not require extensions to the grammar of SPARQL. With SPARQL as a common interface language, extended functions can be used to query building data alone or combined with data from other sources, which in turn may have their own domain specific functions (Fig. 1). We believe that this is a generic approach that is usable in many different use cases, including e.g. multi-model collaboration, quantity take-off and cost estimation, and requirement and code compliance checking. As a W3C standard, SPARQL has been widely implemented by a plethora of RDF Application Programming Interfaces (APIs) and databases, and many of them support extending functions (e.g. see Section 3.3 and Section 5), hence they can be used as base environments for implementing extended functions.

Fig. 1. SPARQL query with domain specific functional extensions.
This paper is structured as follows: In Section 2, the background of IFC and ifcOWL is briefly introduced and the motivation of this research is elaborated. In Section 3, an overview of related research is provided. The proposed functional extensions for SPARQL are introduced and classified in Section 4, followed by example use cases. In Section 5, implementation methods are described and a prototype is presented. In Section 6, realistic use cases put forward by research communities and building models are used to evaluate this prototype and demonstrate the value of this approach in comparison with SPARQL and existing work. In Section 7, an extended example is presented to show the extensibility of this method. A discussion about added value, limitations and further work concludes this paper.
In the last two decades, the IFC standard has been developed and maintained by buildingSMART as a standard data model for data exchanges between heterogeneous applications in the AEC/FM sector [10,15]. The IFC schema is specified using the EXPRESS modeling language [24], while its instances are usually serialized in the IFC STEP File format [25]. The comprehensiveness of the AEC domain makes IFC one of the largest EXPRESS-based data models across engineering industries. It provides a wide range of constructs for modeling building-related information. For example, one of the most recent versions, IFC4_ADD1, defines 768 entities and 1480 attributes on the schema level [10]. IFC also provides a few mechanisms to extend semantics at the instance level, including e.g. common property sets and external standard classification references. However, few rules are specified in IFC about the usage of these constructs and mechanisms. On the other hand, the semantics required in the AEC industry go far beyond the concepts formalized in the IFC data model. Therefore, a large amount of information is informally or implicitly represented in various ways, which usually causes redundancies and ambiguities in IFC instance models [53]. As an object-oriented data model, IFC structures data mainly for the purpose of data exchange rather than for the understanding of the knowledge domain, and information is usually represented using relatively complex structures. Furthermore, from a technical perspective, the EXPRESS language family has not gained popularity outside the STEP initiative in either engineering or software development communities, and there is a very limited set of tools to support storage, query and management of data in the IFC native format. All these issues make it difficult to query and manage IFC instance data.

Listing 1. Query to retrieve building elements which are not contained in a building storey. The query result can be used to check the spatial containment relationship for every building element.
Converting the IFC schema and its instances to OWL and RDF was first proposed and implemented in [4] to facilitate use cases of data partition, data query and knowledge reasoning. It has been further developed by the buildingSMART Linked Data Working Group (LDWG) and was given candidate standard status in 2015 [40]. Using inferencing and reasoning capabilities of RDF(S) and OWL, practical data processing scenarios in the building industry can be addressed with off-the-shelf algorithms and tools, whereas STEP-based modeling technologies would require custom tools. For example, a simple data validation use case, which requires that every building element be associated with a building storey, can be implemented without hardcoding procedural validators [52,56]. The relationship between a building element and the related building storey can be defined using an instance of IfcRelContainedInSpatialStructure, which is an objectified relationship defined in IFC. Provided that the building model is represented in the standardized ifcOWL, the query provided in Listing 1 can retrieve building elements which do not have this spatial containment relationship using common SPARQL implementations.2
In this paper, all properties defined in ifcOWL are abbreviated to a compact format in query listings, e.g. ifc:relatedElements is used to represent ifc:relatedElements_IfcRelContainedInSpatialStructure, which is standardized in ifcOWL. Another simplification is that all the queries in this paper assume that they are under RDFS entailment, hence in this case all the instances of IfcBuildingElement subtypes are visited by the query.
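The check described for Listing 1 can be illustrated outside SPARQL as well. The following is only a minimal sketch in plain Python, assuming that triples are modelled as tuples and that every element is already typed as ifc:IfcBuildingElement (which is what RDFS entailment would provide). The instance identifiers, the sample data, the abbreviated name ifc:relatingStructure and the function uncontained_elements are assumptions made for illustration and are not taken from the implementation described in this paper.

```python
# Triples are modelled as (subject, predicate, object) tuples. All instance
# identifiers and the sample data are hypothetical illustrations.
RDF_TYPE = "rdf:type"

triples = {
    ("inst:wall1", RDF_TYPE, "ifc:IfcBuildingElement"),
    ("inst:wall2", RDF_TYPE, "ifc:IfcBuildingElement"),
    ("inst:storey1", RDF_TYPE, "ifc:IfcBuildingStorey"),
    ("inst:rel1", RDF_TYPE, "ifc:IfcRelContainedInSpatialStructure"),
    ("inst:rel1", "ifc:relatedElements", "inst:wall1"),
    ("inst:rel1", "ifc:relatingStructure", "inst:storey1"),
}


def uncontained_elements(graph):
    """Return building elements that no containment relationship links to a storey."""
    elements = {s for s, p, o in graph
                if p == RDF_TYPE and o == "ifc:IfcBuildingElement"}
    storeys = {s for s, p, o in graph
               if p == RDF_TYPE and o == "ifc:IfcBuildingStorey"}
    # Relationship nodes whose relating structure is a building storey.
    storey_rels = {s for s, p, o in graph
                   if p == "ifc:relatingStructure" and o in storeys}
    # Elements reachable from one of those relationship nodes.
    contained = {o for s, p, o in graph
                 if p == "ifc:relatedElements" and s in storey_rels}
    return sorted(elements - contained)


print(uncontained_elements(triples))
```

In this sample data only inst:wall2 lacks a containment relationship, so it is the only element reported.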
As RDF and Linked Data have received increasing attention in the AEC industry, it makes sense to use SPARQL as a common language to process federated data sources instead of developing custom domain specific languages. Common instance model query scenarios that can be implemented using the current SPARQL specification include:
(1) All building objects should be tagged with an NL-sfb classification code, which is a building product classification system used in the Netherlands.
(2) The type and thickness of walls can only be modelled according to the valid combinations provided in an external table X.
(3) Retrieve all geo-locations of companies which produce the materials used in the walls placed in space X.
All these scenarios not only need to query building models captured in e.g. IFC, but also require data from other sources. We argue that they can be more easily implemented with RDF and SPARQL technologies without relying on proprietary systems.
The conversion from IFC instances to ifcOWL RDF data is a straightforward process, and the data structures in IFC instances are reflected in the output RDF data [40]. Since standard SPARQL queries are only processed by matching the data graph patterns in RDF, the resulting queries are usually more complex than the high-level abstractions provided in use cases. For example, in the query case of Listing 1, it is better to have a shortcut relationship between a building element and a storey rather than the objectified solution of the regular schema. Many commonly used structures throughout the IFC data model can be simplified in this way to make queries easier to write and to bring properties and relationships closer to the understanding of knowledge domains.
Another problem that motivates this research and development work is that SPARQL can hardly retrieve useful information in scenarios where geometric computations and spatial reasoning are needed. Geometry data usually constitutes the largest sections in building models (see Table 10) and contains large amounts of information that currently can only be interpreted by human domain end users. Although the IFC data model provides many ways to explicitly model geometry-related properties and topological relationships (e.g. property sets and explicit relationships such as the IfcRelContainedInSpatialStructure relationship used in Listing 1), they are not mandatory and not always reliable due to the lack of strictness in the IFC data model and the ad-hoc nature of design processes in the AEC domain. In practice, IFC building models often miss required semantic relationships and properties or contain incorrect or inconsistent information. Figure 2 shows two examples of inconsistencies between semantic relationships and geometric representations in real building models. Directly deriving information from geometric representations of building models provides another option to enrich data and ensure consistency. Furthermore, much geometry-related information is impractical or impossible to provide explicitly in IFC data. For example, there are specific topological relationships such as the “touching” relationship between the bottom surface of a wall and the upper surface of the floor slab (see Listing 8) [50], or properties such as distances between elements (see Listing 7).

Fig. 2. A model that has incorrect semantic information with respect to its geometric data. The left one shows two walls which are stated as “contained in” a storey (using IfcRelContainedInSpatialStructure) but are actually located on the storey above. The second one shows three walls (the light grey ones) which are labelled as “is external” but are actually internal.
Across the different use cases analyzed in the context of the research presented here [10,47,50,52], there are many commonly used concepts that are frequently reused. Using the query in Listing 1 as an example, the spatial containment relationship is required in data validation use cases, and is also important in many cases including e.g. cost estimation and building code compliance checking. By wrapping them as functions used in a standard language (see Section 4), we are able to reuse them in many different applications.
BIM query techniques
Many past developments have been aimed at the query and analysis of IFC instance data. Some commercial platforms such as Solibri Model Checker (SMC) provide functions for querying IFC data [47]. However, the semantics of query functions in these proprietary systems are not transparent and the usage of them is limited by the user interfaces provided to end users.
Some researchers have attempted to use the generic Structured Query Language (SQL) to query IFC data that has been mapped into relational databases [28,30]. These attempts either have severe performance and scalability issues due to the vast number of tables, or are not intuitive enough for end users.
BimQL is among the first implemented and open source domain specific query languages for querying IFC data [35]. It is implemented in the open source bimserver.org platform [4]. It provides create, retrieve, update and delete (CRUD) functionalities to manipulate IFC data. Besides using concepts in the IFC schema, BimQL also provides a few shortcut functions for handling common use cases such as deriving information from common modeling constructs in the IFC model referred to as property sets and quantity sets. However, these functions are very limited and BimQL has not been further developed.
Geometry and spatial information in building models is the particular focus of a spatial query language introduced in [8]. This approach is further developed as a query language named QL4BIM for querying IFC data [11]. It provides a few topological and spatial operators and uses R-Tree [18] spatial indexes to optimize query performance.
There are also query languages tailored for specific use cases such as building code compliance checking. The Building Environment Rule and Analysis (BERA) Language is a domain-specific language dedicated to evaluating building circulation and spatial programs [33]. For this purpose, it defines an internal data model containing a small subset of IFC with related concepts such as floor, space and door. Path-finding algorithms are developed to generate circulation routes between spaces. As a language, however, BERA has limited expressive power and only supports some specific cases on building circulation rules. BIM Rule Language (BimRL) is a more recent research project [13]. It is a domain specific query language designed to facilitate accessing information for use cases of regulatory compliance checking. BimRL provides a suite of components including a simplified data schema and a light-weight geometry engine. IFC building models are loaded through an Extract-Transform-Load (ETL) process into a data warehouse. The language has an SQL-like syntax to check building models in terms of the defined data schema and implemented functions. It is currently implemented based on a relational database.
The above technologies have provided inspiring domain specific algorithms for querying building data. Currently, however, no query language has been standardized or widely adopted by the research community and AEC industry. We argue that this might be because these technologies are limited by the closed conventional data modeling approaches that are not sustainable in the AEC domain, which continuously needs changes, extensions and customizations according to different contexts and use cases. All these domain specific BIM query languages are designed based on fixed internal data models (usually an IFC equivalent or a simplified subset of it) and additional functions are hard-wired on top of them. Although some of them have provided programming interfaces for further extensions, the development work is usually limited by the data captured in its internal data model.
Applying Semantic Web technologies for querying BIM models
In recent years, Semantic Web and Linked Data technologies have received increasing attention as a knowledge modeling approach in the AEC industry and a number of research prototypes have been developed. A recent and comprehensive overview of them is provided in [42]. Here, we only briefly describe cases related to data query and knowledge reasoning tasks.
Regarding data query for use cases in the AEC domain, one of the early examples is described in [55]. Conformance constraints are interpreted and formalized as SPARQL queries in that paper. A similar method is developed in [9], which introduces a semi-automatic process to transform regulatory texts to SPARQL queries. A limitation of both efforts is that they mainly focus on formalizing building regulations into a query language without specifying how to map the terminology used to building data models.
A number of researchers have applied Semantic Web technologies in different sub-domains in the context of the AEC industry to facilitate knowledge modeling and rule checking. In [41], a remarkable approach for facilitating regulatory compliance checking has been introduced based on N3Logic and the EYE reasoning engine [5], and a test case of acoustic performance checking is presented. In [34], an OWL ontology has been used for reasoning tasks in cost estimation cases. There are also cases regarding energy management and simulation, construction management, and job hazard analysis [3,58,59]. All these examples have shown that different knowledge reasoning tasks in the AEC industry can be facilitated by properly using Semantic Web technologies.
Currently, however, a systematic way to query data from building models using Semantic Web technologies is still missing. One of the possible reasons is that an authorized and stable standard ifcOWL ontology has only been established very recently and its adoption in suitable use cases will likely take a few more years. The work that overlaps most with this research is the IfcWoD and SimpleBIM ontologies [36,39]. They both attempt to transform ifcOWL data to a more compact graph to ease query and improve runtime performance. The difference is that they mainly focus on developing a standard ontology as an alternative to ifcOWL to simplify the data graph, while this research is a framework that mainly considers the query functions with respect to semantics in common use cases and further extensions of them. A major enhancement of the approach introduced here is that functions related to geometry data are provided. To our knowledge, this is the first time that the analysis of IFC geometry data has been combined with rule-based reasoning technologies.
Functional extensions of SPARQL
Extending SPARQL with additional functions has been proposed and implemented in other fields. The most inspiring ones are the geospatial and geographical domains, as they share many requirements, concepts and processes with the AEC industry. The stSPARQL in Strabon and the GeoSPARQL standard from the Open Geospatial Consortium (OGC) have specified many topological and geospatial functions for 2D geometry data [32,43]. They have been implemented by spatial database systems including Strabon, Parliament and uSeekM [2,17]. Some other RDF APIs and triplestores like the Apache Jena framework, AllegroGraph and OpenLink Virtuoso have also implemented some geospatial functions. To our knowledge, these vocabularies and functions developed in the Semantic Web world have mainly considered 2D geometry and cannot be directly reused for building models.
The AEC industry also differs significantly from e.g. the geospatial field. There are many disciplines and use cases in different contexts, in which the number of required properties and relationships is almost unlimited. There are many more sophisticated reasoning tasks related to 3D geometry. Therefore, the systems needed in the AEC domain must go beyond a fixed set of vocabularies and should rather provide a flexible framework in which functions can be reused and extended more easily to process data and adapt to different situations.
From the implementation perspective, many technologies can be used to extend functions for SPARQL. Besides existing open source and commercial platforms (e.g. Apache Jena, OpenLink Virtuoso and AllegroGraph) that support customizing functions by coding them with full-fledged programming languages, there are some technologies that provide more transparent and portable methods for extending functions. For example, SPARQL Inferencing Notation (SPIN) can be used to define and execute functions by issuing SPARQL queries. A meta vocabulary is provided by SPIN to serialize SPARQL queries into RDF graphs to maintain implemented functions (see Section 5). The VOLT proxy provides a similar method that utilizes SPARQL fragments and graph patterns to define functions [44]. It has been applied to some geospatial cases and a plugin to include functions for spatial computation is provided based on the PostGIS API. Recently, an approach was presented in [12] to define functions by extending Triple Pattern Fragments (TPF) [54] on the client side, hence extended functions are compatible with any SPARQL server. As shown in [12], however, it might have issues regarding performance and data traffic, since additional functions are computed in web browsers and raw data needs to be retrieved to the client side. All these approaches can potentially be used for implementing extended SPARQL functions for querying IFC building models and data in the AEC domain.

Fig. 3. Conceptual relationships between vocabularies and IFC data.
Building data captured by the IFC data model is the focus for developing functions. The IFC documentation and requirement checking use cases from the Dutch Rgd BIM Norm, the Norwegian Statsbygg BIM Manual and some checks that have been implemented in the Solibri Model Checker (SMC) are reviewed to determine the structure of needed vocabularies [10,47,50,52]. Most of the referenced cases are BIM data quality validation requirements, which are associated with the IFC data model and are the most fundamental and commonly used requirement checking cases. From reviewing the above sources, we have extracted many properties and relationships that are required in use cases (see Sections 4.1, 4.2, 4.3 and 4.4). The implemented functions are wrappers of modular low-level code to derive such information and to coherently use them in different scenarios. Due to the complexity of the AEC industry, however, it is not possible for a single organization to list all required functions for all common task scenarios. Instead, they are classified based on required data inputs from IFC building models, since these inputs are closely related to further implementations and extensions (see Section 5).
Table 1. Vocabulary prefixes used in this paper and descriptions
Information in IFC-based building models can be roughly grouped into (1) domain semantics that are usually explicitly represented by e.g. object types, relationships, and properties, and (2) geometric data, which is a low-level technical description captured by geometry objects associated with IfcProduct instances. Due to the lack of support for parametric geometry description on the levels of the data model and the implementation, these two kinds of information are almost independent from each other. In fact, building models in practice often contain information that is inconsistent between these two subsets [49] (also see Fig. 2). We thus argue that query functions should be categorized to identify which subsets of the model are used to derive data from. As shown in Fig. 3 and listed in Table 1, the proposed domain vocabularies are classified into four groups to derive data from these two subsets of either geometric or non-geometric information in IFC models. Sections 4.1 and 4.2 describe functions used to extract information only from the domain semantic subset of models, while Sections 4.3 and 4.4 describe functions that mainly analyse geometric aspects. Besides these four vocabularies that are defined for building objects, we also propose a vocabulary in Section 4.5 to materialize and process geometry data. It is considered as an additional lower level layer independent from domain information and can provide additional functions for some use cases, e.g. the example in Listing 8. For each category and subcategory, some function examples are provided to show how to apply them on an ifcOWL instance data set and query examples are provided to demonstrate a use case.
There are generally two ways to extend SPARQL with domain specific functionality. The first method is to add operators in expressions (e.g. FILTER expression). The second one is to define a function as an RDF property, which is known as a computed property or property function, to be used in triple patterns to generate or evaluate bindings based on its bound subject and object. The differences are: (1) a property function is also an RDF property that can have domain(s) and range(s); (2) a property function can generate new bindings for triple patterns beyond simply computing values based on inputs. The syntactic sugar of using RDF collections in triple patterns also provides the possibility for a property function to have multiple inputs and outputs (see an example in Listing 7). In the research presented in this paper, most of the extended functions are defined as property functions. We argue that they are more flexible and intuitive and can potentially be materialized into RDF graphs for specific applications in order to improve runtime performance [38]. Functions are modelled as RDF vocabularies with their respective URIs. Due to the flexibility and openness of the RDF technology, additional vocabularies can always be added.
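The difference between the two extension styles can be illustrated with a small sketch in plain Python. It only mimics the behaviour and does not use a SPARQL engine. The data, the functions exceeds and has_length, and the convention of using None for an unbound variable are all assumptions made for this illustration.

```python
# Hypothetical data: element identifier -> length in metres.
LENGTHS = {"inst:wall1": 4.5, "inst:wall2": 7.2}


def exceeds(value, limit):
    """Filter-style function: maps already-bound input values to one result."""
    return value > limit


def has_length(subject=None, obj=None):
    """Property-function style: return every (subject, object) binding that holds.

    None marks an unbound variable. Unbound positions are generated from the
    data, whereas bound positions are only checked.
    """
    candidates = [subject] if subject is not None else sorted(LENGTHS)
    bindings = []
    for element in candidates:
        length = LENGTHS.get(element)
        if length is None:
            continue  # unknown element: no binding can be produced
        if obj is None or obj == length:
            bindings.append((element, length))
    return bindings


# Usage: generate bindings first, then restrict them with the filter function.
long_walls = [s for s, length in has_length() if exceeds(length, 5.0)]
print(long_walls)
```

The filter-style function can only test values that are already bound, whereas the property-function style can also enumerate new bindings when its subject or object is left unbound.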
Functions in this group are defined to wrap commonly used structures specified on the IFC schema level. They are identified with the prefix schm: in this paper. We model these functions mainly from the fundamental concepts and assumptions specified in the official IFC documentation [10]. These fundamental concepts describe recommended and commonly used structures in IFC instances as the general guideline for usage and implementation of IFC. Each of the fundamental concepts defines how a domain concept or relationship should be represented in IFC. Many of them have relatively complex structures to represent semantics. By reviewing these fundamental concepts and comparing them with use cases, shortcuts can be constructed to simplify writing queries and adapt to the high level abstractions in the AEC domain. They are defined for the following situations.
The most basic functions are related to objectified relationships. Many relationships in IFC data are realized by objectified relationships that are instances of IfcRelationship subtypes. An example is IfcRelContainedInSpatialStructure, which is used in Listing 1. Most of these objectified relationships and their usage are described by the fundamental concepts in the IFC documentation. In general, each of the objectified relationships can be used to associate an object with another object or a set of objects. For example, an IfcRelContainedInSpatialStructure can be used to associate an IfcSpatialElement (e.g. storey, space) with a set of IfcElement instances (e.g. wall, door) to define a spatial containment relationship. In the current vocabulary, functions are defined as shortcuts to wrap such structures and create direct relationships between the objects that are associated. For example, the function schm:isContainedIn is created to retrieve the relationship between an IfcElement and the containing IfcSpatialElement instance (see Fig. 4). With the same approach, functions are created for all the fundamental concepts which describe semantic structures containing IfcRelationship subtypes (see another example schm:hasSpaceBoundary in Table 2). This type of shortcut is also proposed in [36] and [39].

Fig. 4. Example of shortcut functions for schema level semantics.
Table 2. Example functions for schema level semantics
Another requirement is that some relationships need additional specification or generalization. For example, the spatial composition relationship between spatial objects (e.g. site, building, space) is semantically different from the aggregation relationship between building elements (e.g. wall, slab, stair). The former only represents a hierarchical spatial relationship, while the latter implies a geometric composition relationship. In IFC, however, they are represented using the same structure (IfcRelAggregates). These two structures are defined as two different functions (see one of them, schm:isDecomposedByElement, in Table 2). Conversely, sometimes a more generalized relationship is required for different structures. A typical example is the relationship of material association. There are several means to associate a material with a building object (e.g. single material, layered material), while many use cases require a direct relationship between an object and its associated material. In this case, besides functions for each of the different structures, an additional function is created to retrieve a direct relationship between an object and its associated material regardless of which representation is used (see schm:hasMaterial in Table 2).
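The generalization idea behind schm:hasMaterial can be sketched as follows. This is only an illustration in plain Python under simplifying assumptions: the two dictionaries stand in for the single-material and layered-material structures of IFC, and the helper names has_single_material and has_layered_material as well as the sample data are hypothetical.

```python
# Hypothetical, simplified association data. Real IFC models express these
# links through objectified relationships and dedicated material entities.
SINGLE_MATERIAL = {"inst:door1": "inst:oak"}
LAYERED_MATERIAL = {
    "inst:wall1": ["inst:brick", "inst:insulation", "inst:plaster"],
}


def has_single_material(element):
    """Materials attached directly as one single material."""
    material = SINGLE_MATERIAL.get(element)
    return [material] if material is not None else []


def has_layered_material(element):
    """Materials attached through an ordered set of layers."""
    return list(LAYERED_MATERIAL.get(element, []))


def has_material(element):
    """Generalized shortcut: all materials, whichever representation is used."""
    found = has_single_material(element) + has_layered_material(element)
    # Drop duplicates while keeping the first-seen order.
    return list(dict.fromkeys(found))


print(has_material("inst:door1"))
print(has_material("inst:wall1"))
```

A query author would then only need the generalized function, while the representation-specific functions remain available when the distinction matters.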
The third situation concerns functions for additional shortcuts. They are defined based only on experience and referenced use cases. A typical example is the relationship between a filling element (e.g. doors, windows) and a voided element (e.g. walls that have openings). Such a relationship is realized in IFC with two objectified relationships and an opening element, as illustrated in Fig. 4. As this relationship is frequently required, a function is created as a direct relationship between the filling element and the voided element (see Fig. 4 and Listing 2).
Following these approaches, over 40 relationships are currently wrapped as functions (see Appendix A). Some frequently used examples are listed in Table 2. Listing 2 shows an example query to apply two functions for a use case from Statsbygg BIM Manual [50], which requires to check whether every window and the wall it is placed in are contained in the same building storey. This query uses the functions schm:isPlacedIn and schm:isContainedIn. A comparison with a query using plain SPARQL to realize this use case is presented in Section 6.

Query to retrieve pairs of a window and a wall, with the condition that the window is placed in the wall but they are not contained in the same storey
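The logic behind this use case can be sketched outside SPARQL. The following Python fragment is only an illustration under simplifying assumptions: the objectified relationships are modelled as plain tuples, and all instance names (`wall_a`, `window_2`, etc.) and helper names are hypothetical rather than taken from the prototype. It shows how the two shortcuts hide the intermediate relationship and opening objects.

```python
from collections import namedtuple

# Simplified stand-ins for IFC objectified relationships (hypothetical data).
Containment = namedtuple("Containment", "storey elements")  # cf. IfcRelContainedInSpatialStructure
Voids = namedtuple("Voids", "wall opening")                  # cf. IfcRelVoidsElement
Fills = namedtuple("Fills", "opening filling")               # cf. IfcRelFillsElement

containments = [
    Containment("storey_0", {"wall_a", "window_1"}),
    Containment("storey_1", {"wall_b", "window_2", "window_3"}),
]
voids = [Voids("wall_a", "opening_1"), Voids("wall_a", "opening_2"),
         Voids("wall_b", "opening_3")]
fills = [Fills("opening_1", "window_1"), Fills("opening_2", "window_2"),
         Fills("opening_3", "window_3")]


def is_contained_in(element):
    """Shortcut: element -> containing storey, skipping the relationship object."""
    for rel in containments:
        if element in rel.elements:
            return rel.storey
    return None


def is_placed_in(filling):
    """Shortcut: filling element -> voided element, skipping the opening element."""
    for fill in fills:
        if fill.filling == filling:
            for void in voids:
                if void.opening == fill.opening:
                    return void.wall
    return None


def misplaced_windows(windows):
    """Pairs (window, wall) where the window is placed in the wall
    but the two are contained in different storeys."""
    pairs = []
    for window in windows:
        wall = is_placed_in(window)
        if wall is not None and is_contained_in(window) != is_contained_in(wall):
            pairs.append((window, wall))
    return pairs
```

With this toy data, only `window_2` is reported, because it fills an opening in `wall_a` while being contained in a different storey.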
Functions in this group are provided to represent IFC instance level semantics. As mentioned in Section 2, IFC instances can be semantically extended by property sets and quantity sets. These extended properties are modelled as instances of IfcProperty or IfcElementQuantity in IFC models, which are associated with IfcObject instances using certain structures. For example, Fig. 5 illustrates two common structures for associating IfcProperty with IfcObject [10]. An extended property that is modelled as an instance of IfcProperty with a related IfcPropertySet is associated with an IfcObject either directly through an IfcRelDefinesByProperties, or through an IfcTypeObject, which in turn is associated with the IfcObject through an IfcRelDefinesByType. The semantics of extended properties are identified by their names, which are defined in external documentation. A property which is modelled using the former structure overrides a property modelled using the latter one if they have the same name.

Two common structures for associating IfcProperty with IfcObject.
This structure leads to complex declarations in SPARQL even for simple use cases. In this research, shortcut functions are defined to directly connect objects (IfcObject instances) with property values instead of using complex structures in IFC instances for writing queries. These functions are identified with prefixes pset: and qto: for property sets and quantity sets respectively. A typical example is illustrated in Fig. 6, where a wall that has a “LoadBearing” property is represented as an IfcWall associated with an IfcProperty instance in ifcOWL data. A shortcut property pset:loadBearing is defined to associate the wall and value of the property instance. All the properties of primary data types (instances of IfcPropertySingleValue and IfcPhysicalSimpleQuantity) can use the same mechanism to define functions. They are the majority in property sets and quantity sets and are also most frequently required in use cases. In our work, the property sets and quantity sets officially defined by buildingSMART are considered as examples. In total, there are 2519 properties and 257 quantities grouped within 415 property sets and 93 quantity sets in the official IFC 4 documentation [10]. Within them, 1471 properties and 257 quantities have the value range of primary data types and the domain of IfcObject subtypes. They are defined in our vocabulary.

Example of shortcut functions for property sets. The schm:hasObjectProperty and schm:hasTypeProperty are two shortcut functions defined in the vocabulary schm: to wrap the two different structures (see Fig. 5) for associating an extended property with an object.
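The override rule between object-level and type-level properties can be sketched as a simple two-step lookup. The fragment below is a minimal illustration, assuming property values have already been extracted into dictionaries; the data, the dictionary layout and the function name `pset_value` are hypothetical and do not reflect how the prototype stores or resolves properties.

```python
# Properties attached directly to an object (cf. IfcRelDefinesByProperties).
object_props = {
    "wall_1": {"LoadBearing": True},
    "wall_2": {},
}
# Object -> type assignment (cf. IfcRelDefinesByType).
object_type = {"wall_1": "type_A", "wall_2": "type_A", "wall_3": "type_B"}
# Properties attached to the type object (cf. IfcTypeObject).
type_props = {
    "type_A": {"LoadBearing": False, "IsExternal": True},
    "type_B": {},
}


def pset_value(obj, name):
    """Resolve a property by name: the object-level value wins over the
    type-level value; None means the property is not defined at all."""
    own = object_props.get(obj, {})
    if name in own:
        return own[name]
    inherited = type_props.get(object_type.get(obj), {})
    return inherited.get(name)
```

In this toy data `wall_1` overrides the `LoadBearing` value of its type, while `wall_2` falls back to the type-level value.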
Functions are automatically extracted from the official ifcDoc document, which is a file in SPF format released by buildingSMART for storing IFC documentation. Additional, third-party property sets and quantity sets can be extended by processing e.g. simple XML or tabular structures with a trivial tool.
Listing 3 shows a query for a realistic quantity take off example, which is to count the load bearing walls on each building storey. By only using plain SPARQL, a query with the same semantics can also be written but with a much more complex structure (see the comparison in Section 6 and Listing 13).

Query to count load bearing walls for each building storey
Functions in this category are introduced to derive properties based on the geometric representations of a single building product. The vocabulary is identified by the prefix pdt:. In IFC model instances, geometry data is represented by geometry objects associated with related building products. A large number of properties are implied in the geometric representations of building products, including e.g. height, area and length. Although many of these properties can be represented by property sets and quantity sets (see Section 4.2), they are not mandatory and are not always reliable in real building models [49]. In fact, a typical example of BIM requirement checking is to check the consistency between property sets (or quantity sets) and properties derived from geometric representations [47,50].
General product geometry function examples that are applicable for all types of product
The IFC data model offers a number of means to represent geometry for building products. The most common way is the Body representation, which defines 3D volumetric shape of products. However, there are many geometry types to describe a Body geometry in IFC including e.g. Boundary Representation (Brep), Constructive Solid Geometry (CSG) or Non Uniform Rational B-Splines (NURBS). In our work so far, they are unified as triangulated boundary representation to ease developing analysis algorithms, but can be tailored to different representation forms in future. The 3D geometry representation of a product is either represented by a single triangulated surface, a collection of triangulated surfaces or represented by triangulated surfaces associated with its composing elements.
Based on the triangulated representation, many general geometry properties are derived using existing or simple algorithms (see Section 5), including the axis-aligned bounding box, the oriented minimum volume bounding box, basic dimensions (e.g. height, volume, area of surfaces) and partial geometry (e.g. surfaces facing certain directions). These properties are defined as general product geometry functions that are applicable for all products which have 3D representations. Table 3 lists examples of them, their 3D showcases and semantics. Listing 4 shows a use case: search for inconsistencies between the geometric height of a wall and its height quantity [47].

Query to retrieve walls that do not have height quantity or have inconsistent information between its height quantity and geometric representation
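The height consistency check can be illustrated with a few lines of Python. This is a sketch under stated assumptions rather than the prototype's Java code: a product is given as a list of triangles with (x, y, z) vertices, the overall height is taken as the z-extent of the axis-aligned bounding box, and the tolerance of 0.01 is an arbitrary illustrative value.

```python
def bounding_box(triangles):
    """Axis-aligned bounding box of a triangulated surface as (min_corner, max_corner)."""
    xs, ys, zs = zip(*(vertex for tri in triangles for vertex in tri))
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def overall_height(triangles):
    """Geometric height, here defined as the z-extent of the bounding box."""
    (_, _, z_min), (_, _, z_max) = bounding_box(triangles)
    return z_max - z_min


def height_is_consistent(triangles, stated_height, tolerance=0.01):
    """False if the height quantity is missing or deviates from the geometry."""
    if stated_height is None:
        return False
    return abs(overall_height(triangles) - stated_height) <= tolerance


# Two triangles spanning the extremes of a 4.0 x 0.2 x 3.0 wall (toy data).
wall = [
    ((0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 0.2, 3.0)),
    ((0.0, 0.0, 0.0), (4.0, 0.2, 3.0), (0.0, 0.2, 3.0)),
]
```

A wall would be reported by the check when `height_is_consistent` returns `False`, which covers both a missing quantity and a mismatching one.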
Example functions to derive geometry properties for specific product types
Combined with product types and some common assumptions (e.g. a wall length is greater than the wall thickness), many more specific product properties can be retrieved. These properties include some defined examples in Table 4. They can be applied for more domain related use cases such as design assessment. Listing 5 shows an example, which is defined to find out spaces which have too small window-to-floor area ratios. It is a common use case that can be additionally customized (e.g. add conditions for space types) to validate the design plan according to regulations or programmatic requirements.

Query to retrieve spaces which have window-to-floor area ratios less than 0.3
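A minimal sketch of this check is given below. It assumes, purely for illustration, that each window is summarized by the three edge lengths of its bounding box and that the window area is the product of the two largest edges (i.e. the thickness is the smallest dimension); the space data and function names are hypothetical.

```python
def window_area(box_dims):
    """Area of a window from its bounding-box edge lengths, assuming the
    smallest of the three dimensions is the thickness."""
    second_largest, largest = sorted(box_dims)[-2:]
    return second_largest * largest


def low_ratio_spaces(spaces, threshold=0.3):
    """Names of spaces whose window-to-floor area ratio is below the threshold."""
    flagged = []
    for name, space in spaces.items():
        total_window_area = sum(window_area(dims) for dims in space["window_boxes"])
        if total_window_area / space["floor_area"] < threshold:
            flagged.append(name)
    return flagged


spaces = {
    "office": {"floor_area": 20.0,
               "window_boxes": [(1.5, 0.1, 1.2), (2.0, 0.1, 1.5)]},
    "lobby": {"floor_area": 10.0,
              "window_boxes": [(3.0, 0.1, 2.0)]},
}
```

Here the office has roughly 4.8 m² of window area for 20 m² of floor (ratio 0.24) and is flagged, while the lobby (ratio 0.6) is not.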
Functions in this group are provided to derive information related to spatial reasoning, which needs geometric and location data of multiple building products. This vocabulary is identified by the prefix spt:. The functions are further classified and described in the following sections.
Relationships between products
Functions in this category are used to derive relationships between two products. We have defined some general topological relationships that belong to this group, which are applicable for all building products. They are related to many use cases including e.g. geometric clash detection and quantity take-off. The OGC Simple Features are also used as a reference for defining these functions, as they have already established general topological relationships for geometric objects [21]. The aim of these defined functions is not to cover a full range of possible scenarios, but to provide a set of reference examples for other developers and a basis for extensions. For example, directional relationships like “above”, “under” or more domain specific relationships can also be defined with the same form in the future.
Functions for relationships between products

Query to retrieve all walls that intersect with slabs. The result of the query is used to detect clashes between walls and slabs
Defined functions are listed in Table 5 with their counterparts defined in OGC Simple Features and example scenarios for using such functions. Each of these functions retrieves products that have such relationships, or evaluates the relationship between two products. In the GeoSPARQL standard, these topological relationships are defined to process 2D geometry data, while in our cases 3D geometry data is the focus.
Listing 6 shows an example to retrieve walls which intersect with slabs in order to detect clashes between walls and slabs.
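To give an intuition for 3D topological relationships, the sketch below classifies two products as disjoint, touching or intersecting. It is a deliberate simplification: it works on axis-aligned bounding boxes instead of the triangulated geometry, so it only approximates what a triangle-based function would compute, and the function name and tolerance are hypothetical.

```python
def box_relation(box_a, box_b, eps=1e-9):
    """Classify two axis-aligned boxes, each given as (min_corner, max_corner).

    Returns "disjoint", "touches" (boundaries meet, interiors do not overlap)
    or "intersects" (interiors overlap).
    """
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    touching = False
    for axis in range(3):
        low = max(a_min[axis], b_min[axis])
        high = min(a_max[axis], b_max[axis])
        if high < low - eps:
            return "disjoint"      # a separating gap exists along this axis
        if high - low <= eps:
            touching = True        # zero-width overlap along this axis
    return "touches" if touching else "intersects"


wall = ((0.0, 0.0, 0.0), (4.0, 0.2, 3.0))
slab_below = ((0.0, 0.0, -0.3), (4.0, 4.0, 0.0))     # top face meets the wall base
slab_clashing = ((0.0, 0.0, -0.3), (4.0, 4.0, 0.1))  # penetrates the wall by 0.1
slab_far = ((0.0, 0.0, 5.0), (4.0, 4.0, 5.3))
```

In a clash detection scenario only the "intersects" case would be reported, while "touches" corresponds to a wall resting correctly on a slab.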
Functions in this group are used to derive properties for groups of products. Querying the distance between products is a typical example. Many building codes and BIM requirement manuals constrain the minimal, maximal or exact distance between building components, such as interference between building elements, clearance before openings, heights of floors etc. The exact semantics of the notion “distance” can vary between contexts. We have currently defined the concepts provided in Table 6.
Functions as properties for groups of products
An example query is provided in Listing 7 to detect suspended ceilings that are too close to the floor slab and may e.g. interfere with mechanical, electrical, and plumbing (MEP) components. It selects ceilings whose vertical distance to the floor slab of the floor above is shorter than 0.4 meters [50]. The function spt:distanceZ requires two products as the inputs for the computation.

Query to retrieve ceilings that are too close to the floor slabs in the above floor
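One possible reading of a vertical distance between two products is the gap between their z-extents. The sketch below follows that reading on axis-aligned bounding boxes; this is an assumption made for illustration and not necessarily the exact semantics of spt:distanceZ in Table 6.

```python
def distance_z(box_a, box_b):
    """Vertical gap between two axis-aligned boxes given as (min_corner, max_corner).

    Returns 0.0 when the z-extents overlap or touch.
    """
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    gap = max(b_min[2] - a_max[2], a_min[2] - b_max[2])
    return max(gap, 0.0)


def too_close(ceiling_box, slab_box, minimum=0.4):
    """True if the ceiling leaves less than `minimum` metres below the slab."""
    return distance_z(ceiling_box, slab_box) < minimum


ceiling = ((0.0, 0.0, 2.6), (5.0, 5.0, 2.7))
slab_above = ((0.0, 0.0, 3.0), (5.0, 5.0, 3.3))
```

For the toy data above the gap is about 0.3 m, so the ceiling would be reported.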
In the considered use cases, there are also examples that not only require geometry data of the referenced products, but also require processing geometry data of other specific types of related building products. For example, spatially identifying whether a building storey is located right above another one requires geometry and location data of the floor slabs of all the building storeys, and retrieving a walking path between two spaces requires geometry data of all the related spaces, obstructions and openings. The exact semantics of these properties often require knowledge from AEC sub-domains for their specification. We currently only provide two example functions listed in Table 7 for this group. Besides the referenced products (building storey and building elements), they both require processing geometry data of the floor slabs of all building storeys.
An example query which uses the function spt:hasUpperStorey is shown in Listing 7.
Implemented example functions as properties based on spatial relationships
This vocabulary includes geometry related concepts that are materialized in RDF graphs. They are considered as general geometry concepts that provide additional layers independent from domain information. Similar to GeoSPARQL, we define geom:Geometry as the class for geometry objects. As mentioned in Section 4.3, triangulated representations are used to represent Body geometry data. As geometry data for a product is usually processed as a whole, Well Known Text (WKT) string literals that have been defined in Simple Feature Access [21] are adopted to keep the materialized triples small in size. The geometry data of an element (an instance of an IfcElement subtype) that is decomposed by other elements is represented by the geometry data of its composing elements. Figure 7 illustrates the basic structure for materializing product geometry data. Table 10 compares the triple count of building models in ifcOWL, the geometry subsets of them and the triple count of geometry data represented in this format. It shows that geometry data represented in triples with WKT literals is much more compact and should be more efficiently processed by programs. Besides the triangulated representations that are always materialized by default, the axis-aligned bounding boxes and minimum volume bounding boxes for products are also provided in this vocabulary. In future research, other types of geometry representations can also be added if they are required.
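The idea of packing a whole triangulated surface into one literal can be sketched as follows. The fragment assumes the `TIN Z` geometry type of Simple Feature Access as the target form; which WKT geometry type the prototype actually emits is not stated here, so this choice and the helper names are assumptions.

```python
import re


def triangles_to_wkt(triangles):
    """Serialize triangles (three (x, y, z) vertices each) as a single WKT TIN Z string.
    Each triangle becomes a closed ring, so the first vertex is repeated at the end."""
    patches = []
    for tri in triangles:
        ring = list(tri) + [tri[0]]
        coords = ", ".join(" ".join(f"{c:g}" for c in vertex) for vertex in ring)
        patches.append(f"(({coords}))")
    return "TIN Z (" + ", ".join(patches) + ")"


def wkt_to_triangles(wkt):
    """Parse the string produced above back into a list of triangles."""
    triangles = []
    for ring in re.findall(r"\(\(([^()]*)\)\)", wkt):
        points = [tuple(float(c) for c in point.split()) for point in ring.split(",")]
        triangles.append(tuple(points[:3]))  # drop the repeated closing vertex
    return triangles


mesh = [((0, 0, 0), (1, 0, 0), (0, 1, 0)),
        ((1, 0, 0), (1, 1, 0), (0, 1, 0))]
```

One product then needs only a handful of triples (the geometry node plus its literal), regardless of how many triangles the surface contains, which is the source of the compactness discussed above.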

SPARQL query with domain specific functional extensions.

Use case examples that require temporarily added or generated geometry objects to analyse properties and relationships of building objects: The first one requires “upper surface” and “lower surface” of walls and slabs to evaluate their topological relationships; The second one requires extruded boxes to evaluate clearance in front of windows.

Query to select all walls which do not have bottom surface touching the upper surface of any floor slab on the same floor
Another requirement that can be addressed by WKT and this vocabulary is to represent and process temporarily generated geometry data at query runtime. In many tasks, analysis of IFC building models not only requires geometry data of building products, but also needs temporarily defined or derived geometry data. Figure 8 shows some such use cases. Such geometry objects can be manually added or automatically derived at query runtime as WKT literals, and expression functions used in e.g. FILTER expressions can be defined for additional manipulation of them. An initial set of expression functions for manipulating WKT data is defined. The query in Listing 8 demonstrates their use. In this example, the bottom surface and upper surface are derived at query runtime (see Table 3) as partial geometries of a wall and a slab, and they are additionally evaluated by the function geom:touches3D to identify their topological relationship.
In our prototype implementation of the proposed functions, we attempt to minimize hardcoding to make the defined functions more portable, more transparent for public review and easier for the research and development community to extend. Table 8 lists the current number of defined and implemented functions.
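Deriving a partial geometry such as an upper or bottom surface can be sketched by filtering triangles on the direction of their normals. The fragment below is an illustration only: it assumes outward-pointing normals implied by counter-clockwise vertex order, and the threshold `min_cos` and function names are hypothetical.

```python
import math


def normal(tri):
    """Unnormalized face normal from the cross product of two edge vectors."""
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = tri
    ux, uy, uz = bx - ax, by - ay, bz - az
    vx, vy, vz = cx - ax, cy - ay, cz - az
    return (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)


def faces_facing(triangles, direction, min_cos=0.99):
    """Triangles whose normal deviates from `direction` by a small angle only."""
    d_len = math.sqrt(sum(c * c for c in direction))
    selected = []
    for tri in triangles:
        n = normal(tri)
        n_len = math.sqrt(sum(c * c for c in n))
        if n_len == 0:
            continue  # skip degenerate triangles
        cos = sum(a * b for a, b in zip(n, direction)) / (n_len * d_len)
        if cos >= min_cos:
            selected.append(tri)
    return selected


top = ((0.0, 0.0, 3.0), (4.0, 0.0, 3.0), (4.0, 0.2, 3.0))     # normal points up
bottom = ((0.0, 0.0, 0.0), (4.0, 0.2, 0.0), (4.0, 0.0, 0.0))  # normal points down
side = ((0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 0.0, 3.0))    # normal points sideways
wall_mesh = [top, bottom, side]
```

The bottom faces of a wall and the upper faces of a slab selected this way could then be serialized to WKT and passed to a topological test.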
Count of currently defined and implemented functions

Query that is used in SPIN to map the function schm:isContainedIn

SPIN listing (TURTLE syntax) for the query in Listing 9, which is used in SPIN to register and define the function schm:isContainedIn
Functions defined in Section 4.1 and 4.2 can be implemented by a range of methods including those described in Section 3.3 and declarative rule languages such as the Semantic Web Rule Language (SWRL) and N3Logic [5,22]. We choose SPIN for the implementation, as it uses SPARQL and already has a few open source implementations, which enhances future compatibility. SPIN provides a set of vocabularies to wrap SPARQL queries as functions and allows their cascading use. For example, the function schm:isContainedIn in Listing 2 is mapped to ifcOWL with the query in Listing 9. As presented in Listing 10, this function is maintained as an instance of spin:MagicProperty, and the query is transformed to RDF and associated with the function using the spin:body property. The system will trigger the query as a subquery when the function schm:isContainedIn is called as the predicate in a triple pattern. In this process, the subject and object of this triple pattern will be passed to ?arg1 and the output (in this case ?a2) of the query respectively to generate or evaluate bindings. An advantage of using such a method for implementing functions is that the development work is more portable. For example, the RDF graph in Listing 10 can be loaded in any SPIN-enabled environment in order to use this function in SPARQL queries.
When dealing with geometry related reasoning tasks, declarative methods like SPIN are usually not sufficiently expressive to implement sophisticated and computationally intensive algorithms. Geometry data in IFC or ifcOWL is preprocessed and transformed to RDF data represented by the vocabulary described in Section 4.5. Functions described in Section 4.3, 4.4 and 4.5 are implemented using procedural programming. Many existing general purpose geometry algorithms and domain specific algorithms can be reused. For example, functions in Section 4.4.1 are implemented by computing on the triangles of both products to determine their relations, similar to the algorithms described in [11]. Table 9 lists the key procedures and algorithms that are used. They are coded in Java in the current prototype.
Procedures for implementing geometry-related functions and used existing algorithms
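The dispatch mechanism can be mimicked in a few lines of Python. This is a conceptual sketch and not the SPIN API: a registry maps a predicate name to a function standing in for the wrapped subquery, and a pattern matcher consults the registry before falling back to stored triples. The predicate names `ifc:relatedElements`, `ifc:relatingStructure` and `ifc:name` are shortened, and `ex:storeyName` is a hypothetical second function that only exists here to show cascading use.

```python
MAGIC_PROPERTIES = {}


def magic_property(name):
    """Register a function as the body of a property function."""
    def register(fn):
        MAGIC_PROPERTIES[name] = fn
        return fn
    return register


def objects(graph, subject, predicate):
    """Evaluate the pattern (subject, predicate, ?o): registered functions are
    triggered as 'subqueries', other predicates are matched against the triples."""
    if predicate in MAGIC_PROPERTIES:
        return sorted(MAGIC_PROPERTIES[predicate](graph, subject))
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


@magic_property("schm:isContainedIn")
def is_contained_in(graph, element):
    # The subject of the triple pattern plays the role of ?arg1.
    for rel, p, o in graph:
        if p == "ifc:relatedElements" and o == element:
            yield from objects(graph, rel, "ifc:relatingStructure")


@magic_property("ex:storeyName")  # hypothetical, calls another property function
def storey_name(graph, element):
    for storey in objects(graph, element, "schm:isContainedIn"):
        yield from objects(graph, storey, "ifc:name")


GRAPH = {
    ("rel_1", "ifc:relatingStructure", "storey_1"),
    ("rel_1", "ifc:relatedElements", "wall_1"),
    ("rel_1", "ifc:relatedElements", "door_1"),
    ("storey_1", "ifc:name", "Ground floor"),
}
```

Because the registry is plain data, new functions can be added without touching `objects`, which loosely mirrors how SPIN definitions are loaded as RDF rather than compiled into the engine.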

Implementation architecture (blank blocks are added modules).

Data flow of querying and reasoning process.
The functional extensions introduced here are implemented based on the open source Apache Jena framework and SPIN API (see Fig. 9). In this implementation, all the extended functions are processed at query runtime in a backward chaining order. The data flow is illustrated in Fig. 10. The ifcOWL instances or IFC files and SPARQL queries are the input of the system. The ifcOWL data or IFC files are preprocessed to generate additional triples that capture geometry data using the vocabulary described in Section 4.5 and WKT literals. Depending on the size of the ifcOWL files, we can choose to load them into memory or materialize them into a graph persisted in a Jena TDB triplestore. When a property function is referred to during query execution, a SPIN rule as a subquery or a snippet of programming code to retrieve related values is triggered. Since a SPIN rule is also a SPARQL query that can call extended functions, this process continues iteratively until no functions are left to be called. This process can be compatible with other reasoning technologies. For example, in this prototype a Jena RDF Schema (RDFS) reasoner is used underneath the SPARQL query engine. A prototype Web-based user interface with a 3D visualization environment is implemented to input queries and visualize query results (Fig. 11). For example, it highlights retrieved building products in order to report e.g. building products that meet certain conditions or violate constraints.

The Web-based query interface with 3D graphical visualization of this prototype implementation.
To evaluate the effectiveness of defined functions and the prototype implementation, test work is conducted using three IFC building models employing example queries presented in Section 4, each of which represents a realistic use case taken from BIM manuals or common requirement checking applications (see Table 11). The query processes are compared with those realized by standard SPARQL. We also compare our approach with existing proposals for simplifying ifcOWL data and writing queries. Through this work, we aim to (1) evaluate the effectiveness of using these functions to simplify queries and retrieve useful information implied in 3D geometry data, (2) demonstrate the added value as well as the differences of this approach, and (3) initially evaluate applicability by providing indicative measurements of query performance.
Statistics of tested building models M1, M2 and M3
Queries tested in the evaluation study
The models selected for the test are open IFC models commonly used as a reference in literature [14]. They are converted to ifcOWL RDF data and loaded into named graphs persisted in a Jena TDB triplestore. Additional WKT geometry triples that capture triangulated boundary representations of building products are generated with the IfcOpenShell package [29]. The size of the different models as well as their specifications are listed in Table 10. For example, the model M1 is 2.25 MB in size in its SPF representation, and the ifcOWL version contains 298,085 triples. 546 additional triples have been generated to capture triangulated boundary representations with the geom: vocabulary using WKT literals. All datasets are available at
Example queries presented in Listing 2, 3, 4, 5, 6, 7 and 8 in Section 4 are used in this evaluation and the results of their execution are presented in this section. Each of the queries addresses a realistic use case that has been specified e.g. in BIM manuals or implemented as standard model checks in proprietary model checking software tools. They are summarized in Table 11, which lists use case types and requirements. Q1 and Q2 are only related to non-geometric data, while Q3 to Q7 are geometry related.
The hardware used for the evaluation is a mid-range laptop with a Quadcore i7 2670 processor and 4 GB memory allocated for the Java Virtual Machine (JVM). Each of the queries is executed 10 times to derive the average query time.
Comparison with a procedure using plain SPARQL
Table 12 documents the results and average query execution times for Q1 to Q7 on models M1, M2 and M3. All queries except Q2 are used to address requirement checking use cases by retrieving building objects which violate defined constraints. Thus, in these cases returning zero results means no violation in the building model was detected.
Q1 and Q2 only depend on functions that are implemented based on the SPIN framework, which in turn is dependent on the Jena ARQ query engine, while Q3 to Q7 also depend on additional computations in external Java code. Q3 and Q4 are related to functions in the group introduced in Section 4.3 and their query execution time mainly depends on the algorithms used for deriving properties from the geometry data of a single product. For example, in Q3 when the function pdt:hasOverAllHeight is called, the underlying WKT representation of the product is processed to generate an axis-aligned bounding box on the fly in order to derive the overall height of a wall. In Q4, the most computationally expensive part is the function pdt:hasWindowArea, which needs to compute a minimum volume bounding box for each window object. These processes can be optimized by materializing additional geometry representations for building products. Q5, Q6 and Q7 have relatively longer execution times, especially for the largest model M3. This is expected since these three queries are all related to spatial reasoning functions, which involve geometry data of multiple building products. For example, the current procedure of the function spt:intersects, which is used in Q5, needs to compute the topological relationship for each combination of a wall and a slab. This procedure needs to run 750 × 19 = 14,250 times for the model M3, which contains 750 walls and 19 slabs. This can be optimized further by mechanisms like adding spatial indices to reduce computation time.
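As an illustration of how a spatial index could cut down the number of pairwise tests, the sketch below buckets slabs by the vertical cells they occupy and only returns slabs sharing a cell with a given wall as candidates. It is one simple option among many (a one-dimensional grid on z with an arbitrary cell size), not a mechanism of the current prototype, and all names are hypothetical.

```python
import math
from collections import defaultdict


def z_cells(box, cell):
    """Indices of the vertical cells covered by a box given as (min_corner, max_corner)."""
    low, high = box
    return range(math.floor(low[2] / cell), math.floor(high[2] / cell) + 1)


def build_z_index(boxes, cell=1.0):
    """Map each vertical cell index to the names of the boxes that cover it."""
    index = defaultdict(set)
    for name, box in boxes.items():
        for k in z_cells(box, cell):
            index[k].add(name)
    return index


def candidates(box, index, cell=1.0):
    """Names of indexed boxes sharing at least one vertical cell with `box`.
    Only these need the expensive triangle-level test."""
    found = set()
    for k in z_cells(box, cell):
        found |= index.get(k, set())
    return found


slabs = {
    "slab_0": ((0.0, 0.0, -0.3), (10.0, 10.0, 0.0)),
    "slab_1": ((0.0, 0.0, 3.0), (10.0, 10.0, 3.3)),
    "slab_2": ((0.0, 0.0, 6.0), (10.0, 10.0, 6.3)),
}
ground_floor_wall = ((0.0, 0.0, 0.0), (4.0, 0.2, 3.0))
slab_index = build_z_index(slabs)
```

For the ground floor wall only the slabs directly below and above remain as candidates, so the exact test runs on two pairs instead of three; on a model with many storeys the saving grows accordingly.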
Comparison
We first compare the results with a procedure that only uses SPARQL to query ifcOWL data. The same query environment is set up with the exception that none of the extended functions are activated. By just using SPARQL and ifcOWL data, only the use cases that are addressed by Q1 and Q2 can be realized, hence the comparison is limited to these two queries. Queries with the same semantics as Q1 and Q2 are written in SPARQL and presented in Listing 12 and Listing 13 in Appendix B. They have complex query bodies that contain more triple patterns. They are documented as Q1* and Q2* in Table 13, which also compares them with Q1 and Q2 with respect to triple pattern count in WHERE clauses, query results and average query time. It shows that with significantly simplified query bodies, Q1 and Q2 have the same query results as Q1* and Q2* respectively without sacrificing much performance. This topic is further discussed in Section 8.3.
As mentioned in Section 3, there have been a few ontologies developed to simplify ifcOWL data including those introduced in [36] and [39]. None of these existing efforts have considered processing geometry data, hence only the use cases addressed by Q1 and Q2 can be addressed. Functions defined in Section 4.1 and 4.2 can be compared with those simplified ontologies. The difference is that those existing simplified ontologies tend to preprocess ifcOWL data (or IFC data) and transform it to a more compact data graph to improve query performance, while the approach presented in this paper treats simplified properties and relationships as functions, which are used at query runtime. An advantage of this approach is that simplified queries can run on any ifcOWL data without additional materializations (for functions defined in Section 4.1 and 4.2). This provides a more flexible paradigm in which users do not have to adopt the entire vocabulary but can reuse a subset of it or extend it to adapt to more specific use cases. If some simplified IFC ontologies are standardized, this approach can also be made compatible with them by defining additional mapping rules.
Regarding the use cases addressed by Q3 to Q7, to our knowledge there is no open and off-the-shelf query system in the Semantic Web field that our approach can be compared with. Some of them might be supported by BIM query languages which support geometry features like those introduced in [11] and [13]. However, we argue that these query languages either have limited expressive power or do not have precisely defined or standardized semantics, while this approach is based on a standard and expressive query language [1,19]. More importantly, with this approach RDF and other Semantic Web technologies can be leveraged to facilitate knowledge reasoning and data integration and partition tasks. With these capabilities, defined functions can more easily be reused and extended for specific applications. An example is presented in Section 7.
An extended application example
As mentioned in Section 4, defining an exhaustive list of functions for the entire AEC industry can hardly be achieved, hence the system should allow functions to be extended easily. An application example is presented in this section in a regulatory compliance checking scenario that requires extending case specific functions to query both building models and regulatory data. The aim of this example is to demonstrate how functions could be extended to address specific cases with less of the arbitrary programming work that is commonly used in BIM applications and query techniques. Addressing the complexity of the knowledge engineering work required for extending functions is not within the scope of this paper.
The example provided here is taken from the International Building Code (IBC), which is developed by the International Code Council (ICC) and used as a base code standard in the United States [23]. This rule example is from Chapter 7 Fire and Smoke Protection Features, and is used to check opening areas on external walls to evaluate their fire performance. This example requires processing domain specific semantic data and geometry data in building models as well as external tabular data defined in the IBC document.
705.8.4 Where both unprotected and protected openings are located in the exterior wall in any story of a building, the total area of openings shall be determined in accordance with the following:
Additionally, the allowable opening areas for protected and unprotected openings (ap and au) are determined by Table 705-8 in the IBC, which describes their relation to the fire separation distance. This table has three columns and twenty-four rows. Table 14 shows one row of it, which defines that when the fire separation distance is between 15 and 20 feet, the opening is unprotected and the space is non-sprinklered, the allowed ratio (au in the equation) between opening area and external wall area is up to 25 percent.
One row of Table 705-8 in International Building Code [23]
In this example, external wall instances in a dataset have to be checked and analysed to derive related properties and relationships. In addition to the data captured in the IFC building data sets, the referenced table in this example can be considered as a small dataset that needs to be processed to derive allowable protected openings and unprotected openings for each wall. It is transformed to the RDF format with the approach described in [51] and processed along with the building model. A general algorithm in a procedural pseudo-code notation is specified in Algorithm 1 to check building models and find out external walls which violate this requirement.

Procedure for checking rule 705.8.4
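The core of such a check can be sketched in Python as a table lookup followed by a ratio test. Several points are assumptions made for illustration: the combined-opening condition is taken to be the ratio sum Ap/ap + Au/au ≤ 1 (actual over allowable areas for protected and unprotected openings), the distance interval is treated as lower-inclusive and upper-exclusive, and the 0.75 entry for protected openings is a placeholder rather than a value taken from the code. Only the 0.25 entry corresponds to the row quoted in the text.

```python
# (min_ft, max_ft, opening kind) -> allowed opening-to-wall area ratio
ALLOWABLE = {
    (15, 20, "unprotected_nonsprinklered"): 0.25,  # the row quoted in the text
    (15, 20, "protected"): 0.75,                   # placeholder value for illustration
}


def allowable_ratio(distance_ft, kind):
    """Look up the allowed area ratio for a fire separation distance and opening kind."""
    for (low, high, row_kind), ratio in ALLOWABLE.items():
        if row_kind == kind and low <= distance_ft < high:
            return ratio
    raise KeyError(f"no table row for {kind} at {distance_ft} ft")


def violates_opening_rule(wall_area, protected_area, unprotected_area, distance_ft):
    """True if the assumed condition Ap/ap + Au/au <= 1 is not satisfied."""
    total = 0.0
    for actual, kind in ((protected_area, "protected"),
                         (unprotected_area, "unprotected_nonsprinklered")):
        allowed = allowable_ratio(distance_ft, kind) * wall_area
        if allowed == 0:
            if actual > 0:
                return True  # openings of this kind are not permitted at all
            continue
        total += actual / allowed
    return total > 1.0
```

For a 100 m² wall at 17 ft with 30 m² of protected and 10 m² of unprotected openings the sum is 0.8 and the wall passes; with 20 m² of unprotected openings the sum is 1.2 and the wall would be reported.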
Extended functions for the rule case 705.8.4 in IBC
Case specific functions are extended for deriving some of these properties based on provided functions. For example, the value

Query to retrieve external walls which violate the constraint defined in this building code
As a proof of concept, a building model is created, which contains required building elements and lot line (modelled as an IfcAnnotation instance) with related properties. It is a small model that contains 189,778 triples. With all the additional SPIN functions loaded, it is checked using the query in Listing 11. This prototype implementation generates a visualization of the result that is provided in Fig. 12.

Snapshot of the query result of the GUI.
Flexibility and portability
In comparison with domain specific query languages that are developed from scratch, the approach introduced in this paper leverages Semantic Web technologies and existing implementations to provide a more interoperable, modular and flexible mechanism to extend functionality in order to address a wide range of use cases for information extraction and validation in the AEC industry. As shown in Section 7, query functions for specific use cases can be extended by adding declarative rules on top of procedural functionality. They are modular and flexible enough to adapt to the various possible forms in which facts are presented in IFC datasets. For example, a protected opening in another building case, created by another author using a different BIM authoring tool, might be modelled differently from how it is defined in Listing 14. In that case, the rule can be changed or replaced without affecting other rules. External, linked datasets can be addressed using the same technology as long as they are captured as RDF or provide SPARQL endpoint services [6,20].
Declarative methods can also enable more portable implementations for functions. As many functions are defined using SPIN rules, they can be reused by query environments which have implemented SPIN (e.g. TopBraid SPIN API or Eclipse RDF4J) and potentially be reused by those which have implemented SPARQL. All the SPIN functions are stored in RDF, which can be maintained in triplestores or shared as dereferenceable resources on the Web. It is also possible for users to upload SPIN rules as RDF data to the server side to extend functions for their own cases without extending the source code of the server.
There are a few issues that affect the portability of this system. The main issue is the functions implemented by procedural programming, which still require geometry libraries to be integrated. This may not be addressed in the short term, as geometric computation is a domain that usually requires specific methods and tools. Secondly, the portability also depends on the implementation of the declarative methods used. With the implementation approach presented in Section 5, in order to implement a BimSPARQL-enabled endpoint and reuse some of the development work here, the server side must support SPIN by e.g. integrating SPIN engines. For specific applications that require extending additional functions using SPIN, users must have access to upload SPIN rules to the server side.
Coverage
The full list of implemented functions is published at the link given in Appendix A. The functions are defined based on the referenced BIM requirement checking use cases. There are various use cases in the AEC industry, and an almost unlimited number of properties and relationships may be required. IFC also provides rich methods to represent information in order to adapt to different contexts and projects. As stated in Section 4, it is not our aim to provide a complete set of functions, but to suggest a more bottom-up approach in which modular functions are defined and then gradually extended to cover more use cases. In this approach, each function should be considered as a module that retrieves a view from IFC building models. A general classification of functions is provided as a framework for further extensions, and a set of functions is provided as foundational examples.
Functions introduced in Sections 4.1 and 4.2 cover all the commonly used semantic structures and all the simple data properties and quantities defined in the official IFC documentation. In practice, these two groups of functions can be extended according to various application concepts in AEC sub-domains and third-party property sets and quantity sets.
Functions introduced in Sections 4.3 and 4.4 mainly focus on the triangulated boundary representation, which is a fundamental geometric representation that can be used to represent any 3D physical shape. The WKT literals simplify the structure of IFC geometry data, which has a high degree of decomposition. This method enables many general geometry algorithms to be reused for analyzing data (see Table 9). A set of general geometry and spatial reasoning functions that are applicable to all building products, as well as some example functions related to specific product types, are defined. This geometric representation is also related to further implementations including e.g. spatial indexation and use cases like efficient visualization, which is commonly required for many applications in the AEC industry. It is suggested that such a representation should be accepted as the basis for other implementations to ensure interoperability and consistent query results across them. There are indeed use cases that require particular geometry forms (e.g. deriving the flange thickness for an I-shape beam requires parametric I-shape profile objects). Such use cases can be supported by providing multiple representations for specific products. It can be envisioned that, with the efforts of research communities, a consensus on a set of geometry representations for query and analysis will be defined and accepted.
From the perspective of use cases, a current limitation of this approach is related to requirements for instantiating resources with additional triples based on procedural computations. For example, identifying the shortest path between two rooms usually requires instantiating a path object, which might have geometric representations and relationships with e.g. passed spaces and doors. Even with procedural coding, extended functions are not suitable for creating such additional dynamic data graphs at query runtime. WKT literals might be used to represent additional geometry objects at query time, but more investigation is still required to properly adapt this type of query function to an RDF and Semantic Web environment.
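To illustrate why a triangulated boundary representation allows general geometry algorithms to be reused, the following is a minimal Python sketch that derives the surface area and enclosed volume of a product from its triangles alone. It is not taken from the BimSPARQL implementation. It assumes that the WKT literal has already been parsed into a list of triangles, and that the mesh is closed and consistently oriented. The names surface_area, volume and TETRAHEDRON are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
Triangle = Tuple[Point, Point, Point]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def surface_area(triangles: List[Triangle]) -> float:
    """Sum of triangle areas (half the norm of each edge cross product)."""
    total = 0.0
    for p, q, r in triangles:
        n = _cross(_sub(q, p), _sub(r, p))
        total += 0.5 * math.sqrt(_dot(n, n))
    return total


def volume(triangles: List[Triangle]) -> float:
    """Enclosed volume as the sum of signed tetrahedra spanned with the origin.

    Assumption: the mesh is closed and all triangles share one orientation.
    """
    total = 0.0
    for p, q, r in triangles:
        total += _dot(p, _cross(q, r)) / 6.0
    return abs(total)


# Hypothetical test shape: a right tetrahedron with outward-facing triangles.
O, A, B, C = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
TETRAHEDRON: List[Triangle] = [(O, B, A), (O, A, C), (O, C, B), (A, B, C)]

if __name__ == "__main__":
    print(surface_area(TETRAHEDRON), volume(TETRAHEDRON))
```

Because both functions depend only on the triangle list, the same code applies to any product type, which is the kind of reuse that the simplified representation is meant to enable.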
Query performance
At present, optimizing query performance is not the main focus of the research presented here. Query performance depends on the implementation of the technologies and geometry analysis algorithms that are used. The prototype implementation uses the SPIN framework, which is based on the Jena ARQ query engine and the Jena TDB triplestore. In Section 6, it is shown that in some cases simplified queries can have performance similar to that of equivalent plain SPARQL queries. This implementation method has also taken part in a performance benchmark with comparisons to other rule languages and their implementations [38]. That research shows that this implementation method is a reliable approach, but there is still room for optimizing its performance in comparison with some commercial databases like Stardog. In the current SPIN framework, when a function is called in a triple pattern, it is treated as a separate query that is executed based on the assigned arguments and then joined with the intermediate results of the outer query. The framework lacks a query rewriting mechanism that flattens queries and preferentially executes the most selective triple patterns, taking into account all the triple patterns defined in the called functions.
As shown in Section 6, the current performance shortcomings can be mainly attributed to geometry-related functions, especially functions related to spatial reasoning. With a plain RDF triplestore like Jena TDB without additional optimization mechanisms, spatial reasoning functions have relatively long running times. In future developments, this process can be optimized by e.g. integrating spatial indices and caching mechanisms.
As RDF graphs are flexible, another direction for optimizing performance might be to materialize required triples into RDF graphs for specific applications. As described in Section 6, besides ifcOWL data only the triangulated boundary representation of products is currently materialized, and all the functions are processed at query runtime. If some related functions are frequently required for specific applications, it is recommended to materialize their results as properties. The effect of materialization has been discussed in [38]. Since there is usually a trade-off between storing and computing data, a dynamic approach to automatically materialize triples with regard to use cases, preprocessing, runtime performance and storage cost needs additional investigation and future research.
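The difference between executing a called function as a separate query and flattening it into the outer query can be sketched with a toy triple-pattern matcher in Python. This is an illustration under simplifying assumptions, not the SPIN or Jena ARQ implementation. The data, the predicate names and the helpers evaluate, nested and flattened are all hypothetical, and "work" simply counts the intermediate bindings produced.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(pattern, binding, triples):
    """Yield extensions of `binding` for every triple that fits `pattern`."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if is_var(term):
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def evaluate(patterns, triples, binding=None):
    """Nested-loop join in the given order; returns (solutions, work)."""
    bindings, work = [binding or {}], 0
    for pattern in patterns:
        bindings = [n for b in bindings for n in match(pattern, b, triples)]
        work += len(bindings)
    return bindings, work


def nested(outer, body, triples):
    """Run the function body as a separate query for each outer solution."""
    outer_solutions, work = evaluate(outer, triples)
    results = []
    for b in outer_solutions:
        solutions, w = evaluate(body, triples, b)
        results.extend(solutions)
        work += w
    return results, work


def flattened(outer, body, triples):
    """Merge all patterns and greedily start from the most selective one."""
    remaining, ordered, bound = list(outer) + list(body), [], set()
    while remaining:
        connected = [p for p in remaining if bound & {t for t in p if is_var(t)}]
        best = min(connected or remaining,
                   key=lambda p: sum(1 for _ in match(p, {}, triples)))
        ordered.append(best)
        remaining.remove(best)
        bound |= {t for t in best if is_var(t)}
    return evaluate(ordered, triples)


# Hypothetical data: 100 walls, only wall_0 is marked as external.
TRIPLES = []
for i in range(100):
    TRIPLES.append((f"wall_{i}", "type", "Wall"))
    TRIPLES.append((f"wall_{i}", "hasPset", f"pset_{i}"))
    TRIPLES.append((f"pset_{i}", "isExternal", "true" if i == 0 else "false"))

OUTER = [("?w", "type", "Wall")]
BODY = [("?w", "hasPset", "?p"), ("?p", "isExternal", "true")]

if __name__ == "__main__":
    print(nested(OUTER, BODY, TRIPLES)[1], flattened(OUTER, BODY, TRIPLES)[1])
```

In this toy setting both strategies return the same single wall, but the nested strategy first enumerates every wall and then evaluates the body once per wall, whereas the flattened strategy starts from the selective isExternal pattern. A real query optimizer would use index statistics rather than scanning the data to estimate selectivity.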
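As a rough sketch of how a spatial index could reduce the cost of many-to-many spatial tests, the following Python example hashes axis-aligned bounding boxes into a uniform grid, so that only products sharing a grid cell are compared. It is an assumption-laden illustration and not part of the prototype. The box values, the cell size and the function names are hypothetical, and an R-tree or similar structure could be used instead.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple

# (min_x, min_y, min_z, max_x, max_y, max_z)
Box = Tuple[float, float, float, float, float, float]


def boxes_overlap(a: Box, b: Box) -> bool:
    """Inclusive overlap test, so touching boxes count as overlapping."""
    return all(a[i] <= b[i + 3] and b[i] <= a[i + 3] for i in range(3))


def brute_force_pairs(boxes: Dict[str, Box]) -> Set[Tuple[str, str]]:
    """Compare every pair of boxes: quadratic in the number of products."""
    return {(a, b) for a, b in combinations(sorted(boxes), 2)
            if boxes_overlap(boxes[a], boxes[b])}


def grid_pairs(boxes: Dict[str, Box], cell: float) -> Set[Tuple[str, str]]:
    """Compare only boxes that were registered in at least one common cell."""
    grid = defaultdict(set)
    for name, box in boxes.items():
        lo = [math.floor(box[i] / cell) for i in range(3)]
        hi = [math.floor(box[i + 3] / cell) for i in range(3)]
        for x in range(lo[0], hi[0] + 1):
            for y in range(lo[1], hi[1] + 1):
                for z in range(lo[2], hi[2] + 1):
                    grid[(x, y, z)].add(name)
    pairs = set()
    for names in grid.values():
        for a, b in combinations(sorted(names), 2):
            if boxes_overlap(boxes[a], boxes[b]):
                pairs.add((a, b))
    return pairs


# Hypothetical bounding boxes of four building products (metres).
BOXES: Dict[str, Box] = {
    "wall_a": (0.0, 0.0, 0.0, 5.0, 0.2, 3.0),
    "wall_b": (5.0, 0.0, 0.0, 5.2, 4.0, 3.0),
    "door": (2.0, 0.0, 0.0, 3.0, 0.2, 2.1),
    "slab": (50.0, 50.0, 0.0, 60.0, 60.0, 0.3),
}

if __name__ == "__main__":
    print(sorted(grid_pairs(BOXES, cell=2.0)))
```

The pairs returned here are only candidates. An exact test on the triangulated geometry would still be needed for each of them, but it would no longer have to be run for products that are far apart.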
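The trade-off between computing a function at query runtime and materializing its result as a property can be sketched as follows. This is a minimal Python illustration using a plain set of tuples in place of a triplestore. The predicate ex:hasHeight, the helper names and the sample extents are hypothetical and do not come from the published vocabularies.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, object]

# Hypothetical per-product z extents, standing in for stored geometry.
Z_EXTENTS = {"wall_a": (0.0, 3.0), "wall_b": (0.0, 2.7), "door": (0.0, 2.1)}
CALLS = {"count": 0}


def compute_height(product: str) -> float:
    """Stand-in for a geometry function evaluated at query runtime."""
    CALLS["count"] += 1
    z_min, z_max = Z_EXTENTS[product]
    return z_max - z_min


def materialize(graph: Set[Triple], subjects: Iterable[str], predicate: str,
                fn: Callable[[str], object]) -> int:
    """Precompute fn for each subject and store the results as triples."""
    added = 0
    for subject in subjects:
        triple = (subject, predicate, fn(subject))
        if triple not in graph:
            graph.add(triple)
            added += 1
    return added


def lookup(graph: Set[Triple], subject: str, predicate: str,
           fn: Callable[[str], object]) -> object:
    """Prefer a materialized value; fall back to runtime computation."""
    for s, p, o in graph:  # linear scan, acceptable for a toy graph only
        if s == subject and p == predicate:
            return o
    return fn(subject)


if __name__ == "__main__":
    graph: Set[Triple] = set()
    materialize(graph, Z_EXTENTS, "ex:hasHeight", compute_height)
    print(lookup(graph, "wall_a", "ex:hasHeight", compute_height))
```

Materialization moves the computation cost to preprocessing and adds one stored triple per product. The stored values also have to be refreshed whenever the underlying geometry changes, which is part of the trade-off mentioned above.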
Conclusion and future work
This research provides a general framework to define and extend SPARQL functions for querying IFC-based building data. A set of functions is classified and introduced, and two different approaches are used to implement them. It is shown in Section 6 that many BIM requirement checking use cases can be addressed by using SPARQL with these functions, which either simplify queries or enable implicit information to be retrieved from 3D geometry data. The work presented here should be regarded as a general framework and proof of concept for a modular, scalable approach to address the large number of domain-specific query requirements in the AEC domain. As more and more data is represented using RDF and Linked Data technologies, this approach has considerable advantages over the current practice of processing building-related data in proprietary information silos using one-of-a-kind island solutions.
The links to the vocabularies, transformation rules and source code repository of the prototypical reference implementation are provided in Appendix A.
In the future, more use cases should be investigated and implemented to gradually extend the functionality for specific sub-domains in the AEC industry and to combine data from different sources. The extension work should not be conducted in a totally ad hoc manner, but should be more systematic with regard to the classification of functions and geometry representations. In addition, there are a few directions that can be considered for future research and development.
Performance optimization
Optimization of query performance is necessary, as it is important for applications of this approach. As discussed in Sections 6 and 8, the current performance shortcomings of the solutions introduced here are mainly related to implementation issues. They seem specifically problematic for spatial reasoning functions, which require additional computations related to many-to-many relationships. In the future, spatial indexation mechanisms can be integrated to improve performance in this respect, as they have been shown to have a significant impact on spatial reasoning in the geospatial domain [37]. This might lead to the creation of specialized databases in the future. Additional development and testing work is required, since this method might also introduce a preprocessing overhead for building models if building designs change frequently. Other general query optimization techniques such as query rewriting and additional materialization should also be investigated and applied.
Implementation approaches
As discussed in Section 8, the purpose of introducing a declarative language for implementing some of the functions is to improve the portability of the development work. In practice, portability is also affected by the implementation status of the declarative language used. Other technologies, especially standardized ones, should also be investigated in the future. This, however, requires evaluation and comparison regarding their expressiveness, implementation status and performance. A potential candidate is the Shapes Constraint Language (SHACL) [31], which is a recently standardized W3C Recommendation and has many functionalities in common with SPIN. Regarding geometry-related functions, existing and commonly used 3D geometry libraries such as CGAL [16] may also be integrated in the future to reuse algorithms and to improve the interoperability and performance of computations.
Knowledge engineering
The last direction, which can be considered a long-term objective, is to simplify the knowledge engineering processes that are required for extending functions. As shown in Section 7 and Appendix C, SPIN rules can enable more flexible extensions of functions, but the knowledge engineering work required is still intensive for domain end users. How to enable them to effectively translate domain knowledge into processable rules is still an open question that needs to be addressed. Proper methods and tools for such knowledge engineering activities need to be developed to ease these processes and verify correctness.
Footnotes
Resources
The vocabularies, rules and models are published with doi: 10.17605/OSF.IO/V5ENM. Related source code for the backend is published on: https://github.com/BenzclyZhang/BimSPARQL.
