Abstract
Established in 2012 by members of the Food and Agriculture Organisation of the United Nations, the Global Soil Partnership (GSP) is a global network of stakeholders promoting sound land and soil management practices towards a sustainable world food system. However, soil survey largely remains a local or regional activity, bound to heterogeneous methods and conventions. Recognising the relevance of global and trans-national policies towards sustainable land management practices, the GSP elected data harmonisation and exchange as one of its key lines of action. Building upon international standards and previous work towards a global soil data ontology, an improved domain model was eventually developed within the GSP, the basis for a Global Soil Information System (GloSIS). This work also identified the Semantic Web as a possible avenue to operationalise the domain model. This article presents the GloSIS web ontology, an implementation of the GloSIS domain model with the Web Ontology Language (OWL). Thoroughly employing a host of Semantic Web standards (Sensor, Observation, Sample, and Actuator ontology (SOSA), Simple Knowledge Organisation System (SKOS), GeoSPARQL, QUDT), GloSIS lays out not only a soil data ontology but also an extensive set of ready-to-use code-lists for soil description and physico-chemical analysis. Various examples are provided on the provision and use of GloSIS-compliant linked data, showcasing the contribution of this ontology to the discovery, exploration, integration and access of soil data.
Introduction and Motivation
The Importance of Soils and Related Risks
Human population has more than tripled since the end of World War II (United Nations, 2019). This growth has been accompanied by the densification of urban areas, with the share of population living in cities doubling and surpassing 50% in 2010 (Desa, 2018). Supporting this population has required unprecedented growth in food production. Nevertheless, dramatic increases in food output per unit area have meant that global agricultural area expanded by just 30% in the past seven decades (Ramankutty et al., 2018). Although a success, this transformation and expansion of food production systems have placed unprecedented stress on soils. These are non-renewable natural resources that, if mismanaged, can rapidly degrade to a non-productive state. Soils around the globe are presently impacted by the over-use of fertilisers, chemical contamination, loss of organic matter, salinisation, acidification, and outright erosion (Kopittke et al., 2019). These trends pose serious risks not only to food supply but also to ecosystems, as soils provide a myriad of services at the local, landscape, and global levels (Banwart et al., 2014; FAO and ITPS, 2015; UNEP, 2012).
Addressing these risks often requires a holistic approach, with policies and practices envisioned at a global scale. Examples include the reduction of soil erosion through land rehabilitation and development (Borrelli et al., 2017; WOCAT, 2007), the protection of food production (FAO et al., 2018; Soussana et al., 2017; Springmann et al., 2018), and the preservation of biodiversity (Barnes, 2015; IPBES, 2019; van der Esch et al., 2017) and human livelihoods (Bouma, 2015). However, the data necessary to develop such policies are collected, analysed, and represented at local scales, as these remain primarily region- or country-specific activities. The data harmonisation needed for the sustainable use of soils at the global scale thus remains a challenge (Global Soil Partnership, 2017a).
Global Soil Partnership (GSP) and its Goals
The GSP was established in 2012 by members of the Food and Agriculture Organisation of the United Nations (FAO) as a network of stakeholders in the soil domain. Its broad goals are to raise awareness of the importance of soils in attaining a sustainable agriculture and to promote good practices in land and soil management. The GSP involved the majority of the world’s national soil information institutions, gathered around the International Network of Soil Information Institutions.
The GSP defined five pillars of action, structuring its activities:
Pillar 1 – Pillar 2 – Pillar 3 – Pillar 4 – Pillar 5 –
The Action Plan for Pillar 5 (Global Soil Partnership, 2017a) acknowledges various difficulties with the harmonisation of soil data. In most cases, these data are collected and curated by national or regional institutions, focused on their local context, largely abstracted from international or global concerns. This lack of homogeneity severely limits the availability and use of soil data. The transfer of data, methods, and practices between regions or from global to local initiatives is thus prone to hurdles and errors, putting at risk sustainable soil management goals.
Among the key priorities towards harmonisation identified in the Action Plan for Pillar 5 is the development of a soil information exchange infrastructure. This is broadly defined as “[…] a conceptual soil feature information model provid[ing] the framework for harmonisation such that the efficient exchange and collation of globally consistent data and information can occur.” Data exchange is put forth both as an essential component of soil data harmonisation and as a vector to that end, facilitating data integration, analysis, and interpretation.
In the Action Plan for Pillar 4 (Global Soil Partnership, 2017b), the GSP lays out the guidelines for the development of an authoritative global soil information system. This system is envisioned as fulfilling three main functions:
answer critical questions at the global scale; provide the global context for more local decisions; supply fundamental soil data to understand Earth-system processes to enable management of the major natural resource issues facing the world.
Draft implementation guidelines are laid out in the Action Plan for Pillar 4, pointing to a federated system in which soil institutions provide access to their data through web services, all compliant with a common data exchange specification. The latter builds on the outcome of Pillar 5, concerning the exchange of soil profile observations and descriptions, laboratory and field analytical data, plus derived products such as digital soil maps. Soil data exchange is thus set at the core of the GSP, an unavoidable stepping stone to achieve its goals. As set out in the Action Plan for Pillar 5: “Pillar 5 is a basic foundation of Pillar 4, and an enabling mechanism for all GSP pillars providing and using global soil information.”
In 2019, the GSP launched a call for an international consultancy to assess the state of the art in soil information exchange and propose a path towards its operationalisation in line with the goals of Pillar 5. The results of this consultancy are gathered in Rezník and Schleidt (2020). In this work, a detailed set of requirements was inventoried, sourced from meetings and interviews with various GSP stakeholders. Among them is the will to re-use existing models and exchange mechanisms as much as possible, and to assess the suitability of each for implementation (with Pillar 4 in view).
The consultancy identified relevant similarities between previous models targeting soil data exchange: Australian and New Zealand Soil Mark-up Language (ANZSoilML; Simons et al., 2013),
The ISO 28258 model was selected as the most suitable starting point to operationalise the sought-after exchange mechanism. The model was augmented with container classes encapsulating the Guidelines for Soil Description issued by the FAO (Jahn et al., 2006), an abstraction of the code-lists necessary for the exchange. The resulting model is documented as a UML class diagram. Regarding implementation, the consultancy concluded that both XML and RDF were suitable. XML was put forth early on as an implementation vehicle for Observations and Measurements (O&M; Cox, 2011a), whereas the more recent publication of the Sensor, Observation, Sample, and Actuator ontology (SOSA; Janowicz et al., 2019), an RDF-based counterpart to O&M, presents a clear path to an implementation on the Semantic Web.
Document Structure
This article starts by briefly reviewing previous models that tackled soil information exchange (Section 2). Section 3 presents the methodology, followed by the specification of the Global Soil Information System (GloSIS) web ontology, through to its maintenance. Section 4 presents some example applications of the ontology, including methods for the discovery and access of soil data based on GloSIS. The article closes with considerations on future work in Section 5. All RDF assets composing the GloSIS web ontology, as well as its documentation, are available at a public software repository. 1 Table 1 summarises the prefixes and corresponding namespaces used in the ontology and throughout this article.
Namespaces.
The GloSIS domain model and web ontology follow in the footsteps of various earlier attempts at a framework for the exchange of soil data and knowledge. This section reviews the most relevant ones.
SOTER
The Global and National Soils and Terrain Digital Databases (SOTER) was an initiative of the International Society of Soil Science, in cooperation with the United Nations Environment Programme, the International Soil Reference and Information Centre (ISRIC), and the FAO (FAO of the United Nations Land and Water Division, 1993). It was the first attempt to create a digital soil resource of global reach, making use of what were then emerging technologies, such as Relational Database Management Systems and Geographic Information Systems (GIS). While primarily targeting the production of digital maps for decision support, the SOTER initiative possibly embodied the first global digital vocabulary of soil properties and characteristics, assessed both in situ and via laboratory measurements. Although it lacked an abstract formalisation (SOTER pre-dates both UML and the Web Ontology Language (OWL)), the legacy SOTER databases remained a reference for the development of subsequent soil information models.
ISO 28258
The international standard “Soil quality – Digital exchange of soil-related data” (ISO 28258) resulted from a joint effort by the ISO technical committee “Soil quality” and the technical committee “Soil characterisation” of the European Committee for Standardisation (CEN). Recognising the need to combine soil data with other kinds of data, this standard set out to provide a general framework for the exchange of soil data.
ISO 28258 is documented with a UML domain model, applying the O&M framework to the soil domain. It abstracts familiar concepts in soil science such as
Australian and New Zealand Soil Mark-up Language (ANZSoilML)
The ANZSoilML (Simons et al., 2013) results from a joint effort by CSIRO in Australia and New Zealand’s Manaaki Whenua to support the exchange of soil and landscape data. Its domain model was possibly the first application of O&M to this domain, targeting the soil properties and related landscape features specified by the institutional soil survey handbooks used in Australia and New Zealand (Milne et al., 1995; National Committee on Soil and Terrain, 2009). This model outlines a hierarchy of observable features, including the concepts
ANZSoilML is formalised as a UML domain model from which an XML schema is obtained, relying on the ComplexFeature abstraction that underlies the SOAP/XML web services specified by the OGC. A set of controlled vocabularies was developed for ANZSoilML, providing values for categorical soil properties and laboratory analysis methods. However, these were never made mandatory, and the model can be used with alternative vocabularies. More recently, these vocabularies were transformed into RDF resources to be managed with modern Semantic Web technologies.
The Soil Theme in INSPIRE
The INSPIRE directive of the European Union came into force in 2007 with the goal of creating a spatial environmental data infrastructure for the Union. A detailed data specification for the soil theme was published by the European Commission in 2013 (Soil ITWG, 2013), supported by a detailed domain model documented as a UML class diagram. The model provides more depth for soil inventory data, relying heavily on O&M in the specification of soil properties observations (both numerical and descriptive). The features of interest identified in this model match familiar concepts in soil surveying:

Visual Representation of the Main Feature of Interest in the INSPIRE Domain Model (Soil ITWG, 2013). Image Re-Used According to Decision 2011/833/EU of the European Commission.
While the domain model is documented as UML, there is no enforcing policy from the European Commission regarding its implementation. Guidelines have been published by the INSPIRE Maintenance and Implementation Group on possible implementation technologies, such as GeoPackage. 2 An infrastructure has been put in place to register the code-lists of all INSPIRE themes, currently maintained by the Joint Research Centre. 3 In the Soil Theme, code-lists are mostly composed of broad concepts that must be further refined by member states. The European Commission has set up a dedicated platform named INSPIRE Geoportal 4 that works as a single access point to INSPIRE-compliant data services provided by the EU member states.
The Working Group on Soil Information Standards (WGSIS) of the International Union of Soil Sciences (IUSS) acknowledged the parallel efforts of Oceania (ANZSoilML), Europe (INSPIRE), and ISO towards the implementation of a soil information exchange mechanism. However, from the perspective of the WGSIS, these concurrent initiatives were leading to a dispersed landscape in need of consolidation. Under the auspices of the OGC, the WGSIS set out the Soil Interoperability Experiment (SoilIE), aiming to reconcile the existing soil information domain models into a single exchange paradigm. As with previous efforts, SoilIE relied heavily on O&M to express the aspects of soil sampling and analysis, but went into considerably more detail. In a complex structure of sub-models, the SoilIE domain model specifies a large number of features, some similar to other models (e.g.
Contrary to the “empty shell” approach of ISO 28258, SoilIE went on to define in detail the soil properties subject to exchange. To this end, the experiment relied primarily on the FAO Guidelines for Soil Description (Jahn et al., 2006), with additional guidance from the USDA Field Book for Describing and Sampling Soils (Schoeneberger et al., 2012). The experimental implementation took a hybrid approach. The domain model was encoded as an XML schema (known as SoilIEML) following the principles laid out in ISO 19136 (ISO 19136:2007, 2007), which depend on GML for geo-spatial features. This XML schema was the basis for a series of OGC-compliant web services (Web Feature Service in particular). The Simple Knowledge Organisation System (SKOS) was selected as the preferred vehicle for controlled content (e.g. code-lists). The integration of the Semantic Web-based SKOS with the XML schema proved problematic, with
Ontology Specification and Implementation
Methodology
The GloSIS web ontology was built following the NeOn methodology (Gomez-Perez & Suárez-Figueroa, 2009), adopting an iterative-incremental model for the continuous improvement and extension of the ontology through multiple iterations. NeOn identifies various scenarios for building ontologies and ontology networks. In particular, the following scenarios were used:
From specification to implementation, which comprises the core activities that have to be performed in any ontology development.
Reusing and re-engineering non-ontological resources (NORs), which identifies relevant NORs, transforms them into ontologies, and reuses them to build the target ontology. This is further described in Section 3.3.1.
Reusing ontological resources, which reuses existing ontological resources for building ontology networks. This is further described in Section 3.3.2.
Reusing ontology design patterns, which reuses Ontology Design Patterns (ODPs) to reduce modelling difficulties, to speed up the modelling process, or to check the adequacy of modelling decisions. Two main patterns were reused: (i) SOSA, a revised and expanded version of the Stimulus Sensor Observation ODP; 5 (ii) the OWL and SKOS pattern for modelling different parts of the same conceptualisation side by side (Formal/Semi-Formal Hybrid – Part OWL, Part SKOS). 6 In particular, this pattern was used for the code-list definitions, which are also aligned with ISO/IS 19150-2 (rules for developing ontologies in OWL) and with common practice in other standards. 7
For more detailed information, please refer to the GloSIS repository wiki. 8
The GloSIS domain model shall, as far as possible, support the general requirements listed below; these requirements have been gleaned from the various inputs received as well as the discussions to date. The requirements presented below have been defined in line with the principles of software engineering.
Re-use existing standardisation efforts to avoid developing a completely new model.
Re-use ANZSoilML as a reference to integrate relevant soil concepts.
Re-use ISO 28258 as the base model.
Integrate relevant soil concepts from the OGC Soil Interoperability Experiment.
Integrate relevant soil concepts from the SOTER/ISRIC model.
The resulting model should be simple and easy to use.
Support, in a generic way, the properties pertaining to the soil body as defined in the UN FAO Guidelines for soil description.
Design a generalised mechanism providing data users an insight with respect to what properties are available that pertain to a specific soil body.
Code-lists/vocabularies (ontologies) shall be developed for linking the domain model with explicit soil body properties.
Include code-lists/vocabularies (ontologies), but in a way that they can be added, modified, or deleted without changing the domain model itself.
AGROVOC terms should be used as a reference to avoid duplication of terms.
The model shall specify the main “groups” of soil body properties according to the UN FAO guidelines for soil description.
The model shall support the properties inventoried by the GSP in the report “Specifications for the Tier 1 and Tier 2 soil profile databases of the Global Soil Information System (GloSIS)” (Batjes et al., 2019).
Determine which concepts (observed properties) should be considered as attributes (if any) and which should be modelled as observations (as access to measurement metadata may be required).
The model shall include a concept to indicate the observed properties available on the soil features.
A platform-agnostic soil domain model, that is, an abstract specification (in the terms of the Open Geospatial Consortium), should be elaborated to provide a common basis for all ongoing and future developments.
Provide mappings between the newly developed model and all existing data-exchange models.
Finally, the model should provide the basis to allow the publication and harmonisation of soil-related data following the Linked Data principles, enabling the provision of an integrated view over various (previously disconnected) datasets. This, in addition to the requirements for creating and linking code-lists/vocabularies, the provision of mappings, and the reuse of existing standards, led to the development of the model in the form of an ontology.
The GloSIS domain model, initially realised as a UML model, was used as the basis to derive the target ontology. The model is composed of two main class types: the container classes, which are abstract classes used only for grouping observations (measurements) in a more readable manner, and the spatial object types, which are the main GloSIS classes. The spatial object types are connected to the related observations via the container classes. Each of these two main types of classes was transformed and post-processed to generate the final ontology.
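Because the container classes only group observations, they can be dropped during post-processing without losing information. The following minimal Python sketch illustrates such a clean-up on a toy list of triples. The class and property names (`ex:HorizonChemical`, `ex:Horizon.chemical`) are hypothetical, and treating any property whose domain or range is a container class as a "pointer" to be removed is an assumption of this sketch, not the GloSIS post-processing code.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def remove_container_classes(triples: Iterable[Triple], containers: Set[str]) -> List[Triple]:
    """Drop container classes and the properties that point to or from them."""
    triples = list(triples)
    # "Pointers" (assumed definition): properties whose rdfs:domain or
    # rdfs:range is a container class
    pointers = {s for s, p, o in triples
                if p in ("rdfs:domain", "rdfs:range") and o in containers}
    doomed = set(containers) | pointers
    # Keep only triples that mention neither a container class nor a pointer
    return [t for t in triples if not doomed.intersection(t)]


# Hypothetical toy input: a spatial object type linked to a container class
TRIPLES = [
    ("ex:Horizon", "rdf:type", "owl:Class"),
    ("ex:HorizonChemical", "rdf:type", "owl:Class"),
    ("ex:Horizon.chemical", "rdf:type", "owl:ObjectProperty"),
    ("ex:Horizon.chemical", "rdfs:domain", "ex:Horizon"),
    ("ex:Horizon.chemical", "rdfs:range", "ex:HorizonChemical"),
]

cleaned = remove_container_classes(TRIPLES, {"ex:HorizonChemical"})
```

Only the spatial object type survives; in a real pipeline the observations formerly reached through the container would then be linked to it directly.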
Based on the requirements described in Section 3.2, ISO 28258:2013 Soil quality – Digital exchange of soil-related data incl. Amd 1 (ISO 28258) was used to represent the top-level structure of the GloSIS web ontology. To understand the steps taken for this task, one must first understand the basic structure of ISO 28258. At the most abstract level, the two core components of ISO 28258 pertain, on the one hand, to a set of spatial object types describing soil objects as well as artefacts generated by soil sampling and, on the other hand, to various observations or measurements of physicochemical properties on these objects. When extending this model for a specific usage area, one must determine whether the information being added is of a more static type, and should thus be appended to the spatial object type, or whether it is of a more dynamic nature or can be determined via vastly different methodologies, and should thus be provided as an observation on the spatial object type.
The initial challenge in creating the GloSIS web ontology was identifying which spatial object types are required for the provision of the necessary information. Based on the GloSIS data requirements, the following spatial object types were identified: (i) site, (ii) plot, (iii) surface, (iv) sample, (v) specimen, (vi) profile, (vii) horizon, (viii) layer, and (ix) grid.
In a second step, the information requirements for each of these spatial object types were agreed upon with the experts, with the FAO Guidelines for Soil Description (Jahn et al., 2006) and the GSP report “Specifications for the Tier 1 and Tier 2 soil profile databases of the Global Soil Information System” (Batjes et al., 2019) providing the basis. For this purpose, a spreadsheet was created with a row for every possible soil property and a column for each of the spatial object types. This matrix guided all further modelling work. Based on the understanding of the information requirements for each of these spatial object types, a decision had to be reached on how this information would be linked to the spatial object types. Given the constraints laid down by ISO 28258, there were two main options available:
provide this information as an attribute of a specialised spatial object type; and provide this information as an O&M observation referencing a specialised spatial object type.
While the first option is simpler to implement, the second allows for far more flexibility and precision pertaining to the information content. This is of particular relevance in the GloSIS context, as the model must support a very heterogeneous data provider community; one cannot mandate how data is to be ascertained, and must instead be grateful that data is available at all. Thus, we believe that the wide use of the O&M model allows for the well-structured provision both of data obtained as we wish, following the agreed methods and procedures, and of other available data, whereby deviations from the agreed methods and procedures can be properly documented.
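To make the trade-off concrete, the minimal Python sketch below contrasts the two options for a pH value on a horizon. All class and field names (`HorizonWithAttribute`, `Observation`, `procedure`, and so on) and the procedure labels are hypothetical; they loosely follow the O&M/SOSA pattern of feature of interest, observed property, procedure, and result, but are not GloSIS identifiers.

```python
from dataclasses import dataclass
from typing import Optional


# Option 1 (hypothetical): the value is a plain attribute of a specialised
# spatial object type. Nothing records how the value was obtained.
@dataclass
class HorizonWithAttribute:
    horizon_id: str
    ph: float


# Option 2 (hypothetical): the value is carried by an O&M/SOSA-style
# observation that references the spatial object (feature of interest)
# and documents the procedure used.
@dataclass
class Horizon:
    horizon_id: str


@dataclass
class Observation:
    feature_of_interest: Horizon
    observed_property: str
    procedure: str
    result: float
    unit: Optional[str] = None  # pH is dimensionless, so no unit here


def follows_agreed_method(obs: Observation, agreed: dict) -> bool:
    """True if the observation used the procedure agreed for its property."""
    return agreed.get(obs.observed_property) == obs.procedure


horizon = Horizon("H1")
# Two providers report pH for the same horizon with different methods;
# both values coexist and the deviation from the agreed method is explicit.
obs_a = Observation(horizon, "pH", "pH-H2O-1:2.5", 6.4)
obs_b = Observation(horizon, "pH", "pH-CaCl2-1:2.5", 5.9)
agreed_methods = {"pH": "pH-H2O-1:2.5"}  # hypothetical agreed procedure
```

With the attribute option, the second value could only overwrite the first; with observations, `follows_agreed_method` can flag `obs_b` as a documented deviation rather than discarding it.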
Once the GloSIS model was finalised and implemented as a UML model (as mentioned above), the final ontology was generated in two major steps: first the UML model was transformed into an OWL ontology, and then the output was aligned with SOSA/SSN and O&M. Based on the acquired knowledge and previous experience (e.g., FOODIE project), a semi-automatic transformation process was carried out with the help of the ShapeChange tool. 9 ShapeChange enables the generation of an ontology following the ISO/IS 19150-2 standard, which defines rules for mapping ISO geographic information from UML models to OWL ontologies.
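The core idea of a UML-to-OWL encoding can be illustrated with a toy Python sketch. It is not ShapeChange and does not implement ISO/IS 19150-2 in full: it simply maps a UML class to `owl:Class`, attributes with primitive types to `owl:DatatypeProperty`, and attributes typed by other classes to `owl:ObjectProperty`. The primitive-type table, the `ex:` prefix, and the `Class.attribute` property naming are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical mapping from UML primitive types to XML Schema datatypes
XSD_TYPES = {
    "CharacterString": "xsd:string",
    "Integer": "xsd:integer",
    "Real": "xsd:double",
    "Boolean": "xsd:boolean",
}


@dataclass
class UmlAttribute:
    name: str
    type_name: str


@dataclass
class UmlClass:
    name: str
    supertype: str = ""
    attributes: List[UmlAttribute] = field(default_factory=list)


def uml_class_to_turtle(cls: UmlClass, prefix: str = "ex") -> str:
    """Emit Turtle statements for one UML class and its attributes."""
    lines = [f"{prefix}:{cls.name} a owl:Class ."]
    if cls.supertype:
        lines.append(f"{prefix}:{cls.name} rdfs:subClassOf {prefix}:{cls.supertype} .")
    for attr in cls.attributes:
        # Property names are qualified with the owning class (a choice of this sketch)
        prop = f"{prefix}:{cls.name}.{attr.name}"
        if attr.type_name in XSD_TYPES:
            kind, rng = "owl:DatatypeProperty", XSD_TYPES[attr.type_name]
        else:
            kind, rng = "owl:ObjectProperty", f"{prefix}:{attr.type_name}"
        lines.append(f"{prop} a {kind} ; rdfs:domain {prefix}:{cls.name} ; rdfs:range {rng} .")
    return "\n".join(lines)


# Hypothetical UML class with one primitive and one class-typed attribute
horizon_uml = UmlClass("Horizon", "SoilFeature",
                       [UmlAttribute("upperDepth", "Real"), UmlAttribute("profile", "Profile")])
ttl = uml_class_to_turtle(horizon_uml)
```

Real encoding rules also handle multiplicities, associations, enumerations, and code-lists, which is one reason the generated output still needed the post-processing described below.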
The output ontology generated by ShapeChange provided a good starting point to produce the final GloSIS web ontology, but it required substantial post-processing tasks, as described in the following sections.
Reusing and Reengineering Non-Ontological Resources
The GloSIS UML model, 10 was released as an Enterprise Architect project. 11
The project had to be modified before a successful transformation using ShapeChange could be carried out. In particular, it was necessary to apply the ApplicationSchema stereotype to each package and to set its targetNamespace property to the GloSIS namespace:
The next step required providing missing DataTypes information manually, such as:
The primary mechanism for providing arguments to ShapeChange is the configuration file. The GloSIS implementation re-used the default configuration provided with ShapeChange for testing purposes. 13
The vanilla configuration file had to be adjusted for GloSIS transformation needs. Some of the most notable modifications included:
Removing inputs=“TRF” from
Adjusting the URIbase value.
Adding the source targetParameter.
Adding namespaces of additional vocabularies used in the customised transformation rules, such as 14:
Introducing some additional mapping rules:
Introducing some new encoding rules.
Once the configuration was completed, the transformation was carried out by invoking the ShapeChange processor from the command line with the customised configuration file as input.
The crude result of the transformation contained all container classes from the UML model (see Figure 2) represented as subclasses of

Plot Spatial Object Type Overview, Where Green Boxes Refer to Container Classes, Dark Grey to ISO28258 Spatial Object Types and Light Grey to GloSIS Spatial Object Types (Full Size Diagram Available at: https://github.com/glosis-ld/glosis/wiki/Full-resolution-images).
After the transformation, the spatial object types were represented as subclasses of
SOSA/SSN is a lightweight but self-contained core ontology. It has already been used in GloSIS as the base model to represent observations. Nonetheless, various
Post-processing first required cleaning the ontology, namely removing the container classes together with the pointers between them and the spatial object types. Second, object properties had to be developed and aligned with SOSA/SSN, taking their data types into account. The latter was a complex task that is presented with regard to
There was considerably more variability in post-processing the various observation types and measurements. All of them were represented as subclasses of
Moreover, they were restricted by constraining the various
The result restriction is represented differently depending on the type. The string is represented with
In the case of the result being an auxiliary class containing a code-list, the model would incorporate
Numerical results requiring restrictions such as units of measure (mostly those related to physico-chemical observations) leverage the QUDT ontology. In particular, sub-classes of
Each code-list is modelled using a class and a concept scheme. The concept scheme is defined as an individual of type
To facilitate their reuse, extension, and maintenance, code-lists were modelled in a separate module.
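A code-list modelled as a class plus a concept scheme can be generated from a simple table of values. The hedged Python sketch below emits Turtle for one code-list: an OWL class, a SKOS concept scheme, and one concept per value, typed with both the class and `skos:Concept` and linked to the scheme with `skos:inScheme`. The `ex:` namespace and the `…Value`/`…ValueCode` naming convention are hypothetical, not the identifiers used in the GloSIS code-list module.

```python
PREFIXES = """@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex: <http://example.org/codelists#> .
"""


def codelist_to_turtle(name: str, values: dict) -> str:
    """Render one code-list as an OWL class, a SKOS concept scheme, and one
    concept per value (typed with both the class and skos:Concept)."""
    cls = f"ex:{name}Value"         # hypothetical naming convention
    scheme = f"ex:{name}ValueCode"  # hypothetical naming convention
    lines = [PREFIXES, f"{cls} a owl:Class .", f"{scheme} a skos:ConceptScheme ."]
    for local_name, label in values.items():
        # Escape backslashes and double quotes for a Turtle string literal
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"ex:{name}-{local_name} a {cls}, skos:Concept ;")
        lines.append(f"    skos:inScheme {scheme} ;")
        lines.append(f'    skos:prefLabel "{escaped}"@en .')
    return "\n".join(lines)


ttl = codelist_to_turtle("Drainage", {"wellDrained": "Well drained",
                                      "poorlyDrained": "Poorly drained"})
```

Keeping such generated code-lists in their own module means values can be added or deprecated without touching the classes that reference them.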
If the result is a numerical value, the model uses the
Finally, the last restriction is linking the observation with the observed soil property, defined as an instance of
There are a few cases where
In those cases, the code-list for the observed soil property is created based on the same approach as the one presented for the result. The only difference is that the class representing the corresponding code-list is also defined as a subclass of
The transformation performed by ShapeChange resulted in spatial object types being represented only as subclasses of geosparql
Introduction of Procedure Code-Lists
A long-standing issue in the semantics of soil science is the conflation of soil properties and laboratory analysis concepts. Ad hoc soil datasets often commingle in a single item the soil property, the laboratory process used to assess it, and, on occasion, even the units of measure. The OGC SoilIE (OGC 16-088r1, 2016) identified this as a major hindrance to the correct exchange of soil information. Some of the soil properties inventoried in the GloSIS domain model exhibited this problem.
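One way to untangle such conflated items is a hand-curated mapping from each dataset column to a separate soil property, procedure, and unit. The Python sketch below shows the idea; the column headers, property names, and procedure labels are invented for illustration, and raising an error on unmapped columns (rather than guessing) is a design choice of this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical ad hoc column headers, each mapped by hand to
# (soil property, procedure, unit). All names here are invented.
COLUMN_MAP: Dict[str, Tuple[str, str, Optional[str]]] = {
    "PHH2O": ("pH", "pH-H2O", None),
    "PHKCL": ("pH", "pH-KCl", None),
    "ORGC_WB_PCT": ("organicCarbon", "Walkley-Black", "unit:PERCENT"),
}


def decompose(row: Dict[str, float]) -> List[dict]:
    """Split one dataset row into observations with separate property,
    procedure, and unit; unmapped columns are reported, not guessed."""
    observations, unmapped = [], []
    for column, value in row.items():
        if column not in COLUMN_MAP:
            unmapped.append(column)
            continue
        prop, procedure, unit = COLUMN_MAP[column]
        observations.append({"property": prop, "procedure": procedure,
                             "unit": unit, "value": value})
    if unmapped:
        raise KeyError(f"no mapping for columns: {unmapped}")
    return observations


obs = decompose({"PHH2O": 6.4, "PHKCL": 5.8, "ORGC_WB_PCT": 1.2})
```

After decomposition, the two pH columns are recognised as the same soil property measured with two different procedures, which is exactly the distinction the conflated headers hide.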
In order to address this and further exemplify the rich use of the resulting GloSIS web ontology, a thorough inventory of physico-chemical analysis processes was gathered. The primary source of this inventory was the Africa Soil Profiles Database (Leenaars et al., 2014), with further insight gathered from the WoSIS database and procedures manual (Batjes et al., 2024). This information was gathered in a simple spreadsheet, together with bibliographic references and existing online resources detailing each laboratory process.
A small transformation was created to produce a new module in the GloSIS web ontology from the spreadsheet described above, following the framework applied in the ShapeChange transformation and making use of the SOSA/SSN and SKOS Web ontologies. Each laboratory process is expressed both as an instance of

Schematics of a GloSIS Observation. Blue: GloSIS Classes; Orange: External Classes; Yellow: GloSIS Instances.
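The spreadsheet-to-RDF transformation that produced the procedures module could be sketched in Python as follows. Because the description of how each laboratory process is typed is incomplete here, the sketch assumes each row becomes an instance of both `sosa:Procedure` and `skos:Concept`. The CSV column names (`id`, `label`, `reference`, `url`), the `ex:` prefix, the sample rows, and the use of `rdfs:seeAlso` and `dcterms:bibliographicCitation` for online resources and references are likewise assumptions; prefix declarations are omitted for brevity.

```python
import csv
import io


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def procedures_to_turtle(csv_text: str, scheme: str = "ex:LabProcedures") -> str:
    """Turn one spreadsheet row per laboratory process into Turtle."""
    rows = csv.DictReader(io.StringIO(csv_text))
    out = [f"{scheme} a skos:ConceptScheme ."]
    for row in rows:
        subject = f"ex:{row['id']}"
        out.append(f"{subject} a sosa:Procedure, skos:Concept ;")
        out.append(f"    skos:inScheme {scheme} ;")
        if row.get("url"):  # optional online resource describing the method
            out.append(f"    rdfs:seeAlso <{row['url']}> ;")
        if row.get("reference"):  # optional bibliographic reference
            out.append(f"    dcterms:bibliographicCitation {turtle_literal(row['reference'])} ;")
        out.append(f"    skos:prefLabel {turtle_literal(row['label'])}@en .")
    return "\n".join(out)


# Hypothetical spreadsheet export
SHEET = """id,label,reference,url
pHH2O,pH in water,"Example manual, 2002",https://example.org/methods/ph-h2o
pHKCl,pH in 1 M KCl,,
"""

ttl = procedures_to_turtle(SHEET)
```

Empty cells simply omit the corresponding statement, so the spreadsheet can be completed incrementally.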
With readability in mind and following software development best practices (e.g., “Do not Repeat Yourself”), the ontology was implemented following a modular approach as a networked ontology, facilitating its reusability, extensibility, and maintainability. For instance, all code-lists were implemented within the “code-list” module, and observations referenced across multiple modules were moved into a separate module called the “common module.” Additionally, as mentioned above, one of the most crucial aspects of post-processing was to align all the spatial object types with the ISO 28258 standard. That task was far from straightforward, since no existing ontology for this standard could be used as a reference. Therefore, the “iso28258” module was created to introduce the ISO features that were indispensable for connecting the GloSIS web ontology with the ISO 28258 standard. For this task, it was necessary to rely on the documentation of the standard. Additionally, this module includes alignments between elements in different ISO standards and other ontologies relevant to GloSIS. Some of these alignments declare the following classes to be equivalent:
The GloSIS classes are connected to the “iso28258” module and other ISO classes through inheritance as depicted in Figure 4.

GloSIS Web Ontology – Connection Between Spatial Object Types and ISO 28258.
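Connecting GloSIS classes to ISO 28258 through inheritance means that a reasoner, or even a simple script, can infer every ancestor of a class. The Python sketch below computes the transitive closure of `rdfs:subClassOf` over a small hierarchy; the class names are hypothetical placeholders and do not reproduce the diagram above.

```python
from collections import deque
from typing import Dict, Set


def ancestors(cls: str, sub_class_of: Dict[str, Set[str]]) -> Set[str]:
    """Return every direct or indirect superclass of cls (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in sub_class_of.get(current, set()):
            if parent not in seen:  # guards against cycles and diamonds
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical rdfs:subClassOf assertions (child -> direct parents)
HIERARCHY = {
    "ex:GlosisHorizon": {"ex:IsoHorizon"},
    "ex:GlosisLayer": {"ex:IsoLayer"},
    "ex:IsoHorizon": {"ex:IsoProfileElement"},
    "ex:IsoLayer": {"ex:IsoProfileElement"},
    "ex:IsoProfileElement": {"ex:SpatialFeature"},
}
```

For example, `ancestors("ex:GlosisHorizon", HIERARCHY)` yields the ISO-level horizon, the profile element, and the generic spatial feature, so queries phrased against the ISO classes also match GloSIS instances.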
There are a few important notes that complement the depicted diagram. First,
The current version of the ontology consists of 12 modules. This modular approach allows for the introduction of new extensions and modules whenever they are needed. Contents of the ontology (release v1.5.1):
In line with best practices, the GloSIS web ontology has been implemented and released using persistent and resolvable identifiers, allowing access to the ontology on the Web via its URI and ensuring the sustainability of the ontology over time. In particular, the W3ID service for persistent identifiers has been used. The service supports content negotiation, for example, to retrieve an HTML page with the ontology documentation or the ontology source in some RDF serialisation format (e.g., Turtle and RDF/XML), depending on the client.
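W3ID implements content negotiation with server-side redirect rules; the decision those rules make can be sketched in Python as choosing a representation from the client's HTTP `Accept` header. In this simplified, hypothetical sketch, `q` weights are honoured, wildcards such as `*/*` are ignored, and Turtle is assumed as the fallback when no supported media type is requested.

```python
from typing import Optional

SUPPORTED = ("text/html", "text/turtle", "application/rdf+xml")


def negotiate(accept: str, default: str = "text/turtle") -> str:
    """Pick the supported media type with the highest q-value in an Accept header."""
    best: Optional[str] = None
    best_q = 0.0
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Strict ">" keeps the first listed type when q-values tie,
        # and q=0 ("not acceptable") never wins
        if media_type in SUPPORTED and q > best_q:
            best, best_q = media_type, q
    return best if best is not None else default
```

A typical browser header such as `text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8` resolves to HTML, whereas an RDF client asking for `application/rdf+xml` receives that serialisation.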
The base URI of the GloSIS web ontology is
Documentation
The various modules of the GloSIS web ontology are documented with a series of HTML pages automatically generated by the Wizard for Documenting Ontologies (WIDOCO; Garijo, 2017). Written in Java, this software is able to inspect a Web ontology and generate human-friendly documentation for all its classes, data types, and data properties, in a well-organised structure. The output documents apply internal HTML links to facilitate navigation among the different sections. It also integrates with WebVOWL (Lohmann et al., 2014) for automatic diagram generation.
WIDOCO is also able to extract some metadata from the ontology to document its authorship, provenance, and licensing. However, it cannot fully process predicates from the multiple metadata ontologies in use today (Dublin Core, VCard, Schema.org, etc.). Instead, WIDOCO accepts a configuration file in which the metadata to be included at generation time can be declared. This configuration file contains important metadata such as authors, contributors, and their respective affiliations. Considering the number and varied nature of the modules in the GloSIS web ontology, it was deemed impractical to maintain a separate WIDOCO configuration file for each, as this would duplicate the metadata triples already included in the ontology modules themselves.
A small program was developed to address the issue above. It inspects the metadata triples declared in an ontology module and then produces a specific configuration file for WIDOCO. This program, included in the GloSIS repository, 17 is able to identify various predicates from the Dublin Core Terms ontology, plus
This HTML documentation 18 is also accessible through the W3ID dereferencing mechanism (by opening the base ontology URI in a web browser). Through content negotiation mappings, users accessing GloSIS resources directly with a web browser are presented with the HTML documentation, whereas applications requesting RDF receive the ontology RDF documents.
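The configuration-generating program described above can be approximated with a short sketch. It is not the program in the GloSIS repository: it works on plain (subject, predicate, object) tuples instead of a parsed RDF graph, and the WIDOCO property keys (`ontologyTitle`, `authors`, `contributors`, `licenseURI`) as well as the `;` separator for multiple values are assumptions to be checked against the WIDOCO documentation.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Assumed mapping from Dublin Core Terms predicates to WIDOCO configuration keys.
PREDICATE_TO_KEY = {
    DCTERMS + "title": "ontologyTitle",
    DCTERMS + "creator": "authors",
    DCTERMS + "contributor": "contributors",
    DCTERMS + "license": "licenseURI",
}


def widoco_config(triples, ontology_iri: str) -> str:
    """Collect metadata about `ontology_iri` and render it as key=value lines."""
    values: dict[str, list[str]] = {}
    for subject, predicate, obj in triples:
        key = PREDICATE_TO_KEY.get(predicate)
        # Only metadata attached to the ontology resource itself is kept.
        if subject == ontology_iri and key is not None:
            values.setdefault(key, []).append(obj)
    # Multiple values (e.g. several authors) are joined with ";" (assumed convention).
    lines = [f"{key}={';'.join(objs)}" for key, objs in sorted(values.items())]
    return "\n".join(lines) + "\n"
```

Because the metadata is read from the module itself, each module's triples remain the single source of truth and the configuration file can be regenerated at every documentation build.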
Maintenance
GloSIS uses semantic versioning 19 to denote code changes. This means that each part of a version number carries a specific meaning, communicating to users what they can expect from the changes made. The general convention is as follows:
Incrementing the
Finally, incrementing the
Besides versioning, GloSIS also issues releases. Each release presents updated code that is usable and tested. The GloSIS repository includes a simple Python utility to update the version together with the version IRI of each module.
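The version update can be illustrated with a small sketch. It is not the repository's utility: it assumes that a module declares `owl:versionInfo "MAJOR.MINOR.PATCH"` and embeds the same version string in its `owl:versionIRI`, a pattern that may differ from the actual GloSIS IRIs.

```python
import re

VERSION_INFO = re.compile(r'owl:versionInfo\s+"(\d+)\.(\d+)\.(\d+)"')


def bump(version: str, part: str) -> str:
    """Apply semantic versioning: a higher-order bump resets the lower parts."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part}")


def bump_module(turtle: str, part: str) -> str:
    """Update the version string on the versionInfo and versionIRI lines only."""
    match = VERSION_INFO.search(turtle)
    if match is None:
        raise ValueError("no owl:versionInfo found")
    old = ".".join(match.groups())
    new = bump(old, part)
    lines = []
    for line in turtle.splitlines():
        if "owl:versionInfo" in line or "owl:versionIRI" in line:
            line = line.replace(old, new)
        lines.append(line)
    return "\n".join(lines)
```

Restricting the replacement to the two version lines avoids touching any other literal that happens to contain the same digits.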
Furthermore, the GloSIS repository includes two automation tools enabling the transformation from CSV files to OWL and vice versa. These tools simplify the maintenance of code lists, which are made available as CSV so that experts can contribute more easily. For more information, please refer to the project repository wiki. 20
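The CSV-to-RDF direction can be sketched as follows. The column names (`id`, `label`, `broader`) and the namespace are hypothetical; the actual tools and CSV layout are documented in the repository wiki. The output uses SKOS, in which the GloSIS code-lists are encoded.

```python
import csv
import io

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def turtle_string(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def csv_to_skos(csv_text: str, scheme_iri: str, base: str) -> str:
    """Convert rows with hypothetical columns id,label,broader into SKOS Turtle."""
    out = [SKOS_PREFIX, "", f"<{scheme_iri}> a skos:ConceptScheme ."]
    for row in csv.DictReader(io.StringIO(csv_text)):
        out += ["", f"<{base}{row['id']}> a skos:Concept ;",
                f"    skos:inScheme <{scheme_iri}> ;"]
        if row.get("broader"):
            # Hierarchy between codes, e.g. a land-use type and its supertype.
            out.append(f"    skos:broader <{base}{row['broader']}> ;")
        out.append(f"    skos:prefLabel {turtle_string(row['label'])}@en .")
    return "\n".join(out) + "\n"
```

Keeping the hierarchy in a plain `broader` column is what lets domain experts edit a spreadsheet while the RDF side still gets `skos:broader` links usable for inference.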
Applications of the Ontology
This section showcases the use of the GloSIS web ontology to represent and query some exemplary soil datasets. First, it demonstrates the applicability of the ontology by using it to publish widely known open datasets from Europe and beyond as Linked Data, publicly available via the FOODIE endpoint. 21 The generation and publication of the linked datasets were carried out using a Linked Data Pipelines tool, developed in the context of different projects (e.g., SIEUSOIL, DEMETER, and OPEN IACS), which enables the fetching, preparation, transformation, integration, and publication of linked data in a triplestore. 22 In short, the tool requires a mapping configuration file that specifies how the elements in the source dataset should be transformed into elements of the target ontology (in this case, GloSIS). For further information about the tool, please refer to its repository on GitHub. Next, this section presents some examples of data retrieval using SPARQL queries over data generated and stored according to the GloSIS web ontology. These queries show not only how to retrieve data from the original sources, but also how to exploit the linked data. Finally, this section introduces a semantic REST API that is built on top of the GloSIS web ontology and facilitates data exploration. This API allows different applications to easily consume the linked data, without the need to know SPARQL, RDF, or other semantic technologies.
LUCAS 2015 Topsoil Dataset
The LUCAS Programme is an area frame statistical survey organised and managed by Eurostat (the Statistical Office of the EU) to monitor changes in land use and land cover over time across the EU (Jones et al., 2020). Since 2006, Eurostat has carried out LUCAS surveys every three years. The surveys are based on the visual assessment of environmental and structural elements of the landscape in georeferenced control points. The points belong to the intersections of a
In 2015, the LUCAS survey was carried out in all EU-28 Member States. In total, 27,069 locations were selected for sampling. Samples were eventually collected from 23,902 locations, of which 22,631 were in the EU. Soil samples were collected from the top 20 cm of soil following a common sampling procedure. After the removal of samples that could not be identified, the LUCAS 2015 Soil dataset has 21,859 unique records with soil and agro-environmental data.
The dataset includes the identification code
The following listings present one sample of the dataset represented according to the GloSIS web ontology. Listing 16 presents the Site instance and its geolocation, representing the location where the sample was collected.
Listing 17 presents the profile and profile element (layer) instance associated with the site.
Listing 18 presents an observation instance associated with the site.
Listing 19 presents two of the observation instances associated with the layer.
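To illustrate how a mapping configuration can drive such a transformation, the sketch below turns one tabular record into a site with a GeoSPARQL geometry, a layer, and SOSA observations on that layer. The `ex:` namespace, every class and property under it, and the column names are hypothetical placeholders, not the GloSIS terms or LUCAS field names used in the listings; only the `sosa:` and `geo:` terms come from the SOSA and GeoSPARQL standards.

```python
# Hypothetical mapping: source column -> observed property (placeholder IRIs).
MAPPING = {
    "ph_h2o": "ex:pHProperty",
    "n_total": "ex:nitrogenTotalProperty",
}

PREFIXES = """@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/soil/> .
"""


def record_to_turtle(record: dict) -> str:
    """Render one record as a site, a layer and one observation per mapped column."""
    pid = record["point_id"]
    site, layer = f"ex:site_{pid}", f"ex:layer_{pid}"
    out = [
        PREFIXES,
        f"{site} a ex:Site ;",
        # WKT without a CRS IRI defaults to CRS84, i.e. longitude before latitude.
        f'    geo:hasGeometry [ geo:asWKT "POINT({record["lon"]} {record["lat"]})"^^geo:wktLiteral ] .',
        f"{layer} a ex:Layer ; ex:partOfSite {site} .",
    ]
    for column, prop in MAPPING.items():
        if record.get(column) is None:
            continue  # skip properties not measured for this sample
        out += [
            f"ex:obs_{pid}_{column} a sosa:Observation ;",
            f"    sosa:hasFeatureOfInterest {layer} ;",
            f"    sosa:observedProperty {prop} ;",
            f'    sosa:hasSimpleResult "{record[column]}"^^xsd:decimal .',
        ]
    return "\n".join(out) + "\n"
```

Keeping the column-to-property mapping in data rather than code mirrors the role of the mapping configuration file: adding a property means adding an entry, not changing the transformation.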
Soil Respiration Database (SRDB)
The Global SRDB is a compilation of field-measured soil respiration (RS, the soil-to-atmosphere CO
Each record in the database includes fields regarding the record metadata, site data, measurement data, annual and seasonal RS fluxes, and ancillary pools and fluxes. For this transformation, we used only a subset of the site data fields, including latitude, longitude, elevation, soil bulk density, sand ratio value, silt ratio value, and clay ratio value. The SRDB subset was transformed into linked data and is also available at the FOODIE endpoint, within the knowledge graph with the URI (http://w3id.org/glosis/open/srdb/). Note that the graph URI does not resolve; it is just the identifier of the graph in the triplestore. However, for the purpose of visualising the data, the Virtuoso triplestore faceted browser 26 can be used, for example, to display SRDB observations regarding soil type. 27
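Because the graph URI only identifies a named graph, its content is reached through the SPARQL endpoint rather than by dereferencing. A minimal sketch of such a request under the SPARQL 1.1 Protocol follows; the endpoint URL is a placeholder (not the FOODIE endpoint), while the graph URI is the one given above. The `default-graph-uri` parameter restricts the query to that graph.

```python
from urllib.parse import urlencode

SRDB_GRAPH = "http://w3id.org/glosis/open/srdb/"

COUNT_OBSERVATIONS = """
PREFIX sosa: <http://www.w3.org/ns/sosa/>
SELECT (COUNT(?obs) AS ?n) WHERE { ?obs a sosa:Observation . }
"""


def sparql_get_url(endpoint: str, query: str, graph: str) -> str:
    """Encode a SPARQL 1.1 Protocol GET request restricted to one named graph."""
    params = {"query": query, "default-graph-uri": graph}
    return f"{endpoint}?{urlencode(params)}"


url = sparql_get_url("https://example.org/sparql", COUNT_OBSERVATIONS, SRDB_GRAPH)
# The URL can be fetched with urllib.request.urlopen, sending for instance
# "Accept: application/sparql-results+json" to obtain JSON results.
```

An equivalent alternative is a `FROM <http://w3id.org/glosis/open/srdb/>` clause inside the query text itself.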
The following listings present one sample record of the SRDB dataset represented according to the GloSIS web ontology. Listing 20 presents the site instance and its geolocation, representing the location of the sample.
Listing 21 presents the profile and profile element (layer) instance associated with the site.
Listing 22 presents a few observation instances associated with the soil layer.
The WoSIS RDF Service
WoSIS is the result of a decade-long effort towards a harmonised soil observation dataset at the global scale (Batjes et al., 2024). At its core, WoSIS has a relational database containing information on more than 200,000 geo-referenced soil profiles originating from 180 countries. The number of individual soil horizons characterised in this database approaches 900,000, for which almost 6 million individual observation results are recorded. Source datasets undergo rigorous quality control and harmonisation before being added, resulting in a globally consistent dataset aimed at digital soil mapping and environmental applications at large scales.
A pilot was conducted to set up a GloSIS-compliant RDF service with WoSIS as a data source. The pilot first addressed ontological alignment. The WoSIS data model follows a substantially different pattern from those found in soil ontologies (vide Section 2). For instance, WoSIS does not include an entity ontologically similar to the
These mappings were encoded in the external schema of the WoSIS relational database as a set of views. These views also perform a transformation to RDF, producing triples expressed in the Turtle language. Listing 23 provides a snippet of one of these views, creating instances of the
Metadata was added with predicates from the Dublin Core, VCard, and DCAT vocabularies.
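The view-based approach can be sketched with SQLite from the Python standard library. This is only an analogy: WoSIS runs on a different database system, and the table, columns, namespace, and class name below are hypothetical. It shows that string concatenation inside a view is enough to emit Turtle directly from relational rows.

```python
import sqlite3

SCHEMA = """
CREATE TABLE profile (profile_id INTEGER PRIMARY KEY, latitude REAL, longitude REAL);

-- Each row of the view is a Turtle fragment describing one profile.
CREATE VIEW profile_rdf AS
SELECT '<http://example.org/wosis/profile/' || profile_id || '> a ex:Profile ;' || char(10)
    || '    geo:hasGeometry [ geo:asWKT "POINT(' || longitude || ' ' || latitude
    || ')"^^geo:wktLiteral ] .' AS turtle
FROM profile;
"""


def export_profiles(rows):
    """Load (id, lat, lon) rows into an in-memory database and read the view."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        con.executemany("INSERT INTO profile VALUES (?, ?, ?)", rows)
        return [turtle for (turtle,) in con.execute("SELECT turtle FROM profile_rdf")]
    finally:
        con.close()
```

Placing the transformation in views keeps it next to the data: the source tables stay untouched, and any client able to query the database can export triples.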
A set of triples produced by these RDF transformation views was deployed to the Virtuoso triplestore, accessible through a SPARQL endpoint 28 and the Virtuoso Faceted Browser. 29 This pilot RDF service showcases the transformation of a traditional soil observation dataset into a GloSIS-compliant knowledge graph. It exemplifies the geo-location of soil profiles with GeoSPARQL, their composition with soil horizons and respective characterisation with observations of physico-chemical properties.
Data Discovery and Access
This section presents two different approaches to discover and access data represented according to the GloSIS web ontology (from the examples presented in the previous sections). First, the section introduces a set of exemplary SPARQL/GeoSPARQL queries that provide guidance on interacting with a triplestore serving GloSIS-compliant linked data. Then, the section presents an example REST API that allows simplified programmatic access to such data, abstracting away the details of how data is represented and of how to interact with semantic data via SPARQL queries.
A key advantage of producing and publishing GloSIS-compliant linked data is the possibility to access soil-related data from different sources in an integrated manner, as well as to discover and establish links between them, and with other relevant open datasets available in the Linked Open Data (LOD) cloud, such as FADN, NUTS, and AGROVOC.
SPARQL Queries
The GloSIS repository wiki includes four exemplary queries, which can be tried out against the LUCAS dataset described in Section 4.1.
The first query 30 retrieves the average value for the total nitrogen soil property in the topsoil of a certain spatial area. Starting from the
The second query 31 exemplifies the benefits of linked data and the rich axiomatisation of the GloSIS web ontology. The query retrieves the average value for the pH soil property, measured using a specific procedure, in the topsoil of a certain NUTS region. Similar to the previous query, it starts by retrieving the values of PH observations (
The third query 32 exemplifies the benefits of code lists and semantic inferencing. The query retrieves the total number of survey points (from LUCAS) over land use of a specific type/supertype (e.g., PRIMARY SECTOR) with a total nitrogen value higher than a certain threshold (e.g., 2). The query leverages the taxonomic relationships in the land use code list (used in LUCAS) to retrieve observations with a land use type at any level below the one specified by the user.
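The inference exploited by the third query amounts to a transitive closure over `skos:broader`, which in SPARQL is typically expressed with a property path such as `?code skos:broader* <top>`. The pure-Python sketch below shows the same expansion on a small, hypothetical land-use hierarchy; the codes are not taken from the actual LUCAS code list.

```python
from collections import deque

# Hypothetical code list: child code -> broader (parent) code.
BROADER = {
    "agriculture": "primary_sector",
    "forestry": "primary_sector",
    "arable_land": "agriculture",
    "industry": "secondary_sector",
}


def narrower_closure(top: str, broader: dict[str, str]) -> set[str]:
    """Return `top` and every code below it, like ?code skos:broader* <top>."""
    children: dict[str, list[str]] = {}
    for child, parent in broader.items():
        children.setdefault(parent, []).append(child)
    found, queue = {top}, deque([top])  # "*" also matches the zero-length path
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found
```

Asking for `primary_sector` therefore also matches observations recorded as `arable_land`, two levels down, without the user listing every subtype.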
Finally, the fourth query 33 exemplifies even further the benefits of linked data, and particularly how the GloSIS web ontology provides the basis to enable an integrated access to multiple soil data sources available in different triplestores. The federated query retrieves NitrogenTotal observations, which have a value over the specified threshold, from two different endpoints (FOODIE and ISRIC), and returns them in an integrated result set.
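A federated query of this kind relies on the SPARQL 1.1 `SERVICE` keyword. The helper below only assembles the query text; the endpoint URLs are placeholders, and the `ex:nitrogenTotalProperty` IRI and the use of `sosa:hasSimpleResult` are hypothetical stand-ins for the terms used in the actual query on the wiki.

```python
PREFIXES = """PREFIX sosa: <http://www.w3.org/ns/sosa/>
PREFIX ex: <http://example.org/soil/>
"""

PATTERN = """?obs sosa:observedProperty ex:nitrogenTotalProperty ;
         sosa:hasSimpleResult ?value ."""


def federated_query(endpoints: list[str], threshold: float) -> str:
    """Union the same observation pattern over several remote SPARQL endpoints."""
    blocks = [f"{{ SERVICE <{url}> {{ {PATTERN} }} }}" for url in endpoints]
    return (
        PREFIXES
        + "SELECT ?obs ?value WHERE {\n  "
        + "\n  UNION\n  ".join(blocks)
        # The FILTER applies to the whole group, i.e. to results from every endpoint.
        + f"\n  FILTER(?value > {threshold})\n}}"
    )
```

Since both sources share the GloSIS vocabulary, the same graph pattern can be sent unchanged to each endpoint, and the results merge into one result set.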
Semantic REST API
Although SPARQL is the native language for accessing the RDF data generated according to the model, a REST API was created to facilitate the access and consumption of data by potential services and applications. The REST API returns simple JSON data, JSON being one of the most popular formats used by Web services to produce and consume data. The API is implemented using grlc, 34 which translates SPARQL queries stored in a Git repository 35 into a REST API on the fly.
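Part of what such an API spares its users is the nested structure of SPARQL results. The sketch below flattens a response in the standard SPARQL 1.1 Query Results JSON Format into plain records; whether a given grlc deployment returns this format or a simpler one depends on its configuration and the requested media type, so the input shape is an assumption.

```python
import json


def flatten_bindings(payload: str) -> list[dict[str, str]]:
    """Turn SPARQL JSON results into a list of {variable: value} dictionaries."""
    data = json.loads(payload)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # Unbound variables (e.g. from OPTIONAL) are simply absent in a binding.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows
```

Dropping the `type` and datatype annotations is what makes the output "simple JSON"; a client that needs them should keep the original structure.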
Using the SPARQL queries from the previous section as a starting point, we created the following API methods:
Ontological Extensions
As it stands, the ontology spans soil data exchange with the same breadth as previous initiatives. The focus rests primarily on soil investigations conducted in the field, including the collection of physical samples to be later analysed with wet chemistry methods in a laboratory. There are, however, advancements in the domain that call for consideration in a soil data ontology.
Modern instruments allow the collection of high-resolution reflectance spectra from soil samples, an activity known as soil proximal sensing. From these spectra, estimates of physico-chemical properties can be obtained by statistical models, with relatively high accuracy (Viscarra Rossel et al., 2011). Soil spectroscopy instruments are also becoming increasingly relevant in field work, avoiding expensive activities of sample transport and laboratory analysis (Chang et al., 2001). The SOSA ontology already contains assets (such as the
Another field under active research is the estimation and inventory of measurement uncertainty. Such information is traditionally absent from soil data sources, even though uncertainties stemming from field work and laboratory procedures are known to be relevant (Libohova et al., 2019). In downstream activities relying heavily on soil data, such as digital soil mapping, and further into decision support, measurement uncertainty is essential to conveying an accurate characterisation and the fidelity of the resulting products. Since neither O&M nor SOSA considers measurement uncertainty, this remains an open field of research.
Finally, a note on soil classification systems. The GloSIS web ontology proposes a completely liberal approach, providing simple text data properties without supporting controlled content. The user can therefore use any classification system and even combine various systems. While there are merits to this approach, an alternative pattern with controlled content can be argued for. The World Reference Base for Soil Resources (WRB) would be the obvious choice for such content, as the only soil classification/description system developed for the world as a whole. However, the WRB system poses its own set of challenges. On average, it is updated every 5 years, without backwards compatibility. Therefore, a soil classified as Vertisol in the 2015 edition might be in a different class in the 2014 edition, yet another still in the 2007 edition, and so forth. The INSPIRE Soil Theme opted for the 2007 edition of the WRB (currently legally binding), essentially deterring classification with later versions. In order for a system such as the WRB to be adopted as controlled content, a different evolution paradigm is necessary, taking into account the requirements of digital data exchange. Engagement with the WRB working group of the IUSS towards this end is indispensable.
Operational Improvements
A future goal is to use the transformer tool as a component in continuous integration (CI) and continuous delivery (CD). That would allow us to automatically regenerate and deploy a new version of the ontology each time a change to the code-lists or procedures is recorded in the supporting spreadsheets. This improvement could also extend automation to other modules, allowing contributors unfamiliar with RDF languages to change the content of the whole ontology.
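One building block of such a pipeline is deciding which code-lists to regenerate. The sketch below compares content hashes of the spreadsheets (exported as CSV) with those recorded at the previous build; the file names and the idea of a stored hash manifest are hypothetical design choices, not an existing GloSIS component.

```python
import hashlib


def digest(content: bytes) -> str:
    """Content hash used to detect changes between builds."""
    return hashlib.sha256(content).hexdigest()


def modules_to_rebuild(current: dict[str, bytes], previous: dict[str, str]) -> list[str]:
    """Return the CSV files that are new or whose content changed since the last build.

    Files deleted since the previous build are not reported and would need
    separate handling.
    """
    return sorted(
        name for name, content in current.items()
        if previous.get(name) != digest(content)
    )
```

A CI job could run the CSV-to-OWL transformation only for the returned files, then store the new hashes as the manifest for the next build.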
Setting up an online browsing service would also facilitate the use of the ontology. This can be particularly worthwhile for the code-lists, which are rather extensive. Since the code-lists are encoded with SKOS, some obvious options are available. SKOSMOS (Suominen et al., 2015) is a web application for the publication of controlled vocabularies based on SKOS, providing powerful navigation functionalities. An alternative is the ONKI web service (Tuominen et al., 2009), a large platform that allows free upload of SKOS-based vocabularies. ONKI automatically provides APIs and web widgets for the uploaded resources.
Human Factors and Education
The GloSIS web ontology is one further step in a long lineage of soil ontologies. While it presents clear advances in content and format (not least by embracing the Semantic Web), these alone do not guarantee its success. Previous efforts did not always manage to fully engage the soil data provision community, and those that did were invariably legally enforced. It is therefore essential to keep the human factors of ontology use in consideration.
The CI/CD mechanism described above is one step in that direction, by facilitating the dialogue between computer scientists and soil scientists (likely unfamiliar with the innards of the Semantic Web). Providing a simple file format mirroring the actual ontology can be critical to engage and involve domain experts.
To further facilitate engagement with the wider community of soil scientists and soil data provision institutions, the establishment of an “Ontology Steering Committee” can be decisive. This body could mirror the governance paradigm employed in Open Source projects (German, 2003; Riehle, 2011): an assembly of computer scientists and soil scientists collectively guiding ontology development. The actual structure and rules of such a body are beyond the scope of this manuscript; however, other concepts from the Open Source community, such as “Request For Change” (Canfora & Cerulo, 2005), can provide the necessary templates. Towards this end, engagement with organisations such as the Soil Standards Working Group of the IUSS or the Soil Ontology and Informatics Cluster of ESIP 36 can be paramount.
Daniel (2019) points to ontology as one of the remaining gaps in data science research and education. Its absence is understood to compromise most stages of the research process, from data collection to the rigour of the outcome. However, ontologies and the Semantic Web in general have already been applied in educational contexts across a large swathe of domains (Jensen, 2019). The introduction of soil ontology into soil science and soil data curricula therefore appears to be a natural development. With its extensive code-lists and standards-based lineage, GloSIS is a strong candidate for practical application in education. Such a development would not only render the use of ontologies commonplace, but also train a new generation of soil scientists who are themselves capable of evolving the ontologies of their domain.
Glossary
Footnotes
Acknowledgements
The work in this paper has been supported by and partially carried out in the scope of the SIEUSOIL and EJP SOIL projects and by ISRIC – World Soil Information. EJP SOIL and SIEUSOIL have received funding from the European Union’s Horizon 2020 research and innovation programme. The EJP SOIL grant agreement number is 862695, and the SIEUSOIL grant agreement number is 818346. ISRIC – World Soil Information supports the soil community with soil data and the development of soil data exchange standards, enabling the provision of soil data, information, and knowledge at global, national, and sub-national levels for application in the sustainable management of soil and land.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work in this paper has been supported by and partially carried out in the scope of the SIEUSOIL and EJP SOIL projects and by ISRIC – World Soil Information. EJP SOIL and SIEUSOIL have received funding from the European Union’s Horizon 2020 research and innovation programme. The EJP SOIL grant agreement number is 862695, and the SIEUSOIL grant agreement number is 818346. ISRIC – World Soil Information supports the soil community with soil data and the development of soil data exchange standards, enabling the provision of soil data, information, and knowledge at global, national, and sub-national levels for application in the sustainable management of soil and land.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
