Abstract
Thesauri are popular, as they represent a manageable compromise – they are well-understood by domain experts, yet formal enough to boost use cases like semantic search. Still, as the thesauri size and complexity grow in a domain, proper tracking of the concept references to their definitions in normative documents, interlinking concepts defined in different documents, and keeping all the concepts semantically consistent and ready for subsequent conceptual modeling, is difficult and requires adequate tool support. We present TermIt, a web-based thesauri manager aimed at supporting the creation of thesauri based on decrees, directives, standards, and other normative documents. In addition to common editing capabilities, TermIt offers term extraction from documents, including a web document annotation browser plug-in, tracking term definitions in documents, term quality and ontological correctness checking, community discussions over term meanings, and seamless interlinking of concepts across different thesauri. We also show that TermIt features better fit the E-government scenarios in the Czech Republic than other tools. Additionally, we present the feasibility of TermIt for these scenarios by preliminary user experience evaluation.
Introduction
Terminological ambiguities across normative documents cause misunderstandings and misinterpretations. For example, according to the Prague Building Regulations1?> (PBR), a
Both MPP and PBR were published by the Institute of Planning and Development of Prague3 (IPR). While PBR is an obligatory Prague directive, MPP is still under preparation and its draft is public.4 Yet, its terminological ambiguities contributed to the lengthy public discussions and revisions.
Both the MPP document and the web application use common-sense terminology, as well as normative terminology referring to existing documents, like the Building Act No. 183/2006 Coll., Decree No. 501/2006 Coll, or Decree No. 268/2009 Coll. However, a closer look at some of the key concepts reveals that it is not just the term “building” with conflicting definitions5 across MPP, PBR, and other related normative documents, as exemplified in Fig. 1.

Terminology ambiguities –
These problems deteriorate understandability of both MPP and PBR. MPP drafts have been facing many objections by individuals, professional associations as well as municipalities from the beginning in the year 2013 – many of them related to ambiguous terminology.6 Thesauri offer a solution here – they are well-understood by domain experts, yet formal enough to support word sense disambiguation, and other NLP techniques for semantic search.
Furthermore, subsequent
The normative documents, as well as the derived artifacts (web application/open data sets) do not provide the user with the explicit context of the concepts, their definition, and relation to other concepts,
A term denotes different concepts (meanings). While sometimes a concept definition is just a more restricted variant of another (building with/without subterranean parts), sometimes a term denotes two fundamentally different concepts (construction as an object vs. construction as an event). The two different meanings often come from different contexts (documents/thesauri).
The connection between a concept and the place where it occurs or is defined is not maintained.
To address these issues, as a part of our research collaboration with IPR, we designed and implemented TermIt [14], a tool for managing interconnected thesauri extracted from decrees, directives, standards, and other normative documents. TermIt offers a scenario additionally supporting linking the identified concepts to their referential occurrences and definitions in normative documents, interlinking concepts coming from different thesauri, community discussions over concept meaning, as well as concept quality validation.
An overview of TermIt has been already given in [14]. The current paper focuses on the practical impact of TermIt and presents the following contributions:
concept validation and quality checking using the Unified Foundational Ontology [9] (Section 3.3),
browser plug-in for web document annotation (Section 3.2),
comparison of TermIt features to state-of-the-art systems and evaluation of its usability (Section 5),
a report of the impact of TermIt and its practical use cases (Section 6).
In Section 2, we present key models used in TermIt. Section 3 shows the architecture and features of the system. Section 4 presents existing tools which are in Section 5 compared to TermIt, together with a user study on TermIt usability. The impact of TermIt is discussed in Section 6 and the paper is concluded in Section 7.
In [13] a structure of interlinked Semantic Government Vocabularies based on legal acts is proposed, without offering software support for their creation and management. Each vocabulary consists of a thesaurus represented in the Simple Knowledge Organization System (SKOS) [15] and an ontological model expressed in the Unified Foundational Ontology (UFO) [9]. To align with the proposed architecture used in the Czech eGovernment, TermIt is designed to be compliant with SKOS. Furthermore, to ensure semantic coherency of the concepts and to prepare them for the subsequent conceptual modeling, it allows classifying concepts into UFO categories.
SKOS
Simple Knowledge Organization System (SKOS) [15] has been used for more than a decade as a standard for representing simple thesauri on the web. Its main features involve representing
The SKOS specification does not provide support for distinguishing between different ontological categories, like endurants (

Example SKOS representation of a building
Unified Foundational Ontology (UFO) [9] is a top-level ontology designed for conceptual modeling, originally expressed in modal logic [9], but with recent partial translations to OWL 2 [1].
UFO covers static properties of endurants (UFO-A) [9], dynamic properties of perdurants (UFO-B) [11], a multi-level theory for modeling types (UFO-MLT) [6], and other extensions.
The basic
TermIt architecture
TermIt9 is a Web application with backend written in Java10 and front-end written in TypeScript.11 The back-end follows a
TermIt is backed by a GraphDB [2] repository that uses custom rules allowing inference combining selected RDFS [3] (class and property hierarchies) and OWL [16] (inverse properties) features. A linked data API can be set up on top of the repository using Pubby [7].
TermIt deployment can be done via Docker12 as well as by running individual services directly on the underlying host. Installation as well as requirements are described on TermIt web pages.13
TermIt14 consists of several modules for thesauri creation, verification of their quality, and their usage for document annotation and search. Figure 2 illustrates the system architecture and the following sections provide details.

Schematic depiction of the architecture of TermIt. Annotace [17] is an external document annotation service. TermIt Annotate is a browser plugin [5] for direct web annotation. Solid edges (with labels indicating protocol and data format) represent data flow. Oval nodes within the TermIt box represent the main use cases/components with arrows indicating their interdependence.
In the thesauri management module, users can maintain domain thesauri (called
Document annotation and web annotation
Linking documents and concepts is a key feature of TermIt. Users can attach documents to thesauri, either by uploading an HTML file through TermIt UI, or by annotating a web page directly using the TermIt Annotate plug-in [5]. Then, occurrences of concepts and their definitions in the text can be linked to the thesauri concepts. In this manner, documents and web pages linked to the given concept can be listed. TermIt uses Annotace [17], a standalone text analysis service, to discover concept occurrences in documents and suggest new concepts based on their significance in the text. As the service is based on the Czech language for which a gold standard is not available yet, Annotace is only used for concept suggestions – a domain expert has to always review and confirm them.
Quality checking and validation
To control the quality and completeness of concepts, TermIt employs constraints encoded as SHACL [12] rules that are evaluated over the SKOS [15] representation of a thesaurus. The constraints include, for example:
each concept has a
each concept has at most one
each concept has at least one
a concept has a
a concept should have a
a concept should have a parent (either
Although the primary purpose of TermIt is thesauri management, P2 (see Section 1) requires different concepts to be properly distinguished. To support this disambiguation, TermIt supports classification of concepts using UFO. Thus, SHACL constraints are also used to check the semantic coherency of a concept typed with a UFO type, e.g.:
each concept has at least one UFO semantic category ( The above categories are pairwise disjoint.
The full set of constraints (14 at present) is configurable per TermIt deployment based on the selection from the SGoV validator.16 Currently, TermIt does not support choosing the set of constraints dynamically, or per thesaurus.
Rule violations are presented to the user by the TermIt user interface (UI). Additionally, rules g4, g13, g14, and m1 form a
Various tools addressing the problems P1–P3 were investigated, which we described in a survey report.17
For P1, tools that are able to express concepts, i.e. the meaning of terms can be utilized. Terminology management tools like Vocbench 318 often targeted to domain experts or language translators, use textual definitions and relate concepts with thesauri relations such as
For P2, various techniques are used by tools to differentiate the meaning of identical or similar terms or phrases. The quality of concept labels, textual definitions and structural properties of taxonomies is checked by tools like PoolParty Semantic Suite.21 SKOS Shuttle22 identifies and proposes fixes for orphaned concepts. Sketch Engine23 identifies different concepts by analyzing a corpus and presenting contextual information for them. Lastly, tools like Menthor Editor utilize upper-level ontology theories, employing typing by stereotypes developed within those theories to differentiate the meaning of similar terms.
For P3, connecting concepts to locations within documents where they are defined or used is a challenge. Terminology management tools like PoolParty Semantic Suite or SKOS Shuttle, computer-assisted translation (CAT) tools like SDL Trados,24 and NLP-related tools for text annotations like GATE25 or BRAT26 work with extracted text, not the documents themselves, so they do not keep track of those locations. Web annotating tools like Hypothesis27 or Diigo28 can annotate parts of the text within documents and navigate there, but they lack support for concept management or concept search within the document.
In summary, we have not found any existing tool to effectively address all three problems P1–P3. The challenge in combining existing tools is due to P3, where tools either operate solely with extracted text and do not work with actual documents or lack support for concept management and concept search within the document. This presents a significant obstacle to solving these problems simultaneously.
Evaluation
For evaluation, we have identified five functional features based on the use cases mentioned in Section 1 browsing and searching for thesauri and concepts, creating/editing new thesauri and concepts, interlinking concepts, handling documents and web resources, thesauri quality control, annotation of resources and their content.
Referring to Section 1, features F1 and F2 reflect general thesauri management capabilities and are linked to P1, feature F4 addresses P2, and features F3 and F5 handle P3. We compare the tools w.r.t. features F1–F5 and test their satisfaction in TermIt by a user experience evaluation.
To compare the tools, we used the review from Section 4. After disregarding tools that do not support document/web source management (e.g. Vocbench 3) or do not consider textual content (e.g. VoCol29), we picked four tools most relevant to our use cases – the PoolParty Semantic Suite (PP), TopBraid Enterprise Data Governance (EDG), SKOS Shuttle (SKS), and TermIt. Table 1 highlights key differences among the tools w.r.t. to features F1–F5.
F1 is well-supported in SKS and has excellent support in other tools. All the tools support hierarchical visualization of concepts, search within key metadata of concepts (like labels, synonyms, or definitions), and thesauri (like description). Search for non-label metadata can be done in SKS only using SPARQL queries. In addition, EDG and PP implement faceted search and advanced search filters.
F2 support is diverse across the tools. Although they all support multi-user access, change tracking, multilingual concepts, and their linking across thesauri, only TermIt can link a concept to the place where it is defined. PP and EDG support customizable collaboration workflows, while TermIt only supports the draft/confirmed status of a concept. PP, EDG, and TermIt provide a discussion thread about a concept.
F3 is well-supported in all tools, including managing local files or URL resources. In addition, the competitors support the web crawling of documents. PP and EDG also support full-text searches within documents.
F4 is well supported in EDG and PP, featuring customizable validation rules, metrics, and the generation of reports. F4 is also well supported in TermIt, offering validation rules and metrics for concepts and thesauri that are configured during deployment. SKS has support only for concept deorphanization.
F5. TermIt competitors allow importing different content types (e.g., PDF, XLSX, HTML) but do not support manual annotations. Only PP can visualize annotated content as plain text. In contrast, TermIt allows both online annotations in the web browser and importing HTML documents supporting manual annotations. Complex HTML documents often need to be adjusted manually before import.
Differences among the tools w.r.t. features F1–F5. Commonalities of the tools are described in Section 5.1
Differences among the tools w.r.t. features F1–F5. Commonalities of the tools are described in Section 5.1
To sum up, EDG and PP are professional tools that provide many additional features that are more or less relevant for the presented use case. PP is intuitive and provides the best support in F1 and F4. EDG provides similar features as PP, yet, with a more complex UI. SKS is a very promising alternative to PP but lacks F4. F5 has the weakest support across the tools except TermIt. TermIt allows creating or approving suggested concepts directly from the document without the need to leave the document view. Moreover, the user can navigate from a concept to its definition within a document to get the proper context for understanding the concept.
We have set up a set of tasks for the features F1–F5, each evaluated w.r.t. the following criteria:
the time needed to finish the task,
the understanding of the task,
the understanding of the TermIt content,
the importance of tested features,
the difficulty of the tasks, and
the error detection.
These criteria reflect the user experience with TermIt – the goal is not to find out what the users are doing, but how they understand the content, the features and what is the subjective difficulty of task completion in TermIt.
Testing scenarios30 were applied to five users with different levels of experience with TermIt. They were recruited from the people working as domain experts for urban planning, aviation, and medicine and also developers, none of which participated in the development of TermIt. The testers know normative texts regarding their domain very well and understand the meaning of words in context. The testing scenario was divided into five specific tasks.
As a response to the feedback, a new web browser plugin (TermIt Annotate) has been designed, allowing testers to create the concepts directly by annotating web resources in the browser, without the need of (re-)importing web documents to TermIt and modifying their appearance. Out of five testers, three responded they would favor it over the internal TermIt annotator, one is not sure, and the last one would not. Overall usability of the web annotator plugin scored high in the evaluation (average score 4.6 out of 5). More details can be found in [5].
Table 2 shows average and limit times needed to finish the tasks. T2 was misinterpreted by the testers, making the time measure irrelevant. T5 focused on understanding the concept rather than efficiency of the relevant actions.
Times needed to finish particular tasks in minutes
In summary, less experienced users needed more time to finish the tasks. Also, users found it difficult to create definitions from the text. With the browser plugin, however, the testing showed its feasibility. Another feature difficult to understand was concept quality. On the other hand, it took only very little explanation for the users to finally understand it – in T2, users did not understand what it meant, but in T4, everyone was able to fix the errors. Interlinking and recognition of concepts raised some minor issues caused by the UI.
While TermIt has been designed and tested in the Metropolitan plan scenario introduced in Section 1, it has been actively developed since then and used in other setups. Several projects in the aviation31 and healthcare industry32 used TermIt for research purposes, two other TermIt uses then go beyond the academic sphere.33
The mentioned scenarios have common characteristics that show the expected setup for further deployments: Each of the scenarios is backed by a professional community responsible for maintaining/publishing data that historically suffer from terminological ambiguities and misinterpretations. Each community uses a dedicated TermIt instance to maintain a set of interlinked thesauri, where each thesaurus represents a normative document that the community maintains, or refers to. Each community opens the final terminology to the public, to help interpret produced documents and data.
IPR uses TermIt on a commercial basis to systematize the terminology of the Digital Technical Maps across the Czech Republic, based on Decree No.393/2020 Coll, with technical support provided by the Czech Technical University in Prague. The community of contributors who are using TermIt to create or comment on the terminology has been steadily growing (approx. 80 as of July 2022). Within the past two years, we organized several online webinars for the community (public administration and commercial professionals in urban planning, architects, and civil engineering) to become familiar with TermIt and the basics of knowledge modeling. The released thesauri versions are then used by a terminology viewer developed at IPR to present the resulting uniform terminology of the Digital Technical Maps to the general public.34
eGovernment
The second usage of TermIt goes beyond the Czech Technical University in Prague. TermIt has been adopted by the Department of the eGovernment Chief Architect of the Ministry of Interior of the Czech Republic (MI) within the EU-funded project No. CZ.03.4.74/0.0/0.0/15_025/0013983. It has been used here as a part of the Assembly line35 – a larger ecosystem of tools supporting the creation of conceptual models of public administration agendas. For example, TermIt has been used for managing the terminology of the eGovernment thesaurus,36 a set of concepts used for modeling the information architecture of the eGovernment of the Czech Republic, covering information architecture standards, like The Open Group Architecture Framework,37 or various Czech eGovernment-related laws, and decrees. Over the past two years, four training sessions for more than 150 attendees in total were organized, each including a dedicated session on TermIt usage and applicability.
Conclusions
We presented TermIt, an open-source (GPL 3.0) tool for managing contextual (typically document-based) thesauri, interlinking their concepts, and connecting the concepts to their occurrences and definitions in documents. Especially the latter seems to be rather neglected in the existing open-source solutions, which is however crucial to maintaining the link between a normative document and a thesauri concept.
The usability testing proved the suitability of our approach, yet revealed problems that made some tasks difficult to complete for non-trained TermIt users – e.g. the concept quality scenario (T2) revealed that concept identification (e.g. unique concept definition/label) was not understood well by our users. On the other hand, linking a concept to its defining occurrence in the document was well understood and doable using the web annotation TermIt plugin.
In the future, we plan to test TermIt in the eGovernment domain for the specific task of managing interlinked thesauri related to core Czech legislation and governmental data management as well as to standards and other normative documents. TermIt’s focus is narrow, primarily targeting P1–P3 challenges. However, we also foresee the potential to integrate features from existing tools (like role-based governance) and assess TermIt applicability in broader terminology management contexts. Furthermore, we plan to extend support for validation constraint management and extraction of formal concept definitions from the textual ones.
Footnotes
Acknowledgements
The work was supported by grant No. SGS19/110/OHK3/2T/13 Efficient Vocabularies Management Using Ontologies of the Czech Technical University in Prague.
Note that we consider here only explicit definitions. Although legal acts are at the top of both MPP and PBR, the community authoring these documents consists mainly of urban planning professionals (not lawyers), who focus only on the terminological consistency of the documentation and derived data.
An overview of the first round of objections in 2013, that refers to many terminology-related problems can be found at
Prefixes
This paper refers to the current version – 2.16.3
sources at
An application containerization service –
To access a demo instance, install the Google Chrome Extension from
This rule motivates users to properly ground the concepts in the foundational ontology UFO. The only element for which this rule would fail would be the top-level
An English version available at
Improving effectiveness of aircraft maintenance planning and execution (CK01000204,
Strategic management of the development of electronic healthcare in the Ministry of Health (CZ.03.4.74/0.0/0.0/15_025/0006212,
Details on the list of use-cases can be found at
For example, the concept ‘Building’ can be seen in
draft version of the terminology is published at
