Abstract
This paper explains conceptual modeling within the framework of Frame-Based Terminology (Faber, 2012; 2015; 2022), as applied to EcoLexicon (ecolexicon.ugr.es), a specialized knowledge base on the environment (León-Araúz, Reimerink & Faber, 2019; Faber & León-Araúz, 2021). It describes how a frame-based terminological resource is currently being restructured and reengineered as an initial step towards its formalization and subsequent transformation into an ontology. It also explains how the information in EcoLexicon can be integrated in environmental ontologies such as ENVO (Buttigieg, Morrison, Smith, Mungall & Lewis, 2013; Buttigieg, Pafilis, Lewis, Schildhauer, Walls & Mungall, 2016), particularly at the bottom tiers of the Ontology Learning Layer Cake (Cimiano, 2006; Cimiano, Maedche, Staab & Volker, 2009). The assumption is that frames, as a conceptual modeling tool, and information extracted from corpora can be used to represent the conceptual structure of a specialized domain.
Introduction
There is a clear need for explicit models of semantic information (e.g., terminologies) to facilitate information exchange. One approach is through ontologies, regarded as shared models of a specialized domain that encode a view common to a set of users. The close link between terminologies and ontologies has been widely acknowledged (Gillam, Tariq & Ahmad, 2005; L’Homme & Bernier-Colborne, 2012; Roche, 2012; Montiel-Ponsoda, 2022). Terminologies and ontologies are similar because both entail the conceptualization of a specialized subject field. The difference between them lies in the fact that terminologies are generally developed for knowledge acquisition by human users, whereas ontologies are built for knowledge sharing between human and artificial agents.
The standard definition of ontology is ‘a formal, explicit specification of a shared conceptualization’ (Studer, Benjamins & Fensel, 1998, based on Gruber, 1995). This specification takes the form of the definitions of representational vocabulary (classes, relations, etc.), which provide meanings for the vocabulary as well as the constraints on its use. As such, an ontology is a formal representation of domain knowledge in which concepts are hierarchically organized. It is more formal than a terminology since it possesses a set of axioms and logic-based rules to structure, organize, and verify the consistency of the hierarchy through reasoning and inference.
In contrast, a terminology is the result of terminology work, which is defined as the “systematic collection, description, processing and presentation of concepts and their designations” (ISO 1087: 2019, 3.5.1) in a certain subject field. This collection of terms can take various forms. It can be a flat alphabetical list or a set of term records with data fields. However, it can also be a more conceptually oriented resource (i.e., a terminological knowledge base) in which the concepts are structured, based on their meaning, and organized in a set of labelled domains and subdomains. Despite its lack of ontological formality, a terminology can be the first step towards creating a shared conceptualization of a specialized domain.
Terminologies contribute to ontology building because they are often used as a starting point for formalization. In fact, most specialized fields possess terminology for naming, classifying, and standardizing the concepts in them (Sowa, 2000). According to Prieto-Díaz (2003), a taxonomy of terms highlights the key concepts in a field, which can then be defined and related to create an ontology. This may involve the reuse and reengineering of non-ontological resources whose semantics have not yet been formalized in an ontology (Suárez-Figueroa, 2010).
The reuse of resources can also be based on the extraction of information from a corpus of domain-specific texts and terminographic resources as well as expert validation rather than elicitation. Conceptual representations can thus be extracted from natural language texts by identifying lexico-syntactic structures (e.g., knowledge-rich contexts and knowledge patterns) that can be directly mapped or translated into ontology structures (Cimiano & Wenderoth, 2005; 2007; Montiel-Ponsoda & Aguado de Cea, 2014; Montiel-Ponsoda, 2022).
The reengineering of terminological resources may also involve transforming the resource schema into an ontology schema and converting the content into ontology instances. To achieve this, the implicit semantics of the original resource should be made explicit in the target ontology, which generally means that hyperonymy-hyponymy or meronymy relations are translated into subclass-of and part-of relations, respectively (Montiel-Ponsoda, 2022). However, this is easier to accomplish if the terminological resource is more knowledge-oriented and has a more conceptual design to facilitate its reuse and potential conversion into an ontology. This involves a process of conceptual modeling, which has always been one of the major objectives in terminology management (Budin, 1994; Meyer & Mackintosh, 1996; Meyer, Mackintosh, Barrière & Morgan, 1997; Faber, León-Araúz, Prieto-Velasco & Reimerink, 2007; Faber & L’Homme, 2022, inter alia).
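The translation of terminological relations into ontology relations described above can be illustrated with a minimal sketch. The relation labels (`hyponym_of`, `meronym_of`, `related_to`), the function name, and the sample records are assumptions made for illustration and are not taken from any existing resource schema.

```python
# Hypothetical mapping from terminological relation labels to ontology relations.
RELATION_MAP = {
    "hyponym_of": "subclass_of",  # hyperonymy-hyponymy -> subclass-of
    "meronym_of": "part_of",      # meronymy -> part-of
}


def to_ontology_triples(term_relations):
    """Translate (source, relation, target) tuples into ontology triples.

    Relations without an explicit mapping are returned separately so that
    they can be reviewed manually instead of being silently dropped.
    """
    triples, unmapped = [], []
    for source, relation, target in term_relations:
        if relation in RELATION_MAP:
            triples.append((source, RELATION_MAP[relation], target))
        else:
            unmapped.append((source, relation, target))
    return triples, unmapped


# Illustrative records only.
records = [
    ("tectonic earthquake", "hyponym_of", "earthquake"),
    ("mantle", "meronym_of", "Earth"),
    ("earthquake", "related_to", "fault"),
]
triples, unmapped = to_ontology_triples(records)
```

Keeping the unmapped relations apart reflects the point that implicit semantics must be made explicit: a relation such as the hypothetical `related_to` cannot be converted until a terminologist decides what it actually means.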
Conceptual modeling in terminology
Conceptual modeling is used both in Terminology and Ontology building as well as in many other areas related to knowledge representation and data modeling. Although each discipline approaches it with varying degrees of formalization, it can be broadly defined as the systematic process of abstracting a model from the real-world entities that it seeks to represent. In multilingual terminology management, concept systems are a common modeling device because they allow the aggregation of multilingual equivalents of terms and monolingual terms that are synonyms into a common concept object (Shreve, 1995). This is an acknowledgement of the concept-oriented approach of knowledge resources. Concept systems in Terminology can also be useful to accomplish the following: (1) model concepts and their relations within a subject field; (2) clarify the relations between concepts; (3) lay the foundations of a uniform and standardized terminology; (4) facilitate the design of definitions; and (5) accommodate all relevant concepts in a terminographic resource (based on the ISO standard 704: 2009).
Given the importance of concept systems in specialized language, it is thus surprising that there are relatively few terminology resources that make conceptual structures explicit (Roche, Costa & Carvalho, 2019). An effective knowledge base should not only describe concepts individually but should also specify how each concept is related to others so that users can make inferences regarding characteristics and affordances. This is not only a question of representing single concept and term entries, but also of integrating them into larger knowledge structures or frames to map the relations that they hold with others and thus capture their different conceptual dimensions.
Even though terminological resources store a great deal of conceptual information, quite often, their design does not allow conceptual relations to be easily perceived and extracted. Before a terminological database can begin to approximate an ontology, the knowledge that it contains must be made explicit. For that reason, a resource can be more easily used or even ‘reengineered’ by ontology builders when it has a conceptual design. In that case, it can provide data at the basic levels of an ontology that are directly relevant to ontology building.
Ontology building and terminology
According to Cimiano (2006) and Cimiano, Maedche, Staab and Volker (2009), ontology building follows a model known as the Ontology Learning Layer Cake (OLC). Even though the OLC has been harshly criticized (Browarnik & Maimon, 2015), it is still widely used, since it provides a coherent and structured way to understand and organize the different steps involved in the process of building ontologies. Moreover, it is seamlessly aligned with Terminology, as all of its layers correspond with the different steps involved in terminology management.
As shown in Fig. 1, from the bottom to the top layer, the tiers of the layer cake involve the extraction or specification of a certain type of information for each ontology component: (i) terms; (ii) synonyms in the same language and equivalents in different languages; (iii) concepts; (iv) concept hierarchy; (v) concept relations; (vi) axioms and rules.

Cimiano’s (2006) ontology learning layer cake.
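The bottom-to-top ordering of the tiers can be made explicit with a small sketch. The class and function names below are assumptions introduced for illustration; only the six tiers and their order come from the layer cake itself.

```python
from enum import IntEnum


class OLCLayer(IntEnum):
    """Tiers of the Ontology Learning Layer Cake, numbered from the bottom."""
    TERMS = 1
    SYNONYMS = 2
    CONCEPTS = 3
    CONCEPT_HIERARCHY = 4
    CONCEPT_RELATIONS = 5
    AXIOMS_AND_RULES = 6


def layers_up_to(top):
    """Return every tier from the bottom of the cake up to and including `top`."""
    return [layer for layer in OLCLayer if layer <= top]


# Tiers that a resource without axioms and rules could contribute to.
covered = layers_up_to(OLCLayer.CONCEPT_RELATIONS)
```

Modeling the tiers as an ordered enumeration makes it straightforward to state how far up the cake a given resource reaches.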
Ontology development is largely based on the definitions of concepts and the relations between concepts (Wróblewska, Podsiadły-Marczykowska, Bembenik, Protaziuk, & Rybiński, 2012). Terms (and their synonyms) are the linguistic designations of concepts. Concepts differ from terms in that they are ontological entities and thus abstractions of human thought. However, the structure of definitions as well as the different explicit relational structures found in corpus texts can provide valuable information that indicates what terms mean and how concepts relate to each other. This is useful because an important part of ontology building is establishing a concept hierarchy based on vertical relations (i.e., is-a or part-of relations) and horizontal ones (e.g., causes, result-of, located-at, takes-place-at). Much of this information is reflected in language.
The reengineering of a well-modeled terminological knowledge base can thus be a valuable shortcut that greatly facilitates ontology building since terms and their synonyms are already ascribed to a conceptually organized structure. The following sections discuss Frame-based Terminology, and how EcoLexicon, its practical application, is gradually being reengineered with a view to transforming it into an ontology.
The rest of this paper is structured as follows. Section 2 discusses the basic premises of Frame-based Terminology (FBT) and its relation to Frame Semantics. Section 3 describes EcoLexicon, the practical application of FBT, and explains the process of conceptual modeling currently being applied to approximate EcoLexicon to an ontology, such as ENVO. Section 4 presents the conclusions and outlines future challenges.
Frame-Based Terminology (FBT) is a cognitive approach to Terminology that focuses on the contextualized representation of specialized concepts, and which can contribute to ontology building by supplying the information needed for the Ontology Learning Layer Cake. As its name implies, Frame-based Terminology uses premises of Frame Semantics (Fillmore, 1976; 1982; 2006), and situated cognition (Barsalou, 2003; 2008; 2009) to structure specialized domains and create non-language-specific representations (Faber, 2022). The assumption is that language reflects thought, and that the non-language-specific frames for specialized concepts can be extracted from texts.
A frame is an organized package of knowledge that humans retrieve from long-term memory to make sense of the world. In fact, framing experience involves applying stored knowledge derived from similar contexts and situations with a view to understanding complex events and how to deal with them. It is evident that concepts do not exist in a vacuum and are more meaningful when they are related to each other and integrated into progressively more complex knowledge configurations.
Frames are thus crucial to both general and specialized knowledge, though the emphasis here is on the terms that designate specialized knowledge concepts. In Environmental Science and in science in general, there is a great need for consistent descriptions of entities, processes, and features. As highlighted by Löbner (2015, p. 37), frames and functional concepts play a central role in scientific thinking since sciences deal with classes of objects, such as physical objects, living organisms, chemical substances, etc. Frames also underlie the conception of scientific classifications (i.e., taxonomies), as well as of types of processes such as chemical reactions.
The assumption that frames are present in texts is the foundational premise of Frame Semantics (FS), based on Fillmore’s (1976; 1982; 2006) Case Grammar. One of its premises is that all concepts are part of a larger structure (semantic frame) and are related in such a way that the activation of one word evokes the entire frame. FS explains how meanings are structured and associated with words in a semantic structure and how these provide access to our conceptual system, which is the inventory of structured knowledge that we use to navigate the world (Evans & Green, 2006). Frame Semantics is also used to refer to a wide variety of approaches to the systematic description of natural language meanings since it relates linguistic utterances to world knowledge, such as event types and their participants.
The practical application of FS is the FrameNet database for general language (Ruppenhofer, Ellsworth, Petruck, Johnson & Scheffczyk, 2010; Baker, Fillmore & Cronin, 2003), in which a frame is regarded as a conceptual structure that describes a situation, object, or event along with its participants. The goal is to encode scenarios and show how they can be described linguistically with the lexical units (LUs) that evoke them and the grammatical structures that provide details about the participants.
Frame Semantics and FrameNet have had a significant impact on Terminology and specialized language. For example, BioFrameNet (Dolbey, Ellsworth & Scheffczyk, 2006) extended the FrameNet lexical database to the domain of molecular biology and examined the syntactic and semantic combinatorial possibilities of the lexical items used in this field to better understand the grammatical properties of specialized language. Schmidt (2009) applied Frame Semantics to multilingual terms in soccer. In the field of Medicine, Verdaguer (2020) used Frame Semantics to analyze the Health Science Corpus (SciE-Lex lexical database) and highlight the common syntactic and semantic features of biomedical terms, motivate their combinatorial patterns, and establish frame-based semantic networks.
Representations, such as those in FrameNet, are also the basis of DiCoEnviro (L’Homme, 2018), an online environmental resource with terms in various languages (e.g., English, French, Spanish, Portuguese, etc.). DiCoEnviro describes terms as lexical units rather than as labels for concepts. Relations between terms are manually encoded by terminologists using lexical functions, LFs (Mel’čuk, Clas & Polguère 1995; Polguère, 2012). DiCoEnviro entries provide information about the linguistic properties of each environmental term along with its argument structure and contextual annotations. These entries are then connected to a knowledge resource called Framed DiCoEnviro (L’Homme, Robichaud & Prévil 2018; L’Homme, Robichaud & Subirats-Rüggeberg, 2020), where they are linked to the frames evoked. DiCoEnviro frames either come directly from FrameNet (for English) or are intuitively created.
EcoLexicon (ecolexicon.ugr.es) is also a terminological knowledge base about environmental science (Faber et al., 2016; León-Araúz et al., 2019; Faber & León-Araúz, 2021). It differs from the previously mentioned resources because of its conceptual design, which is derived from information semi-automatically extracted from specialized texts as well as from the structure of terminological definitions (Faber & León-Araúz, 2021). It thus can provide the information for various tiers of the Ontology Learning Layer Cake (see Fig. 1). As the practical application of Frame-based Terminology, its conceptual modeling and structure are discussed in Section 3.
EcoLexicon
The EcoLexicon knowledge base (ecolexicon.ugr.es) was created in 2003 by the LexiCon research group as part of the PuertoTerm research project. It was (and is) a cooperative endeavor between the LexiCon Research group and the Andalusian Inter-University Institute for Earth System Research (IISTA-CEAMA). The resource was originally based on a core list of 794 environmental terms in Spanish and English, collated by the scientists and engineers in the IISTA-CEAMA.
As part of the project, definitions were elaborated for each term, which reflected the level of generality or specificity of the concept as well as its relations with other concepts within the same knowledge domain. Definitions were constructed so that all concepts in the same category followed the same pattern.
In parallel, a corpus of English and Spanish environmental texts was also compiled. Subsequent corpus analysis enriched the initial inventory of terms and detected other related concepts. Thanks to a series of funded research projects (e.g., MarcoCosta, PuertoTerm, ReCord, ConTent, TOTEM, etc.) concepts were gradually organized in semantic categories and structured in concept systems. The original list of terms was enriched by the addition of more terms as well as by its transformation into a conceptual network.
Over the last 20 years, EcoLexicon has grown substantially and now contains 4654 environmental concepts and 24968 terms in eight languages (English, Spanish, German, French, Dutch, Modern Greek, Russian, and Arabic). Thanks to ThinkMap technology for data visualization, the general view of EcoLexicon is the gateway to a rich inventory of information for each concept, namely its definition, relations to other concepts, graphical representations, and correspondences in other languages (Faber et al., 2016; Faber & León-Araúz, 2021). Figure 2 shows the main view of the EcoLexicon entry for

EcoLexicon main view: entry for
When users click on any other concepts in the semantic network, the concept system rearranges itself (i.e., some concepts disappear and some others emerge, based on the new search concept). By right-clicking on a concept in the network, the user can access a contextual menu. This menu can be used to perform any of the following actions: (1) centering the concept; (2) fixing a node by dragging it to a certain position; (3) visualizing details of the concept (definition, associated terms, resources, etc.) by selection on the sidebar; (4) generating a URL for direct access to the concept selected; (5) searching Google Images, Google, and Wolfram Alpha; (6) removing a concept and its related concepts from the network; (7) expanding a node to include other hierarchical levels. Any of these actions enhances concept representation by providing a large quantity of conceptual information, depending on the specific needs of the user. Users can also establish the depth of the concept system, namely, its maximum hierarchical level.
However, the information in this view was not generated automatically or all at once. Instead, it is the result of a long process of conceptual modeling that has gradually evolved over the years.
As previously mentioned, the emphasis on concept systems in Terminology stems from the premise that specialized knowledge acquisition is enhanced when concepts are organized so that the relations between them are made explicit (Budin, 1994; Meyer, Eck & Skuce, 1997). This facilitates the activation of associative information in semantic memory, thus promoting context availability. The basic premise is that new knowledge is more meaningful when it is related to previous knowledge. Consequently, for concepts to become a part of one’s knowledge and be retained in long-term semantic memory, they must be embedded within a knowledge structure (Faber, 2011; 2012; León-Araúz & Faber, 2012).
Initially, EcoLexicon was structured in the form of the Environmental Event. This frame had the advantage of being process-based. The categories were the roles that the entities had within the general event (see Fig. 3).

Environmental event.
However, this more informal frame-like structure of EcoLexicon, which represents environmental actions and processes, needed to be complemented with a more formal top-down organization of semantic classes in order to determine degrees of specificity and conceptual similarity (Hahn & Chater, 1997). In 2017, certain premises of ontology building were thus adopted. Accordingly, the concepts in EcoLexicon were classified in 152 semantic categories distributed in five categorization levels (Gil-Berrozpe & Faber, 2017; Gil-Berrozpe, León-Araúz & Faber, 2017; 2018; 2019). The inventory of semantic classes was extracted from concept definitions and corpus information.
The most general level of the hierarchy is composed of the three starter ontological categories (Mahesh & Nirenburg, 1995; Moreno-Ortiz & Pérez-Hernández, 2000) (Table 1).
Starter ontological categories
The specification of these categories is ongoing as more concepts are added to EcoLexicon. Concepts with a multidimensional nature are classified in as many categorization hierarchies as necessary. Attributes were divided into 16 categories, entities into 93, and processes into 43.
The inclusion of these conceptual categories in EcoLexicon allows users to browse concepts based on their conceptual category. It also allows terminologists to conceive definitional templates and lays the groundwork for establishing ontological classes in concept hierarchies. Users can query concepts as to their category by clicking on Conceptual Categories at the bottom left of the screen.
For example,
Movement: categories

Conceptual categories in EcoLexicon:
As reflected in the conceptual network, the concepts in EcoLexicon are linked by an inventory of conceptual relations, which thus far consist of the following:
Vertical relations: type-of, part-of, made-of, phase-of, delimited-by, located-at
Horizontal relations: attribute-of, result-of, affects, causes, takes-place-at, has-function, measured, represents, studies.
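The relation inventory above can be sketched as a small data structure in which each link is stored as a directed edge, so that it can be read from either end. This is a minimal sketch: the function names and sample edges are assumptions for illustration, and the relation labels are written with underscores.

```python
VERTICAL = {"type_of", "part_of", "made_of", "phase_of", "delimited_by", "located_at"}
HORIZONTAL = {
    "attribute_of", "result_of", "affects", "causes", "takes_place_at",
    "has_function", "measured", "represents", "studies",
}


def relation_kind(relation):
    """Classify a relation label as vertical or horizontal."""
    if relation in VERTICAL:
        return "vertical"
    if relation in HORIZONTAL:
        return "horizontal"
    raise ValueError(f"unknown relation: {relation}")


def edges_for(concept, edges):
    """Split the edges touching `concept` by direction.

    Outgoing edges read as 'concept -relation-> target'; incoming edges are
    the same links seen from the other end.
    """
    outgoing = [(rel, tgt) for src, rel, tgt in edges if src == concept]
    incoming = [(rel, src) for src, rel, tgt in edges if tgt == concept]
    return outgoing, incoming


# Illustrative edges only.
EDGES = [
    ("tectonic earthquake", "type_of", "earthquake"),
    ("earthquake", "causes", "tsunami"),
]
outgoing, incoming = edges_for("earthquake", EDGES)
```

Storing only one direction per link avoids duplicating every relation under a second label, while still allowing a concept to be described from both perspectives.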
The corresponding inverse relation is reflected in the directionality of the arrows that link each concept to another. For example, a selection of the semantic relations codified for
Selection of conceptual relations for
Within the context of
type_of: this generic-specific relation reflects hierarchical inheritance in concept systems. All entities and events can be categorized as instances of a particular class and hierarchical chains can be built accordingly. For example,
part_of: this relation also reflects the hierarchical structure of the domain. In the case of physical entities, this relation directly refers to parts of each concept, whether concrete or abstract (
located_at: this relation is relevant when the location of a physical entity is an essential characteristic for its description. For example, an
takes_place_at: this relation describes processes which have spatial and temporal dimensions. For example,
causes: this relation only links entities and events, for example,
affects: this relation, along with causes and result_of, is a crucial relation in dynamic systems since environmental concepts have a high combinatorial potential. Affects relates a wide variety of concepts to their ever-changing environments. It links processes or entities that cause a change in any other entity or event without producing a final result (e.g.,
result_of: this relation is relevant to either events or entities that are derived from other processes or events. For example,
To represent more specific types of earthquake, the following artificial ‘umbrella’ concepts are also used: (i)
As shown in Fig. 5, more specific types of
Specific types of
Within the category of

The various types of earthquake were extracted from the EcoLexicon corpus as well as from other specialized resources such as termbases, dictionaries, and glossaries. In this type of conceptual modeling, definitions are of paramount importance combined with corpus analysis.
Definitions are crafted in conjunction with concept systems in Terminology since concept systems lay the groundwork for an internally coherent system and avoid inconsistencies. At the micro-semantic level, a definition is the linguistic description of the characteristics of a concept. According to Antia (2000, pp. 113–115), a definition fixes a concept, describes it, and also links it to others. As one of the most important components of any high-quality terminological resource, definitions are thus a privileged medium for knowledge representation as they are a direct natural language explanation of a concept. In this sense, definitions and their format provide the frame for the other types of information.
Definitions also have a central role in the use of ontologies (Seppälä 2015). Ontological definitions are singular noun phrases which are content words that form part of a domain-specific vocabulary used by a group of experts to communicate about entities to which the terms refer (Seppälä, Ruttenberg & Smith, 2017, p. 75). In ontologies, according to Seppälä, Ruttenberg, Schreiber and Smith (2016), a good definition delimits the intended meaning of an ontology term by describing the instances of the type to which the term refers. It states that the Xs are of the type Y and are distinguished from other instances of this type by some collection Z of one or more characteristic marks. As is well-known, this is also an Aristotelian definition, which is the typical format of most terminological definitions, composed of a generic or superordinate term and differentiating characteristics (Eck and Meyer, 1995, pp. 83–87; Sager, 1990, p. 42).
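The genus-plus-differentiae format can be represented as a simple template. The following is a minimal sketch: the class name, field names, and methods are assumptions made for illustration and do not correspond to an actual EcoLexicon or ontology data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenusDifferentiaDefinition:
    """A definition of the form: the Xs are Ys distinguished by marks Z."""
    term: str            # X: the concept being defined
    genus: str           # Y: the superordinate concept
    differentiae: tuple  # Z: one or more characteristic marks

    def render(self) -> str:
        """Return the definition text as 'genus that mark and mark ...'."""
        return f"{self.genus} that {' and '.join(self.differentiae)}"

    def parent_link(self) -> tuple:
        """The genus makes category membership explicit as a type_of link."""
        return (self.term, "type_of", self.genus)


shallow = GenusDifferentiaDefinition(
    term="shallow focus earthquake",
    genus="earthquake",
    differentiae=(
        "occurs at depths of less than 70 kilometers beneath the Earth's surface",
    ),
)
```

Because the genus is stored as a separate field, each definition yields a hierarchical link directly, which is one reason why uniformly structured definitions are useful for ontology building.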
Definitions in EcoLexicon are also based on templates that make category membership explicit. These definitions reflect a concept’s relations with other concepts and specify essential characteristics (León-Araúz & Faber, 2012, pp. 153–154). The majority of EcoLexicon definitions conform to the guidelines for definitions specified in Seppälä et al. (2017), and those that do not are currently under revision. Evidently, it is necessary for definitions to have a uniform structure that directly refers to the underlying conceptual structure of the domain, as represented in the concept systems. For example, the EcoLexicon definition of earthquake is the following:
The conceptual relations (in brackets) and the structures pointing to them (in bold) are the following:
geological phenomenon involving a sudden oscillatory
generally
which
These waves shake the ground,
and can
Other
As previously mentioned,
For example, earthquake subtypes based on origin of movement, namely, where the stress is released, focus on the conceptual relation takes_place_at. This is the case of

This hierarchy in Fig. 6 is reflected in the structure of the definitions of the concepts, since each subordinate concept is defined in terms of the more generic one immediately preceding it, as shown in Fig. 7.

Tectonic earthquake hierarchy in EcoLexicon.
The same thing occurs in the hierarchy of

EcoLexicon hierarchy for
The hierarchy in Fig. 8 was extracted from the following interrelated set of definitions:
Depth of movement.
earthquake that occurs at depths of less than 70 kilometers beneath the Earth’s surface.
earthquake that occurs at depths ranging from 70 to 300 kilometers beneath the Earth’s surface. Less frequent than shallow focus earthquakes but more common than deep focus earthquakes, intermediate focus earthquakes typically occur in subduction zones, where one tectonic plate is forced beneath another.
earthquake that occurs at depths of more than 300 kilometers and is primarily associated with subduction zones, where the subducting plate sinks deep into the Earth’s mantle.
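The depth thresholds stated in these three definitions can be expressed as a simple classification, sketched below. The function name is an assumption, and the treatment of the boundary values (exactly 70 km and exactly 300 km as intermediate) follows the wording of the definitions.

```python
def classify_by_depth(depth_km: float) -> str:
    """Assign an earthquake to a depth-based subtype from its focal depth in km."""
    if depth_km < 0:
        raise ValueError("depth must be non-negative")
    if depth_km < 70:
        return "shallow focus earthquake"       # less than 70 km
    if depth_km <= 300:
        return "intermediate focus earthquake"  # from 70 to 300 km
    return "deep focus earthquake"              # more than 300 km
```

For instance, `classify_by_depth(10)` yields the shallow subtype, and `classify_by_depth(650)` the deep subtype. The sketch illustrates how a differentiating characteristic in a definition can act as a membership condition for a class.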
As shown in Fig. 9, there are also terms that focus on earthquakes, depending on their time of occurrence in the earthquake event: (i)

EcoLexicon hierarchy for
The hierarchy in Fig. 9 was extracted from the following set of interrelated definitions:
Time of movement.
less powerful earthquake that occurs before a stronger one and that usually originates in the same place as the main earthquake that it precedes.
strongest earthquake in a sequence, sometimes preceded by one or more foreshocks, and almost always followed by many aftershocks.
less powerful earthquake that follows a stronger one and that usually originates in the same place as the main earthquake that it follows.
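These three definitions are relational: each event is characterized by its position relative to the strongest earthquake in the sequence. The following is a minimal sketch under simplifying assumptions: events are given as hypothetical (time, magnitude) pairs, all events belong to one sequence, and the place of origin mentioned in the definitions is ignored.

```python
def label_sequence(events):
    """Label (time, magnitude) events as foreshock, mainshock, or aftershock."""
    ordered = sorted(events)  # chronological order
    if not ordered:
        return []
    # The mainshock is the strongest event; ties go to the earliest one.
    main_index = max(range(len(ordered)), key=lambda i: ordered[i][1])
    labelled = []
    for i, (time, magnitude) in enumerate(ordered):
        if i < main_index:
            label = "foreshock"
        elif i == main_index:
            label = "mainshock"
        else:
            label = "aftershock"
        labelled.append((time, magnitude, label))
    return labelled


# Illustrative sequence only: (time step, magnitude).
sequence = [(0, 4.1), (1, 6.8), (2, 5.0), (3, 3.9)]
labels = [label for _, _, label in label_sequence(sequence)]
```

The sketch shows that, unlike the depth-based subtypes, membership in these categories cannot be decided for an event in isolation, since it depends on the other events in the same sequence.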
Since the terms for
In FBT, semantic relations are extracted from corpus texts in different languages through the use of knowledge patterns (KPs), which are the lexico-syntactic structures that encode semantic relations in natural language. Pattern-based approaches assume that there are recurrent and predictable linguistic cues that indicate specific types of information, and which can be used to locate and extract text excerpts and the knowledge they convey (Meyer, Mackintosh, Barrière & Morgan, 1997, p. 257; Marshman, 2022, p. 292). Text excerpts that qualify as knowledge-rich contexts (KRCs) indicate at least one item of domain knowledge that could be useful for conceptual analysis (Meyer 2001, p. 281), which means that they contain at least one KP making the relation between two concepts explicit.
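How a knowledge pattern links two concepts can be illustrated with a minimal sketch based on regular expressions. The two patterns, the term list, and the sample sentences are assumptions made for illustration: they are not the patterns used in EcoLexicon, and a real implementation would rely on a tagged corpus rather than raw strings.

```python
import re


def find_cause_pairs(sentence, terms):
    """Return (cause, 'causes', effect) triples signalled by two simple KPs."""
    # Longer terms first, so that multi-word terms win over their heads.
    alt = "|".join(sorted((re.escape(t) for t in terms), key=len, reverse=True))
    article = r"(?:the\s+|a\s+|an\s+)?"
    patterns = [
        # EFFECT is/are caused by CAUSE
        rf"\b(?P<effect>{alt})s?\s+(?:is|are)\s+(?:\w+\s+)?caused\s+by\s+{article}(?P<cause>{alt})\b",
        # CAUSE (can|may) cause(s) EFFECT
        rf"\b(?P<cause>{alt})s?\s+(?:can\s+|may\s+)?causes?\s+{article}(?P<effect>{alt})\b",
    ]
    triples = []
    for pattern in patterns:
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            triples.append(
                (match.group("cause").lower(), "causes", match.group("effect").lower())
            )
    return triples


# Hypothetical term list and sentences.
TERMS = ["tectonic earthquake", "earthquake", "fault slip", "tsunami"]
passive = find_cause_pairs("Tectonic earthquakes are caused by fault slip.", TERMS)
active = find_cause_pairs("An earthquake can cause a tsunami.", TERMS)
```

Both patterns yield a triple with the same orientation, regardless of whether the cause or the effect is mentioned first in the sentence. A sentence that contains terms but no pattern produces no triple, which is what separates a knowledge-rich context from an ordinary one.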
Table 4 shows a brief selection of the KPs used to link concepts in EcoLexicon.
Examples of semantic relations and knowledge patterns

Concordances for tectonic earthquake.
These KPs are used to query the corpus and thus detect terms that are related to each other. In this case, a key piece of information about earthquakes is what causes them, since this can be used to divide earthquakes into different classes. As shown in Fig. 10, the concordances provide valuable clues to propositions that can be included in the semantic network and/or definition of
Based on the information in these concordances regarding the cause relation, it is possible to extract the propositions shown in Table 5.
Although this is an effective way of extracting semantically related pairs of concepts from corpora, it can be extremely time-consuming. To speed this process up and make it semi-automatic, we developed the EcoLexicon Semantic Sketch Grammar (ESSG) (León-Araúz & San Martín, 2018; León-Araúz, San Martín & Faber, 2016) in order to retrieve high-density knowledge-rich contexts (León-Araúz & Reimerink, 2019). Based on the combination of KPs and part-of-speech tags in the form of regular expressions, the ESSG was created to extract concept pairs related by the following conceptual relations: generic-specific, part-whole, location, cause, and function. We were thus able to apply it to our corpus with the corpus querying tool Sketch Engine (Kilgarriff, Rychlý, Smrz & Tugwell, 2004; Kilgarriff, Baisa, Bušta, Jakubíček, Kovář, Michelfeit, Rychlý & Suchomel, 2014).
The ESSG is made up of more than 200 sketch rules. These rules, together with the statistical features of the word sketch functionality, allow us to extract concept pairs while controlling the process, which still relies on manual verification. For example, Table 6 shows one of the rules used to retrieve hyponymic structures such as the following:
Propositions for the cause relation
Example of a rule used to retrieve hyponymic structures
As can be observed in the examples and the items of the rule, recursivity and optional elements allow the detection of varying forms of the same KP. When all rules in the grammar are applied, different concept pairs are extracted in the form of a word sketch, from which different KRCs with different KPs (is a common, such as, or other, and other, including, typically) can be accessed (Fig. 11).

Word sketch of mineral and KRCs.
It goes without saying that the information retrieved with the ESSG still requires manual verification to eliminate false positives, such as the one shown in the word sketch of mineral (e.g. mineral is the generic of mineral because of multi-word terms). When ESSG was evaluated (León-Araúz & San Martín, 2018), causes of false positives were POS-tagger mistakes, polysemous keywords, polysemous KPs, concepts expressed as long periphrastic clauses, general language words instead of specialized terms, etc. Although some of these causes are currently being addressed in a new version of the ESSG, others are inherent limitations of rule-based knowledge extraction. Nevertheless, this is still a useful tool for terminology work, since it accelerates the process and, in any case, all concept pairs must be verified, not only because of false positives, but also because natural language does not codify neat taxonomies. Terms are very often found related to others at different degrees of granularity. For instance, dolphins, ruminants and mammals do not refer to sibling concepts despite the fact that they are children of animal in the corpus (Fig. 12). It is thus the terminologist who needs to situate concepts at the right level in the hierarchy.

Hyponymic concordances of animal.
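The granularity problem described above can be partly surfaced automatically before manual verification. The following Python sketch flags a direct link as a candidate for review when the same parent is also reachable through another extracted parent. The function name redundant_links and the input pairs are hypothetical; in particular, the dolphin–mammal and ruminant–mammal pairs are assumed to have been extracted as well, which the corpus data discussed here does not confirm. The output is only a list of candidates, since placing each concept at the right level remains the terminologist's decision.

```python
from collections import defaultdict


def redundant_links(pairs):
    """Flag (child, parent) links where parent is also reachable via another parent.

    Returns a list of (child, parent, intermediate) tuples for manual review.
    """
    parents = defaultdict(set)
    for child, parent in pairs:
        parents[child].add(parent)

    def ancestors(node, seen=None):
        # Collect every concept reachable by following parent links upwards.
        seen = set() if seen is None else seen
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    flagged = []
    for child, parent in pairs:
        for other in sorted(parents[child] - {parent}):
            if parent in ancestors(other):
                flagged.append((child, parent, other))
                break
    return flagged


if __name__ == "__main__":
    # Hypothetical extracted pairs (hyponym, hypernym); for illustration only.
    extracted = [
        ("dolphin", "animal"),
        ("ruminant", "animal"),
        ("mammal", "animal"),
        ("dolphin", "mammal"),
        ("ruminant", "mammal"),
    ]
    for child, parent, via in redundant_links(extracted):
        print(f"{child} -> {parent} may be indirect (via {via})")
```

A check of this kind only works when the intermediate link has itself been extracted; otherwise the three concepts still appear as siblings and must be repositioned by hand.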
As shown in this paper, EcoLexicon is gradually evolving and is now organized in a hierarchy of conceptual categories. Although it has not yet become an ontology, it has gained significantly in formality. Because its structure is now based on a class hierarchy instead of a formal event, it contains valuable data that could be used in formal ontologies and contribute to the approach taken by the Ontology Learning Layer Cake (OLC), particularly at the first five levels (terms, synonyms in one or more languages, concepts, concept hierarchies and conceptual relations). The fact that all concepts in EcoLexicon have now been organized in a domain-based hierarchy facilitates the matching and comparison of common information across different environmental terminology resources.
For instance, some of this information in EcoLexicon could be integrated into the Environmental Ontology (ENVO) (
Figure 13 shows the ENVO entry for

Entry for
The entry in Fig. 13 indicates that
Furthermore, the ENVO conceptualization of
Secondly, the ENVO entry only includes eight subclasses of
Thirdly, even though foreshocks, mainshocks, and aftershocks are typically associated with tectonic earthquakes, they are also associated with other types of seismic events, such as volcanic eruptions or human-induced earthquakes. Although tectonic earthquakes are the most common cause of foreshocks, mainshocks, and aftershocks, these phenomena can occur in various seismic contexts.
Fourthly, in the context of earthquakes, “multiplet tectonic earthquake” is not a widely recognized term. A more frequent term is “multiplet earthquake”, which refers to a series of earthquakes that occur for the same general reason and share characteristics such as location, focal mechanism, and temporal pattern. However, these earthquakes are not necessarily tectonic.
Finally, blind thrust earthquakes and megathrust earthquakes would best be described as subtypes of
This paper has compared terminologies and ontologies and discussed how each can learn from the other. Of the many different types of terminologies, the resources that can best contribute to ontology building are those that are conceptually structured. Unfortunately, such resources are not easy to find. Although most terminologies are treasure troves of data regarding concepts and relations, their design does not always make this information explicit. Even terminology knowledge bases, such as EcoLexicon, need to be significantly reengineered in order to approximate the formal structure of an ontology.
This process is currently taking place in EcoLexicon. In an effort to provide the knowledge base with a more formal structure, its concepts have been assigned to classes within a conceptual hierarchy. This hierarchy currently co-exists with the Environmental Event. These conceptual classes, together with relations to other concepts, were extracted from definitions and from a large corpus of specialized environmental texts.
Definitional templates were thus created for categories as a way of providing definitions with a more formal structure. Information was also extracted from the EcoLexicon (private) corpus, which currently has a total of 104,964,907 words in English. Since this corpus is subdivided into various knowledge domains, we queried the Geology subcorpus, which has a total of 12,085,810 words. The data obtained from these queries were used to structure categories, create concept frames, and characterize general processes and actions.
The concept of
Acknowledgements
This research was carried out in the framework of the projects PID2020-118369GB-I00, funded by the Spanish Ministry of Science and Innovation, and A-HUM-600-UGR20, funded by the European Regional Development Fund (ERDF).
